Closed

doc for ZipEntry.IsText property should be clearer and more complete

description

The ZipEntry.IsText property reflects the metadata stored in the zipfile itself.
 
There are lots of zip tools out there, and some of them "respect" that metadata field, and some of them don't. Unfortunately, even when an application tries to do "the right thing", it's not always clear what "the right thing" is. Regardless of all that, DotNetZip is just telling you what's stored in there.
 
When DotNetZip is used to create a zipfile, it attempts to respect that field. That's why you see IsText being set differently when DNZ is used to create the zip, versus some other tool.
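
To see how much the flag depends on whichever tool wrote the archive, here is a small language-neutral sketch using Python's standard zipfile module rather than DotNetZip. It assumes IsText corresponds to bit 0 of the "internal file attributes" field, which the zip spec describes as marking a file that is "apparently" ASCII or text. The helper names (text_flags, build_sample_zip) are made up for illustration.

```python
import io
import zipfile

TEXT_FLAG = 0x0001  # bit 0 of the "internal file attributes" field (assumed to back IsText)


def text_flags(zip_bytes):
    """Return {entry name: True/False} based only on the stored flag bit."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: bool(info.internal_attr & TEXT_FLAG)
                for info in zf.infolist()}


def build_sample_zip():
    """Build a small in-memory archive where only one entry has the flag set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        flagged = zipfile.ZipInfo("notes.txt")
        flagged.internal_attr = TEXT_FLAG        # the writer chooses to set the bit
        zf.writestr(flagged, "hello\n")
        zf.writestr("also-text.txt", "hello\n")  # identical content, bit left at 0
    return buf.getvalue()


print(text_flags(build_sample_zip()))
```

Both entries hold the same text, yet only one reports the flag. The value reflects what the writing tool decided to store, not anything about the content.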
 
I would caution you to be careful about relying on the IsText property. It's sort of wrongheaded, in my opinion. Or maybe it's outlived its usefulness. There's no firm definition of just what it means to be "a text file", not even in the zip spec. Twenty years ago, text was ASCII, and each byte was less than 128. So I guess they didn't need a formal definition. IsText meant that all bytes are less than 128. Now, it is not the case that all text files have all bytes less than 128. A UTF-8 file can have bytes that are above 0x7f. The zip spec predates all that Unicode stuff, and so it has nothing to say on the topic.
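
A quick way to see the gap between the old "all bytes are ASCII" idea and modern text is the sketch below. It is plain Python and not part of DotNetZip; is_ascii and is_utf8 are hypothetical helpers, and "decodes as UTF-8" is only one of several possible definitions of text.

```python
def is_ascii(data: bytes) -> bool:
    """The old notion of text: every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def is_utf8(data: bytes) -> bool:
    """A looser notion of text: the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


plain = "cafe".encode("utf-8")             # only ASCII characters
accented = "café".encode("utf-8")          # "é" becomes two bytes above 0x7f
binary = bytes([0xFF, 0xD8, 0xFF, 0xE0])   # first bytes of a typical JPEG file

for name, data in [("plain", plain), ("accented", accented), ("binary", binary)]:
    print(name, is_ascii(data), is_utf8(data))
```

The accented string is obviously text to a human, but it fails the ASCII test. A tool that sets the flag using the old rule would mark it as not text.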
 
So I would say, be careful relying on the IsText property. It's not clear that it has a firm meaning, across tools and libraries.
 
These days, a more general way to decide just what kind of file you have is to use the file type database stored in the operating system. This is the thing that says a file with a .jpg extension is a JPG image file, a file with a .xml extension is an XML document, a file with a .txt extension is a plain text document, and so on. On Windows you could query it with something like SHGetFileInfo, but you'd need to do a P/Invoke for that, and it ends up getting complicated. A better idea might be to just use a heuristic. If there are 4 extensions that the files in your zip archive commonly have (for example, txt, exe, doc and ini), then you could heuristically determine the file type by comparing each entry's extension to those known types.
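
The extension heuristic could look roughly like the sketch below. It is written in Python to stay language-neutral. The KNOWN_TYPES table and the classify helper are illustrative assumptions, so fill in the table with whatever extensions your archives really contain.

```python
import posixpath

# Hypothetical table covering the four example extensions mentioned above.
KNOWN_TYPES = {
    ".txt": "text",
    ".ini": "text",
    ".exe": "binary",
    ".doc": "binary",
}


def classify(entry_name: str, default: str = "unknown") -> str:
    """Guess a file type from the entry's extension, ignoring case."""
    ext = posixpath.splitext(entry_name)[1].lower()
    return KNOWN_TYPES.get(ext, default)


for name in ["docs/ReadMe.TXT", "setup.exe", "photo.jpg", "Makefile"]:
    print(name, "->", classify(name))
```

Anything not in the table comes back as "unknown", which lets the caller decide how to treat unfamiliar files rather than guessing.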
Closed Jul 5, 2011 at 2:31 AM by Cheeso
fixed in changeset 80280. First binary with this change: v1.9.1.6

comments