
IsText property behavior

May 19, 2011 at 11:45 AM

I have a .zip file from a client which contains certain text files.

I am validating that they are text files via the IsText property:


foreach (ZipEntry ze in objSourceZip)
{
    // Mark the archive invalid if any entry is not flagged as text
    if (!ze.IsText)
        IsValid = false;
}

Although the files in this zip are text files, this property returns false. After unzipping them and creating a new zip file on my system, the IsText property is true.

Why is this behavior occurring?

May 19, 2011 at 7:52 PM
Edited May 19, 2011 at 7:59 PM

The IsText property reflects the metadata stored in the zipfile itself.

There are lots of zip tools out there, and some of them "respect" that metadata field, and some of them don't.  Unfortunately, even when an application tries to do "the right thing", it's not always clear what "the right thing" is.   Regardless of all that, DotNetZip is just telling you what's stored in there.

When DotNetZip is used to create a zipfile, it attempts to respect that field.  That's why you see IsText being set differently when DNZ is used to create the zip, versus some other tool.

I would caution you to be careful about relying on the IsText property.  It's sort of wrongheaded, in my opinion.  Or maybe it's outlived its usefulness.  There's no firm definition of just what it means to be "a text file", not even in the zip spec.  Twenty years ago, text was ASCII, and each byte was at most 127 (0x7F).  So I guess they didn't need a formal definition.  IsText meant all bytes are 0x7F or below.  Now, it is not the case that all text files have all bytes in that range: a UTF-8 file can have bytes above 0x7F.  The zip spec predates all that Unicode stuff, and so it has nothing to say on the topic.
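To make the old "all bytes are 0x7F or below" notion concrete, here is a minimal sketch that scans a stream and reports whether it is 7-bit ASCII. The class and method names (AsciiCheck, IsSevenBitAscii) are made up for illustration and are not part of DotNetZip. It only shows what the historical definition amounts to; it is not how any zip tool actually sets the flag.

```csharp
using System;
using System.IO;

public static class AsciiCheck
{
    // Hypothetical helper: returns true when every byte in the stream
    // is 0x7F or lower. An empty stream also returns true.
    public static bool IsSevenBitAscii(Stream input)
    {
        var buffer = new byte[8192];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] > 0x7F)
                    return false; // found a byte outside the ASCII range
            }
        }
        return true;
    }
}
```

With DotNetZip you could presumably pass it the stream returned by ze.OpenReader(). Note that a perfectly valid UTF-8 text file containing accented characters would fail this check, which is exactly the problem with the old definition.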

So I would say, be careful relying on the IsText property.  It's not clear that it has a firm meaning, across tools and libraries.

These days, a more general way to decide just what kind of file you have is to use the file type database stored in the operating system.  This is the thing that says a file with a .jpg extension is a JPG image file, a file with a .xml extension is an XML document, a file with a .txt extension is a plain text document, and so on.  You could query it with something like SHGetFileInfo, but you'd need to do a p/invoke for that, and it ends up getting complicated.  A better idea might be to just use a heuristic.  If there are 4 extensions that the files in your zip archive commonly have (for example, txt, exe, doc, and ini), then you could heuristically determine the file type by comparing each entry's extension to those known types.
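A rough sketch of that extension heuristic follows. The class name, the method name, and the choice of which extensions count as text (.txt and .ini here) are assumptions for illustration only; adjust the set to whatever your client's archives actually contain.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExtensionHeuristic
{
    // Hypothetical whitelist: extensions treated as text for this archive.
    private static readonly HashSet<string> TextExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".txt", ".ini" };

    // Returns true when the entry name ends in one of the known text extensions.
    public static bool LooksLikeText(string entryName)
    {
        if (string.IsNullOrEmpty(entryName))
            return false;

        string ext = Path.GetExtension(entryName); // "" when there is no extension
        return !string.IsNullOrEmpty(ext) && TextExtensions.Contains(ext);
    }
}
```

In the loop from the original question, the test would then become something like if (!ExtensionHeuristic.LooksLikeText(ze.FileName)) instead of checking ze.IsText. This says nothing about the file's contents, of course; it only trusts the name.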

Good luck.

Jul 1, 2011 at 7:34 AM

It's bad. I have relied on this property. Ionic zip either shouldn't have this property, or should have clearly communicated that it is unreliable.

Jul 5, 2011 at 2:47 AM

The documentation indicates that it may be irrelevant:

I agree that the doc could more clearly state that the bit is as set by the writing application, and may not have any particular real meaning.

Jul 5, 2011 at 2:48 AM
This discussion has been copied to a work item.