Unicode support?

Sep 7, 2010 at 2:17 AM
Edited Sep 7, 2010 at 6:39 AM

Reading through the FAQ, it seems that the DotNetZip library supports Unicode filenames (entry names), and the UseUnicodeAsNecessary property on the ZipEntry class suggests the same. However, the documentation for the BitField property says this about bit 11:

Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file must be encoded using UTF-8. This library currently does not support UTF-8. 
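To make the bit 11 description concrete, here is a minimal sketch that checks the flag on entries in an archive. It uses Python's standard zipfile module rather than DotNetZip, so it says nothing about how DotNetZip itself sets the flag; the helper name efs_flags and the sample filenames are made up for illustration.

```python
import io
import zipfile

EFS_FLAG = 0x0800  # bit 11 of the general purpose bit flag (language encoding flag)


def efs_flags(names):
    """Write empty entries to an in-memory archive, then report bit 11 per entry.

    Hypothetical helper for illustration only. Python's zipfile stores a name
    as UTF-8 and sets bit 11 only when the name cannot be encoded as ASCII.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the same bytes for reading and inspect the stored flags.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & EFS_FLAG) for info in zf.infolist()}


if __name__ == "__main__":
    for name, is_utf8 in efs_flags(["plain.txt", "Grüße.txt", "日本語.txt"]).items():
        print(name, "bit 11 set" if is_utf8 else "bit 11 clear")
```

The same check can be pointed at an archive produced by any other tool by opening that file with zipfile.ZipFile instead of the in-memory buffer.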

Testing UTF-8 support with zipit.exe and the -utf8 parameter gives some strange results. It seems to work for German umlauts, but a Japanese filename results in a lot of underscores.

Extracting the archive with Windows or 7-Zip does not produce the correct filenames either.

Is Unicode (UTF-8) supported or is this a shortcoming of the library?

Apart from that issue, the library looks very solid and is much easier to use than #ZipLib. Well done, guys!

Coordinator
Sep 9, 2010 at 10:30 AM

Unicode is supported by the library but may not be supported by your console. 

Try coding your test in C# or VB, supplying the appropriate Unicode characters, and you'll see that DotNetZip can and does encode Unicode filenames properly.

I cannot guarantee that a valid zip file that contains Unicode filenames and was created with DotNetZip will yield proper filenames when unzipped with either Windows or 7-Zip. That depends on correct support for Unicode in zip files in those two tools, and I can't guarantee that they provide it. As I described in the DotNetZip documentation (UseUnicodeAsNecessary), some libraries and tools clearly violate the PKWARE specification with respect to encoding of filenames.
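As a rough illustration of why a reader's behavior matters, the sketch below shows what happens when UTF-8 filename bytes are decoded as IBM code page 437, the encoding the ZIP format has historically used for names. It is plain Python, not DotNetZip, Windows, or 7-Zip, and the function names are hypothetical. It also shows one possible reason, offered only as an assumption, for the umlaut versus Japanese difference reported earlier in the thread: the German characters exist in code page 437 and the Japanese ones do not.

```python
def read_as_cp437(name: str) -> str:
    """Simulate a reader that ignores bit 11.

    The name is stored as UTF-8 bytes, but the reader decodes those bytes as
    IBM437. Hypothetical helper; real tools may fall back to other code pages.
    """
    return name.encode("utf-8").decode("cp437")


def fits_cp437(name: str) -> bool:
    """Return True if every character of the name exists in code page 437."""
    try:
        name.encode("cp437")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for name in ["plain.txt", "Grüße.txt", "日本語.txt"]:
        print(repr(name), "read as cp437:", repr(read_as_cp437(name)),
              "| representable in cp437:", fits_cp437(name))
```

An ASCII-only name survives the mismatch unchanged, which is why problems of this kind tend to show up only once non-ASCII names are involved.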

Good luck.

 

Sep 9, 2010 at 11:59 PM
Edited Sep 10, 2010 at 12:00 AM

Thanks Cheeso,

After some more investigation, support for Unicode filenames within zip files seems to be very inconsistent across the different libraries (Info-ZIP, rubyzip, SharpZipLib). We decided it is better not to use Unicode filenames within zip files in our application.

Thanks for your reply.

Coordinator
Sep 19, 2010 at 6:22 PM

Yes, I agree: support for Unicode in the various ZIP libraries is very spotty and inconsistent. I designed DotNetZip to produce ZIP files that conform to the ZIP specification, and the feedback I've gotten about the Unicode capabilities in DotNetZip has been universally positive - it's easy to use, and it produces compliant files.

On the other hand, I still think the value of using Unicode in ZIP files is somewhat limited, due to the lack of consistent support for it in other tools and libraries. Not everyone uses DotNetZip, unfortunately.