Using Unicode for file names that have ÅÆØ symbols.

Jun 21, 2010 at 12:29 PM
Edited Jun 21, 2010 at 12:39 PM

In my project I want to use DotNetZip to create zip archives. I need the file names in the resulting archives to be encoded as Unicode.

The code I use is:
public static void CreateZipFile(String outputPath, IEnumerable<String> filePaths)
{
    using (ZipFile zip = new ZipFile(outputPath, Encoding.UTF8))
    {
        // Use UTF8 only for names that IBM437 cannot represent.
        zip.UseUnicodeAsNecessary = true;
        foreach (String filePath in filePaths)
        {
            // Empty directory path: store each file at the root of the archive.
            zip.AddFile(filePath, String.Empty);
        }
        zip.Save(outputPath);
    }
}

This code uses UTF8 only when a file name contains characters that are missing from the IBM437 encoding.

There is a scenario where this behavior is not acceptable for me:
If the file name is 'åøæ.dat', IBM437 is used as the encoding, because all of its characters are present in that code page. If the archive is then opened on a machine with a different locale, for example Ukrainian, the æøå characters get corrupted. If the file name contains these characters plus some characters missing from IBM437, UTF8 is used, which is the output I want.

Is there a way to tell DotNetZip to always use UTF8, even when all characters in the file name are available in IBM437?
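
A minimal sketch of the corruption described above, assuming the reading machine falls back to code page 866 (DOS Cyrillic); the thread does not say which code page the Ukrainian machine actually uses. The class and method names are illustrative only, and the RegisterProvider call assumes .NET Core 3.0 or later, where legacy code pages are opt-in.

```csharp
using System.Text;

public static class CodePageMismatchDemo
{
    // Encodes a name with one code page and decodes the same bytes with another,
    // which is what happens when an unzip tool guesses the wrong code page.
    public static string Reinterpret(string name, int writerCodePage, int readerCodePage)
    {
        // Makes legacy code pages such as 437 and 866 available (assumption: .NET Core 3.0+).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] stored = Encoding.GetEncoding(writerCodePage).GetBytes(name);
        return Encoding.GetEncoding(readerCodePage).GetString(stored, 0, stored.Length);
    }
}
```

Calling Reinterpret("åæ.dat", 437, 866) returns a name with Cyrillic letters in place of å and æ, while the ASCII part ".dat" survives because both code pages share it.
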
Coordinator
Jun 28, 2010 at 8:19 PM

Maybe there is a way - try the ProvisionalAlternateEncoding property.  Set it to UTF8. 

I don't think DotNetZip will use UTF8 encoding if it is not necessary.  But the problem you describe is not a result of how the zipfile is encoded.  I think the problem is in how the zipfile is being unpacked.  There is a clear indication of the encoding that has been used in the zip file. If the unpacker does not observe this encoding, then you will get problems. 

What are you using to unpack the zip file? It's my understanding that Windows itself will NOT respect Unicode encoding in a zip, either UTF8 or otherwise. I may be wrong about that. It could be that Windows will use the default code page when reading compressed archives. In that case you would need to pack the zip using the Ukrainian code page (whatever THAT is) in order to allow it to be successfully unpacked on a PC in Ukraine.

Jul 1, 2010 at 11:07 AM
Edited Jul 1, 2010 at 11:08 AM

Hi Cheeso,

Thank you for your reply.

Yes, I know that the Windows archiver doesn't support UTF8, but most other archivers (7-Zip, WinRAR, etc.) do.

I need to be able to open the archive on a computer with any locale. That's why UTF8 was chosen.

ProvisionalAlternateEncoding doesn't work in this case.

I have found this code in ZipEntry._GetEncodedFileNameBytes:

// workitem 6513: when writing, use the alternative encoding only when ibm437 will not do.
byte[] result = ibm437.GetBytes(s1);
// need to use this form of GetString() for .NET CF
string s2 = ibm437.GetString(result, 0, result.Length);
_CommentBytes = null;
if (s2 == s1)
{
    //code omitted
}
else
{
    //code omitted
}

After removing this check I got the result I need, but I would like a nicer and easier way to accomplish this.
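
A sketch of what a switch for this check could look like, written as a hypothetical standalone helper rather than DotNetZip code. It assumes the two omitted branches keep the IBM437 bytes when the round trip succeeds and use the alternate encoding otherwise; the names FileNameEncodingSketch, Encode and alwaysUseAlternate are made up for illustration, and the RegisterProvider call assumes .NET Core 3.0 or later.

```csharp
using System.Text;

public static class FileNameEncodingSketch
{
    // Hypothetical helper, not part of DotNetZip.
    // With alwaysUseAlternate == false it applies the IBM437 round-trip check;
    // with alwaysUseAlternate == true it skips the check and always uses 'alternate'.
    public static byte[] Encode(string name, Encoding alternate, bool alwaysUseAlternate, out bool usedAlternate)
    {
        // Makes code page 437 available (assumption: .NET Core 3.0+).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ibm437 = Encoding.GetEncoding(437);

        if (!alwaysUseAlternate)
        {
            byte[] candidate = ibm437.GetBytes(name);
            string roundTripped = ibm437.GetString(candidate, 0, candidate.Length);
            if (roundTripped == name)
            {
                // Every character exists in IBM437, so keep the IBM437 bytes.
                usedAlternate = false;
                return candidate;
            }
        }

        usedAlternate = true;
        return alternate.GetBytes(name);
    }
}
```

With alwaysUseAlternate set to true and Encoding.UTF8 as the alternate, a name such as 'åæ.dat' is stored as UTF8 even though IBM437 could represent it, which is the behavior asked for here without editing the check out by hand.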

Jul 15, 2010 at 10:12 AM

Please fix a Unicode bug in the file "Zip Partial DLL\ZipDirEntry.cs", in the function "internal static ZipEntry ReadDirEntry(ZipFile zf)". After the lines

if ((zde._BitField & 0x0800) == 0x0800)
{
    // UTF-8 is in use
    zde._FileNameInArchive = Ionic.Zip.SharedUtilities.Utf8StringFromBuffer(block);

it is necessary to add

zde.ProvisionalAlternateEncoding = System.Text.Encoding.UTF8;

so that file names saved as UTF-8 are still marked as UTF-8 when the archive is updated.
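
The idea behind this fix can be sketched outside the library as follows. Bit 11 (0x0800) of the general purpose bit flag marks a UTF-8 file name in the ZIP format; the fallback to IBM437 when the bit is clear, and the names EntryNameDecodingSketch and DecodeName, are assumptions made for illustration and are not DotNetZip code. The RegisterProvider call assumes .NET Core 3.0 or later.

```csharp
using System.Text;

public static class EntryNameDecodingSketch
{
    // Bit 11 of the general purpose bit flag: the stored name is UTF-8.
    private const int Utf8Flag = 0x0800;

    // Hypothetical helper: decodes a stored name and reports which encoding
    // should be remembered so that a later update writes the name the same way.
    public static string DecodeName(int bitField, byte[] rawName, out Encoding encodingToKeep)
    {
        if ((bitField & Utf8Flag) == Utf8Flag)
        {
            encodingToKeep = Encoding.UTF8;
        }
        else
        {
            // Makes code page 437 available (assumption: .NET Core 3.0+).
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            encodingToKeep = Encoding.GetEncoding(437);
        }
        return encodingToKeep.GetString(rawName, 0, rawName.Length);
    }
}
```

Returning the encoding together with the name plays the same role as setting ProvisionalAlternateEncoding on the entry: the reader records how the name was stored instead of dropping that information after decoding.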