Opening badly formatted ZIP files

Jul 30, 2009 at 10:14 PM

Hi Cheeso,

I have a whole bunch of ZIP files (exports from an old application using the DynaZIP library) that I must be able to read and import in a new software project using your DotNetZip library. Most of them can be read without any problems, but every now and then there is a ZIP file (mostly the larger ones) that cannot be opened: it throws a bad signature exception. The strange thing is that I can open these files with other ZIP tools, and even Windows Explorer can handle them... Armed with the DotNetZip sources, the PKWARE ZIP application note and a hex editor, I found out what's going on:

For some strange reason the DynaZIP library "sometimes" mixes up the data blocks in the zip file and puts the "central directory entries" at the very beginning of the file, followed by the local files (headers, data, descriptors) and then (correctly at the end of the file) the "central directory end" block. As all pointers and offsets are valid (the central directory end block states 0 as the position of the central directory, and the central directory entries have valid offset pointers to their related files), everything is OK, apart from the unusual order of the blocks in the file.
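To make the layout concrete, here is a minimal Python sketch (not DotNetZip code) that tells the two block orders apart. The names `find_eocd` and `classify_layout` are made up for illustration. The field positions follow the fixed 22-byte end record from the PKWARE application note. The sketch ignores ZIP64 and assumes the end-record signature does not also occur inside the archive comment.

```python
import struct

LOCAL_SIG = b"PK\x03\x04"    # local file header
CENTRAL_SIG = b"PK\x01\x02"  # central directory entry
EOCD_SIG = b"PK\x05\x06"     # "central directory end" record
EOCD_FMT = "<4sHHHHIIH"      # fixed 22-byte part of the end record


def find_eocd(data: bytes) -> int:
    """Return the offset of the end record (hypothetical helper)."""
    # The record sits at the end of the file, followed only by an
    # optional comment of at most 65535 bytes, so search just the tail.
    start = max(0, len(data) - (struct.calcsize(EOCD_FMT) + 0xFFFF))
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    return pos


def classify_layout(data: bytes) -> str:
    """Classify the block order as described in the post (illustrative)."""
    eocd = find_eocd(data)
    fields = struct.unpack_from(EOCD_FMT, data, eocd)
    cd_size, cd_offset = fields[5], fields[6]
    # Directory-first: the file opens with a directory entry and the
    # end record states 0 as the position of the central directory.
    if data[:4] == CENTRAL_SIG and cd_offset == 0:
        return "directory first"
    # Conventional: the directory ends exactly where the end record begins.
    if cd_offset + cd_size == eocd:
        return "directory last"
    return "other"
```

A reader that trusts only the offsets in the end record and in the directory entries never depends on this distinction, which would explain why other tools open such files.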

With a few lines of additional code I was successful in making DotNetZip as tolerant as other ZIP tools regarding these badly formatted files, and could correctly extract all data from these archives. In my opinion the changes I made are harmless to the stability and integrity checks of correctly formatted ZIP files. Maybe a question for discussion: "Should these code changes be merged into the official branch or should I continue to maintain my private build?"

Regards,
Jan

PS: I can provide my ZipFile.cs and ZipDirEntry.cs containing the changes and comments upon request. The additions I made are:

  • Adding the "ZipDirEntrySignature" as valid start signature
  • Adding the "ZipEntrySignature" as valid central directory end marker
  • Remembering the "EndOfCentralDirectorySignature" offset after having found it in ReadIntoInstance()
  • Jumping back there right before ReadCentralDirectoryFooter() as this need not necessarily follow directly after the last directory entry anymore
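The four additions above can be sketched in Python as follows. This is one reading of the list, not the actual patch to ZipFile.cs or ZipDirEntry.cs: `read_directory` and the constant names are hypothetical, ZIP64 is ignored, and entry names are decoded as CP437 without checking the UTF-8 flag.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

LOCAL_SIG = b"PK\x03\x04"        # local file header
CENTRAL_SIG = b"PK\x01\x02"      # central directory entry
EOCD_SIG = b"PK\x05\x06"         # "central directory end" record
EOCD_FMT = "<4sHHHHIIH"          # fixed 22-byte end record
CENTRAL_REST_FMT = "<6H3I5H2I"   # 42 bytes after an entry signature


def read_directory(stream: BinaryIO) -> Tuple[List[Tuple[str, int]], bytes]:
    """Return ([(name, local_header_offset), ...], archive_comment)."""
    # Addition 1: a directory entry is also a valid start signature.
    stream.seek(0)
    if stream.read(4) not in (LOCAL_SIG, CENTRAL_SIG, EOCD_SIG):
        raise ValueError("bad signature at start of file")

    # Addition 3: find the end record in the tail and remember its offset.
    stream.seek(0, io.SEEK_END)
    tail_start = max(0, stream.tell() - (22 + 0xFFFF))
    stream.seek(tail_start)
    idx = stream.read().rfind(EOCD_SIG)
    if idx < 0:
        raise ValueError("no end-of-central-directory record found")
    eocd_offset = tail_start + idx
    stream.seek(eocd_offset)
    eocd = struct.unpack(EOCD_FMT, stream.read(struct.calcsize(EOCD_FMT)))
    cd_offset, comment_len = eocd[6], eocd[7]

    # Walk the directory from wherever the end record says it starts.
    stream.seek(cd_offset)
    entries = []
    while True:
        sig = stream.read(4)
        if sig != CENTRAL_SIG:
            # Addition 2: a local file header also ends the directory.
            if sig in (LOCAL_SIG, EOCD_SIG):
                break
            raise ValueError("bad signature in central directory")
        f = struct.unpack(CENTRAL_REST_FMT, stream.read(42))
        name_len, extra_len, cmt_len, local_offset = f[9], f[10], f[11], f[15]
        name = stream.read(name_len).decode("cp437")
        stream.seek(extra_len + cmt_len, io.SEEK_CUR)  # skip extra + comment
        entries.append((name, local_offset))

    # Addition 4: jump back to the remembered end record instead of
    # assuming it directly follows the last directory entry.
    stream.seek(eocd_offset + struct.calcsize(EOCD_FMT))
    return entries, stream.read(comment_len)
```

For a conventional archive the loop stops at the end record; for a directory-first archive it stops at the first local file header. In both cases the final seek lands just past the fixed part of the end record, so the archive comment is read from the right place.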
Coordinator
Jul 31, 2009 at 12:17 AM

I would definitely like to have your changes so I can put them in. You can upload a patch here: http://dotnetzip.codeplex.com/SourceControl/PatchList.aspx , or, maybe better, you can create a workitem and upload your changes to the workitem.

Any chance you could give me one of the zip files that have this format?  Or, you could produce another one.  If I had an actual zip file, I could create a test that would run every time I release the code, thus verifying that it continues to work.

 

Coordinator
Jul 31, 2009 at 12:18 AM

Also, it doesn't really seem like the zip files are badly formatted, only that DotNetZip makes some invalid assumptions about how zip files are laid out.

Aug 3, 2009 at 9:43 AM

I don't think that DotNetZip was wrong to make those assumptions about the ZIP file layout, as the layout is explicitly described this way in the PKWARE application note. Anyway, let’s call these files (without any judgment) “directory first” files. I will add a new workitem for this type of ZIP file.

I had a bit of a problem providing a sample ZIP file for the unit tests, because all of the ZIP files I have are customers’ export files containing sensitive data. Therefore I “rearranged” a normal ZIP file made with Windows Explorer to have the same layout and extra features (e.g. extra fields or data descriptors following the file data) as the problematic export files. I will attach this file to the workitem.
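One way such a rearrangement could be scripted is sketched below in Python. This is not necessarily how the attached sample was produced, and `make_directory_first` is a hypothetical name. The sketch assumes a conventional, non-ZIP64 input whose directory ends exactly where the end record begins. It moves the central directory to the front, shifts every local header offset by the directory size, and sets the directory position in the end record to 0. It does not add extra fields or data descriptors.

```python
import struct

CENTRAL_SIG = b"PK\x01\x02"  # central directory entry
EOCD_SIG = b"PK\x05\x06"     # "central directory end" record


def make_directory_first(data: bytes) -> bytes:
    """Rewrite a conventional archive into the directory-first layout."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # Directory size and offset sit 12 and 16 bytes into the end record.
    cd_size, cd_offset = struct.unpack_from("<II", data, eocd + 12)
    if cd_offset + cd_size != eocd:
        raise ValueError("expected the directory right before the end record")

    local_part = data[:cd_offset]
    directory = bytearray(data[cd_offset:eocd])
    end_record = bytearray(data[eocd:])

    # Local files move back by cd_size bytes, so fix each entry's pointer.
    pos = 0
    while pos < len(directory):
        if directory[pos:pos + 4] != CENTRAL_SIG:
            raise ValueError("unexpected record in central directory")
        name_len, extra_len, comment_len = struct.unpack_from(
            "<HHH", directory, pos + 28)
        (local_offset,) = struct.unpack_from("<I", directory, pos + 42)
        struct.pack_into("<I", directory, pos + 42, local_offset + cd_size)
        pos += 46 + name_len + extra_len + comment_len

    # The central directory now starts at position 0.
    struct.pack_into("<I", end_record, 16, 0)
    return bytes(directory) + local_part + bytes(end_record)
```

Because only the block order and the offsets change, the output has the same length as the input, which makes it easy to compare the two in a hex editor.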

Coordinator
Aug 3, 2009 at 11:11 AM

ok, sounds good.