Zip file a lot larger than its content

Feb 21, 2009 at 12:14 PM
Edited Feb 21, 2009 at 12:25 PM
I'm adding a large PSD file (70 486 013 bytes) to an archive.
Opening the resulting zip with WinRAR shows that the packed size of the file is 53 896 916 bytes. However, the zip file itself is 161 691 180 bytes.
After running Repair Archive in WinRAR, the file size is reduced to 53 897 060 bytes.
What am I missing?

' Create the archive at ArchiveFullPath
    Archive = New ZipFile(ArchiveFullPath)
    Archive.TempFileFolder = "C:\Temp"

' Add the files from the directory that the DirectoryInfo Di points to
    Dim Files() As FileInfo
    Files = Di.GetFiles()

    For i As Integer = 0 To Files.Length - 1
        Archive.AddFile(Files(i).FullName, "Assets")
    Next

    Archive.Save()
Coordinator
Feb 21, 2009 at 6:08 PM
Edited Feb 21, 2009 at 6:30 PM
Hmmm, interesting puzzle.
Some questions:
First, what version of the DotNetZip library are you using? In v1.6 and prior releases, the size of the "compressed data" could actually be larger than the data itself. This was the result of well-known anomalous compression behavior in System.IO.Compression.DeflateStream (it has been raised as a bug with Microsoft, but there is no fix yet). In v1.7, DotNetZip includes a different managed DeflateStream class, based on zlib, which never inflates the data. If you are using v1.6, I advise that you migrate to v1.7 ASAP and retry.

If you are using v1.7, can you give me a list of the entries in the zip file, that is, the output of DotNetZip's "unzip -l" tool?
Tell you what: give me the output of "unzip -l" both BEFORE and AFTER you run the archive through WinRAR.

I would also like to know the other options you used on the zip file:
 the CompressionLevel setting
 the encryption if any
 
What's a PSD file?
Feb 22, 2009 at 2:03 PM
Edited Feb 22, 2009 at 2:08 PM

A PSD file is a Photoshop working image, including all the layers etc.

I'm using v1.7 with default settings (maybe this is the problem; do I need to set a compression level?), and the output from the unzip utility is below. For your information, I get the same results with large TIFF files. I work on apps for print shops and usually handle very high-resolution images. This is the output from the requested tests. For simplicity, the directory path was removed from the result:

>unzip -l Test.zip
Zipfile: Test.zip

Modified                 Size  Ratio    Packed  pw?  CRC       Filename
--------------------------------------------------------------------------------
2008-09-26 13:59:00  70486013    24%  53896916  N    0840087C  Assets/Dalia & Isak.psd

>unzip -l rebuilt.Test.zip
Zipfile: rebuilt.Test.zip

Modified                 Size  Ratio    Packed  pw?  CRC       Filename
--------------------------------------------------------------------------------
2008-09-26 13:59:00  70486013    24%  53896916  N    0840087C  Assets/Dalia & Isak.psd

>Dir *.zip
Volume in drive C has no label.
Volume Serial Number is CC5A-1E96

Directory of
2009-02-21 13:02 53 897 060 rebuilt.Test.zip
2009-02-21 12:56 161 691 180 Test.zip 
Coordinator
Feb 22, 2009 at 5:27 PM
Edited Feb 22, 2009 at 5:48 PM
Hmmm, OK, thanks. I will have a look.
You don't need to set a compression level other than the default. Normally the default is very effective.
Coordinator
Feb 22, 2009 at 5:49 PM
Which version of v1.7, specifically?
Can you print out the ZipFile.LibraryVersion property?
Coordinator
Feb 22, 2009 at 6:32 PM
I tried reproducing this behavior here using PSD files and other large binary files, running it through the code you supplied.  I could not reproduce the problem. I am using the latest release of DotNetZip.

It's odd to me that unzip -l would report a small, apparently correct "compressed size" value, and yet the actual zip file is so large. The only way this could happen is if there is a large amount of rubbish data in that zip archive.
It's possible there was a double-write problem in earlier v1.7 versions of DotNetZip, where zip content would be written twice to the file and then orphaned. In theory, this could produce the size discrepancy you are seeing. I remember there was a problem with the re-compress logic in pre-final versions of v1.7 (it was re-reading files and re-compressing unconditionally), but I don't remember a size discrepancy as a symptom of that problem. Just because I don't remember it doesn't mean it didn't happen, though. This problem was fixed before the final release of v1.7 (which was v1.7.2.4).
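The "rubbish data" hypothesis can be checked without DotNetZip by comparing the file size on disk with the packed sizes recorded in the central directory. The following is a minimal sketch in Python, using only the standard library; the helper name `archive_overhead` is hypothetical and not part of any tool mentioned in this thread. It assumes the archive is readable by Python's `zipfile` module, which reads the central directory at the end of the file.

```python
import os
import zipfile


def archive_overhead(path):
    """Return (file_size, packed_total, unaccounted_bytes) for a zip file.

    packed_total sums the compressed sizes recorded in the central
    directory; unaccounted_bytes is everything else in the file
    (headers, the central directory itself, and any orphaned data).
    """
    file_size = os.path.getsize(path)
    with zipfile.ZipFile(path) as zf:
        packed_total = sum(info.compress_size for info in zf.infolist())
    return file_size, packed_total, file_size - packed_total
```

For a healthy single-entry archive, the unaccounted part should be a few hundred bytes at most. With the sizes reported above, rebuilt.Test.zip has 53 897 060 - 53 896 916 = 144 bytes of overhead, while Test.zip has 161 691 180 - 53 896 916 = 107 794 264 bytes that no entry accounts for.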

Keep in mind that "v1.7" is not enough to completely describe the assembly you are using.  There were multiple step improvements along the way to a final release of v1.7 (and even some minor improvements after).  I'd like to verify the exact version of the library (you'll get that with ZipFile.LibraryVersion) and I'd like you to try using the latest v1.7 release, which is v1.7.2.7.  

Feb 22, 2009 at 8:22 PM
I've checked the version, which is 1.7.2.7.
If it can be of help, I've made a test zip file available at http://www.vilundate.se/dotnetzip/Test.zip
Thanks for your concern.
Coordinator
Feb 22, 2009 at 8:47 PM
Ok I am copying that file now.
Any chance you could give me one of the psd files that leads to the problem?
Coordinator
Feb 22, 2009 at 10:49 PM
Edited Feb 22, 2009 at 11:28 PM

Hmmm, very curious. I got the zip file. Obviously it is much larger than it needs to be. I unpacked it and got the 70 MB PSD file. I then zipped that up with the VB code that you provided, and the result was a zip file of 161 MB. So I immediately reproduced the problem you reported. I think.

After seeing the unexpectedly large zip file, I tried it again, more than 10 times, using the same code, and in each case I got a zip file that was 53 MB.

At this point I am questioning whether I really saw the first result.

I downloaded the file again, and tried again, and still, I was not able to reproduce the anomalous behavior. Is it reproducible on your machine?

I saw what you reported, I think, just once, but now I cannot get it to happen again. 

Coordinator
Feb 23, 2009 at 12:29 AM
Very odd.  The zipfile you sent to me contains 3 copies of the zip, concatenated to each other.  It is as if you created the zipfile, then did a
  copy /b  zipfile.zip+zipfile.zip+zipfile.zip  Test.zip

I don't understand how this could happen, using just the DotNetZip API.  So far, I cannot get it to happen here.
Is it repeatable at your place?
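The three-copies observation is consistent with the sizes reported earlier in the thread: 3 x 53 897 060 = 161 691 180 exactly. One rough way to look for concatenated copies is to count end-of-central-directory (EOCD) signatures in the file, since a well-formed zip normally has one such record at the end. The sketch below is in Python with a hypothetical helper name, `count_eocd_signatures`; it is only a heuristic, because the four signature bytes can also occur by chance inside compressed data.

```python
EOCD_SIGNATURE = b"PK\x05\x06"  # marks the end-of-central-directory record


def count_eocd_signatures(path, chunk_size=1 << 20):
    """Count occurrences of the EOCD signature in a file, reading in chunks."""
    count = 0
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            count += data.count(EOCD_SIGNATURE)
            # Keep the last 3 bytes so a signature split across two chunks
            # is still found; 3 bytes cannot hold a full match, so nothing
            # is counted twice.
            tail = data[-3:]
    return count


# Sizes taken from the dir listing earlier in the thread.
rebuilt_size = 53_897_060    # rebuilt.Test.zip
reported_size = 161_691_180  # Test.zip
copies, remainder = divmod(reported_size, rebuilt_size)
print(copies, remainder)  # 3 0
```

A count of 3 on Test.zip would support the concatenation theory, and a count of 1 on rebuilt.Test.zip would match a normal archive.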