Zips not being compressed

Oct 12, 2008 at 3:08 AM
Edited Oct 12, 2008 at 3:11 AM
My code looks like this:

ZipFile zip = new ZipFile("U:\\test.zip");
zip.ForceNoCompression = false;
zip.AddFile("C:\\ReallyBigFile.iso");
zip.Save();

But the zip file is exactly the same size as the original file. How can I compress the file? (I've tried three other "Compression" libraries but none work - some even make the result larger than the original files. I'm running out of nerve here. Heh)

Thanks,
-Matt
Coordinator
Oct 13, 2008 at 8:55 PM
Edited Oct 13, 2008 at 9:03 PM
Matt-
The library is smart about using compression.
DotNetZip will revert to "store" as opposed to "Deflate" if the deflate results in an inflation of the data size.  
This happens with some data formats.

If DotNetZip is not compressing your content, then that's what happened.

Related:  In the v1.6 preview release, there is a property on the ZipFile called "WillReadTwiceOnInflation" which is a callback.  The library will call your code and ask for approval to read the stream again, if the first read-and-compress effort produced a larger file.  If you want to force the deflate to be applied, even when it actually inflates the data size, you can set this callback to always return false.   I may also add a convenience variable called "ForceCompression" which does the same thing.  The problem is that people may erroneously conclude that "ForceNoCompression=false" should do this, as you did. 

I have to think about this further.  I think it is a documentation issue:  I need to be clear that ForceNoCompression=false does not mean "Force Compression."   It means "Don't force NO compression." 
Coordinator
Oct 13, 2008 at 9:10 PM
At this point I don't like the idea of adding a ForceCompression flag, because I don't think it's the right thing to do. I think adding it will (a) make the library interface more complicated, and (b) encourage bad behavior. It is already possible for the app to force the library to run the deflate algorithm, with the callback. But I think that is generally the wrong thing to do, so I don't want to make it any easier.
Nov 13, 2008 at 2:16 PM
Hello,

Thanks for the library. As for this problem: I ran into it while trying to add an SQL file that accidentally had a .ZIP extension (it was really just an SQL script, a plain text file that the built-in ZIP in Total Commander compresses more than tenfold). The file was not compressed at all. It looks like the library (version 1.6.3.10) just checked the extension. It might be a good idea to actually try the compression algorithm. First, I spent several hours wondering why the file was not being compressed instead of fixing the extension. Second, in the old DOS days, people used to put ZIPs inside ZIPs, and that sometimes shrank the file. I guess if there were a lot of small files, some space could be saved by compressing the file names in the inner ZIP.

Thanks

Bolek
Coordinator
Nov 13, 2008 at 3:39 PM
I just looked in the doc and found that I had failed to complete the documentation on this behavior. 

I just modified it to say this: 

In some cases, applying the Deflate compression algorithm to an entry can result in an increase in the size of the data.  This "inflation" can happen with previously compressed files, such as a zip, jpg, png, mp3, and so on.  In a few tests, inflation on zip files was as large as 60%!  Inflation can also happen with very small files.
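
The inflation effect is easy to reproduce outside DotNetZip. The following is a minimal sketch in Python using only the standard zlib module, not DotNetZip code. The helper name deflate_size and the sample payloads are made up for illustration, and random bytes stand in for already-compressed content.

```python
import os
import zlib


def deflate_size(data: bytes) -> int:
    """Return the number of bytes `data` occupies after raw Deflate."""
    # wbits=-15 selects a raw Deflate stream (no zlib header or checksum),
    # which is the form stored inside zip archives.
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    return len(compressor.compress(data) + compressor.flush())


# Highly repetitive text, like an SQL script.
text = b"SELECT * FROM orders WHERE id = 1;\n" * 2000
# Random bytes: a stand-in for zip/jpg/mp3 content (assumption for this demo).
random_like = os.urandom(len(text))
# A very small file.
tiny = b"hi"

for name, payload in [("text", text), ("random", random_like), ("tiny", tiny)]:
    print(name, len(payload), "->", deflate_size(payload))
```

The text payload shrinks, while the random and tiny payloads both come out larger than they went in.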

To handle these cases, the DotNetZip library takes this approach: first it applies a heuristic, to determine whether it should try to compress a file or not.  The library checks the extension of the entry, and if it is one of a known list of uncompressible file types (mp3, zip, docx, and others), the library will not attempt to compress the entry.  The library does not actually check the content of the entry.  If you name a text file "Text.zip", and then attempt to add it to a zip archive, this library will, by default, not attempt to compress the entry.
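
The extension check described above could look roughly like this. It is a Python sketch of the idea only: the function name is hypothetical, and the set below contains just the three extensions named in this post, since the library's full list is not given here.

```python
import os

# Assumption: only the extensions named in the post. The real list has
# "others" that are not spelled out in this discussion.
ASSUMED_INCOMPRESSIBLE = {".mp3", ".zip", ".docx"}


def should_try_compression(file_name: str) -> bool:
    """Decide from the file name alone; the content is never inspected."""
    extension = os.path.splitext(file_name)[1].lower()
    return extension not in ASSUMED_INCOMPRESSIBLE


print(should_try_compression("Text.zip"))    # a text file, but skipped anyway
print(should_try_compression("script.sql"))
```

Because only the name is examined, a plain text file called "Text.zip" is skipped even though it would compress well.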

For filetypes not covered by that heuristic, the library attempts to compress the entry, and then checks the size of the result.  If applying the Deflate algorithm increases the size of the data, then the library discards the compressed bytes, and stores the uncompressed file data into the zip archive, in compliance with the zip spec.  This is an optimization where smaller size is preferred over longer run times.
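
The compress-then-check step can be sketched as follows. This is an illustration in Python with the standard zlib module, not the library's actual implementation. The function name encode_entry is hypothetical, and the method codes 0 (stored) and 8 (deflated) are the values defined by the zip spec.

```python
import os
import zlib

METHOD_STORE = 0    # zip spec: entry data stored as-is
METHOD_DEFLATE = 8  # zip spec: entry data compressed with Deflate


def encode_entry(data: bytes):
    """Deflate `data`, but fall back to storing it if Deflate made it bigger."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw Deflate stream
    deflated = compressor.compress(data) + compressor.flush()
    if len(deflated) > len(data):
        # Discard the compressed bytes and keep the original data.
        return METHOD_STORE, data
    return METHOD_DEFLATE, deflated


method, payload = encode_entry(b"a" * 1000)
print(method, len(payload))

method, payload = encode_entry(os.urandom(4096))
print(method, len(payload))
```

The cost of this approach is that incompressible data is processed once for nothing, which is the run-time trade-off mentioned above.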

Next, the library exposes a WantCompression callback on the ZipEntry (and in v1.7, on the ZipFile class).  With this callback, the application can supply its own logic for determining whether to apply the Deflate algorithm or not.  For example, an application may desire that files over 40 MB in size are never compressed, or always compressed.  An application may desire that the first 7 entries added to an archive are compressed, and the remaining ones are not.  The WantCompression callback allows the application full control, on an entry-by-entry basis.

Finally, the application can specify that compression is not even tried, by setting the ForceNoCompression flag.  In this case, the compress-and-check-sizes process described above is not done, nor is the callback invoked.
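
To show how a per-entry callback and a force-no-compression flag interact, here is a sketch using Python's standard zipfile module. It is an analogy, not DotNetZip's API: build_archive, the (name, size) callback shape, and the force_no_compression parameter are all hypothetical names chosen for this example, and Python's zipfile does not perform the size-check fallback described earlier.

```python
import io
import zipfile


def build_archive(entries, want_compression, force_no_compression=False) -> bytes:
    """Write (name, data) pairs to an in-memory zip, choosing a method per entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in entries:
            if force_no_compression:
                compress = False  # the callback is not consulted at all
            else:
                compress = want_compression(name, len(data))
            method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
            archive.writestr(name, data, compress_type=method)
    return buffer.getvalue()


# Example policy: never compress entries larger than 40 MB.
LIMIT = 40 * 1024 * 1024
entries = [("small.sql", b"SELECT 1;\n" * 500), ("notes.txt", b"hello " * 200)]
archive_bytes = build_archive(entries, lambda name, size: size <= LIMIT)

with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    for info in archive.infolist():
        print(info.filename, info.compress_type, info.file_size, info.compress_size)
```

Any policy that depends on the entry name, its size, or its position in the archive can be expressed as a different callback passed to the same function.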

 

Coordinator
Nov 13, 2008 at 3:40 PM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.