multiple zip files with a preset size limit

Mar 19, 2009 at 4:43 PM
Edited Mar 19, 2009 at 4:45 PM
Hi, I have a unique situation. While creating zip files, I need to make sure the size doesn't exceed a preset value, let's say 20 MB. As soon as the size is exceeded, I need to remove the last file and start a new zip file. I know I can remove a file from a zip file, but how do I check the size of the zip file while adding files to it? I tried the ZipEntry object's CompressedSize property, but it returns 0; I believe this property can only be used while extracting files or manipulating an existing zip file. It would be a shame if I had to save the zip file every time I add a file and then check the size, because I will be working with tens of thousands of files and the file IO might make the process inefficient.

Thanks in advance.
~a
Coordinator
Mar 22, 2009 at 5:39 AM
The CompressedSize is meaningful only AFTER you have written the ZipEntry once, using a Save() method on the ZipFile class.
This is documented, expected behavior.
There's no way to know how small a file will be in the zip archive until you compress it.

I understand your scenario and it seems reasonable.  Unfortunately, there is no intelligence in the ZipFile class to limit zipfile sizes to any particular size.

One workaround may be to save the zip archive to the "bitbucket", System.IO.Stream.Null. All of the ZipEntry instances will be compressed and optionally encrypted, and at that point the CompressedSize properties will be valid. This will do no writing to your disk, though there will still be IO (reading the files to be added to the zip) and significant compute resources will be consumed.
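The same "save to a bit bucket" idea can be sketched outside .NET. The example below is only an analogy using Python's standard zipfile module, not DotNetZip: NullSink and measure_compressed_sizes are hypothetical names made up for illustration. It compresses the entries into a stream that discards everything and then reads the per-entry compressed sizes.

```python
import io
import zipfile


class NullSink(io.RawIOBase):
    """Hypothetical write-only stream: discards data but counts the bytes."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def writable(self):
        return True

    def write(self, b):
        n = len(b)
        self.count += n  # nothing is stored, only counted
        return n


def measure_compressed_sizes(entries):
    """entries maps entry name -> bytes.

    Returns (per-entry compressed sizes, total archive bytes written).
    The data is really compressed, so CPU cost is paid; only disk writes
    are avoided.
    """
    sink = NullSink()
    with zipfile.ZipFile(sink, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
        infos = zf.infolist()
    sizes = {info.filename: info.compress_size for info in infos}
    return sizes, sink.count
```

The total is larger than the sum of the entry sizes because the archive also carries per-entry headers and a central directory.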

I understand that it seems "a shame" that we cannot know the CompressedSize before actually compressing the file, but it is unavoidable, as I'm sure you can understand.
Mar 22, 2009 at 2:28 PM
Sorry Cheeso, it was not my intention to offend anybody. I used the word to describe more of the programming style to use the library, but not the library itself. It’s a great library and very easy to use.

I like your idea, but I ended up using another approach. All of my files are PDF files. I am adding each file individually instead of adding the whole folder. Before adding each file to the zip file, I look up its size, multiply it by 0.9 (assuming a compressed PDF file is about 90% of its original size), and add the result to a "Total Size" variable. This way I avoid expensive IO and get an approximate size of the zip file. As soon as I reach 4 GB, I save the zip file and create a new one.
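A rough sketch of this kind of estimate-based batching, written in Python purely for illustration (plan_archives and its parameters are hypothetical names, and the 0.9 ratio is the assumption stated above, not a measured value). This variant starts a new archive just before the running estimate would pass the limit:

```python
def plan_archives(file_sizes, limit_bytes, ratio=0.9):
    """Group files into archives using an estimated compression ratio.

    file_sizes: list of (name, size_in_bytes) pairs.
    Returns a list of batches, each a list of names.
    """
    batches, current, estimated = [], [], 0.0
    for name, size in file_sizes:
        est = size * ratio  # assumed compressed size, not the real one
        if current and estimated + est > limit_bytes:
            batches.append(current)  # close the current archive
            current, estimated = [], 0.0
        current.append(name)
        estimated += est
    if current:
        batches.append(current)
    return batches
```

A single file whose estimate alone exceeds the limit still gets its own batch; the sketch does not try to split it.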

Keep up the good work.
Thanks.
~a

Coordinator
Mar 22, 2009 at 6:07 PM
No offense taken!  I was not at all put out by your question, only trying to reply with good information.
I agree that not knowing the size of the compressed archive until the very end is "a shame".  

Your approach seems a good one, if the 0.9 ratio is reliable. But it bothers me that there is no feedback loop: nothing to ensure the estimate you compute is tracking the actual value.
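One possible form of such a feedback loop, sketched in Python under assumptions of my own (updated_ratio and the weight parameter are hypothetical, not anything DotNetZip provides): after each archive is actually saved, compare its real size with the raw size of the files that went into it and nudge the ratio toward the observed value.

```python
def updated_ratio(raw_bytes_total, actual_archive_bytes, old_ratio, weight=0.5):
    """Blend the old estimate with the ratio observed on the last saved archive.

    weight=0.5 is an arbitrary illustrative choice; a higher weight reacts
    faster to the most recent archive.
    """
    observed = actual_archive_bytes / raw_bytes_total
    return (1 - weight) * old_ratio + weight * observed
```

The updated ratio would then be used when estimating the next archive, so a consistently wrong guess corrects itself after a few saves.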
 
Tools like WinZip avoid this by simply saving the zip every time a file is added. I could add that to DotNetZip, but I don't think it is very practical. For small files, let's say under 10 MB, this can be quick, maybe 0.2 seconds. But as files become larger, it can take 10 or 20 seconds to save a single entry. So, saving automatically wouldn't work. The app always has the option of doing that itself, though.