Save_AfterWriteEntry event

Aug 23, 2010 at 5:46 PM

I'm getting duplicate Save_AfterWriteEntry events when adding multiple files to a ZipFile. In a loop, I add a file and call Save() to trigger the event. I'm catching these events so I can tell when I'm approaching the 4GB file limit, at which point I close that ZipFile and open a new one. The clients that read the output zip files can't read ZIP64 or handle segmented zip files.

Anyway, what I'm seeing is at least two hits for each file I add, which of course means the counter I'm using to track the current zip file's size grows at least twice as fast as the real file.

I'm using the zip library 1.9.1.2.
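
For reference, the pattern looks something like this (a minimal sketch; the file names and the running total are placeholders for my real code, and I believe the enum member is ZipProgressEventType.Saving_AfterWriteEntry in this version):

```csharp
using Ionic.Zip;

class Program
{
    static long runningTotal = 0;   // my attempt at tracking the current zip's size

    static void Main()
    {
        string[] files = { "a.dat", "b.dat", "c.dat" };   // placeholder inputs

        using (var zip = new ZipFile("output.zip"))
        {
            zip.SaveProgress += (sender, e) =>
            {
                if (e.EventType == ZipProgressEventType.Saving_AfterWriteEntry)
                    runningTotal += e.CurrentEntry.CompressedSize;   // grows at least twice as fast as the file
            };

            foreach (var f in files)
            {
                zip.AddFile(f);
                zip.Save();   // each Save() is where the extra events show up
            }
        }
    }
}
```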

Thanks

Aug 23, 2010 at 7:26 PM

I think I see what's happening. Every time I call Save() in the loop, it raises an event for every file stored in the zip, not just the one I most recently added.

I guess I need to find another way to interrupt the Save() so I can start adding files to a new zip archive when I go over the size limit.

Coordinator
Aug 23, 2010 at 11:36 PM

Yes, each call to Save() writes the complete archive from scratch, so an event is generated for every entry written to the zip file, not just the one you most recently added.

Aug 24, 2010 at 2:03 PM

OK, I've switched to using ZipOutputStream. The only problem is that I can't find the size of the compressed data as I'm writing it. I've read the other threads and understand why that is when reading from streams. I'm reading files from the filesystem, so I have to use the size of the input files to guesstimate where to break off writing to the zip file before it reaches 4GB. It works, but it leads to creating more zip files than necessary.
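
Roughly what I'm doing now, as a sketch (the source folder, part naming, and the cap are placeholders, and it assumes .NET 4's Stream.CopyTo):

```csharp
using System.IO;
using Ionic.Zip;

class Splitter
{
    const long Limit = 4L * 1024 * 1024 * 1024;   // the clients' 4GB cap

    static void Main()
    {
        string[] files = Directory.GetFiles(@"C:\data");   // placeholder source
        int part = 1;
        long estimate = 0;
        var zos = new ZipOutputStream("archive-1.zip");

        foreach (var f in files)
        {
            long size = new FileInfo(f).Length;
            // Guesstimate with the UNCOMPRESSED input size, since the compressed
            // size isn't available while writing. This over-estimates, so parts
            // get cut earlier than strictly necessary.
            if (estimate + size > Limit)
            {
                zos.Close();
                part++;
                zos = new ZipOutputStream("archive-" + part + ".zip");
                estimate = 0;
            }
            zos.PutNextEntry(Path.GetFileName(f));
            using (var src = File.OpenRead(f))
                src.CopyTo(zos);
            estimate += size;
        }
        zos.Close();
    }
}
```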

If you can think of another way to do this, please let me know. Thanks.

Coordinator
Aug 25, 2010 at 11:31 PM
Edited Aug 25, 2010 at 11:35 PM

The ZipOutputStream is a Stream. You can reference the .Position property to see how large the zip file is after writing each entry.

This isn't exactly what you want, because AFTER the last entry is written, when you call .Close() on the ZipOutputStream, it writes the zip directory, which is of variable size. It's usually around 80 bytes per entry you wrote to the ZipOutputStream, but the actual size varies and is not guaranteed to be less than any particular value. So, plan for "a little extra" after you finish writing content for the last PutNextEntry().
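
For illustration, something like this (the file names and the margin constant are mine; pick a margin that suits you):

```csharp
using System.IO;
using Ionic.Zip;

class PositionCheck
{
    const long Limit = 4L * 1024 * 1024 * 1024;   // the 4GB client cap
    const long PerEntryMargin = 256;   // headroom per entry for the directory;
                                       // ~80 bytes is typical but not guaranteed

    static void Main()
    {
        string[] files = Directory.GetFiles(@"C:\data");   // placeholder source
        int part = 1, entries = 0;
        var zos = new ZipOutputStream("part-1.zip");

        foreach (var f in files)
        {
            zos.PutNextEntry(Path.GetFileName(f));
            using (var src = File.OpenRead(f))
                src.CopyTo(zos);
            entries++;

            // .Position is the compressed output written so far; reserve room
            // for the directory that Close() will append at the end.
            if (zos.Position + (entries + 1) * PerEntryMargin > Limit)
            {
                zos.Close();
                part++;
                entries = 0;
                zos = new ZipOutputStream("part-" + part + ".zip");
            }
        }
        zos.Close();
    }
}
```

Note that the check runs after an entry is written, so the entry that crosses the limit has already landed in the current part. That's where the guessing comes in.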

I think what you are saying is that you have to make a bit of a guess, because there's no telling how large the next file will be once compressed. Suppose you've written a bunch of files to the ZipOutputStream, and .Position is at 3.5GB. You have another file, 1GB uncompressed. You can't be sure whether that 1GB file will fit into the remaining 0.5GB of space in the archive. You need to guess.

The fact is, you can't know the size of the compressed output until you actually perform the compression. A way to avoid guessing is to write the zip archive twice. First, write to a bit bucket (Stream.Null), keeping track of the size of the compressed zip file, the .Position value, after writing the content for each PutNextEntry(). When the compressed size exceeds your limit, you know you've gone too far. Then save the zip file again, this time to an actual filesystem file, being sure to stop writing entries before the one that caused the overflow.
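
Here's a sketch of that two-pass idea, again with placeholder paths:

```csharp
using System.IO;
using Ionic.Zip;

class TwoPass
{
    const long Limit = 4L * 1024 * 1024 * 1024;   // 4GB cap; keep a little
                                                  // extra for the directory

    // Pass 1: compress into a bit bucket, watching .Position after each
    // entry. Returns how many files fit before the limit is exceeded.
    static int CountFilesThatFit(string[] files)
    {
        using (var zos = new ZipOutputStream(Stream.Null))
        {
            for (int i = 0; i < files.Length; i++)
            {
                zos.PutNextEntry(Path.GetFileName(files[i]));
                using (var src = File.OpenRead(files[i]))
                    src.CopyTo(zos);
                if (zos.Position > Limit)
                    return i;   // entry i pushed us over; stop before it
            }
            return files.Length;
        }
    }

    static void Main()
    {
        string[] files = Directory.GetFiles(@"C:\data");   // placeholder
        int fit = CountFilesThatFit(files);

        // Pass 2: write only the entries that fit to the real file. Files
        // from index 'fit' onward would begin the next archive.
        using (var zos = new ZipOutputStream("archive-1.zip"))
        {
            for (int i = 0; i < fit; i++)
            {
                zos.PutNextEntry(Path.GetFileName(files[i]));
                using (var src = File.OpenRead(files[i]))
                    src.CopyTo(zos);
            }
        }
    }
}
```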

You'll pay the CPU cost of compression twice, but you get a more accurate way to stop at 4GB, because the decision on how to bucket the files is made with the actual compressed sizes.

Good luck!