Does ZipFile keep the entire file in memory?

Aug 23, 2011 at 9:14 PM

I am working with an .mde file of about 70 MB, which I compress as I save it to a remote network location for storage and later retrieval.

Up to now I have been using the standard .NET GZipStream to compress the file for storing. I have been examining DotNetZip as an alternative so that I can store a zip file instead of a gz file.

It appears that using the Ionic.Zip.ZipFile class is the simplest way to go about that. However, I have two questions about it, to determine whether I should use the stream classes instead:

  1. If I add a file to the ZipFile does it store the file to memory or wait until Save() is called? What if the input is from a stream?
  2. If so, is the file deflated when it is added or when Save() is called?

Jacob

Coordinator
Aug 24, 2011 at 2:56 PM

 

  1. If I add a file to the ZipFile does it store the file to memory or wait until Save() is called? What if the input is from a stream?

No. If you call ZipFile.AddFile(), DotNetZip keeps a record of the filename you specify. At the time you call ZipFile.Save(), DotNetZip tries to open that file and then, using streams, reads it in chunks, compresses, and writes to the output file. There are properties on the ZipFile class to set the size of the buffers used for this streaming operation. I cannot recall the default; I think it might be 16k or so.

  2. If so, is the file deflated when it is added or when Save() is called?

This question is moot: the file is not held in memory, and it is read and compressed only when Save() is called.
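
A minimal sketch of the AddFile()/Save() sequence described above. The paths and the buffer size are made up for illustration, and setting BufferSize is optional; treat this as an outline under those assumptions rather than tuned code.

```csharp
using Ionic.Zip;

class SaveExample
{
    static void Main()
    {
        // Hypothetical paths, for illustration only.
        string source = @"C:\data\app.mde";
        string target = @"\\server\share\backup\app.zip";

        using (ZipFile zip = new ZipFile())
        {
            // Optional: size of the I/O buffer used while saving (value chosen arbitrarily).
            zip.BufferSize = 64 * 1024;

            // Records the filename only; the file content is not read here.
            // The empty string places the entry at the root of the archive.
            zip.AddFile(source, "");

            // The source file is opened, read in chunks, compressed,
            // and written to the target during this call.
            zip.Save(target);
        }
    }
}
```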

--------

Also:  if you are converting from a GZipStream, you may find it easier to adopt the ZipOutputStream class. It's an alternative to the ZipFile class. I believe ZipFile is easier to use in most cases, but if you already have code that uses GZipStream, you can *almost* just replace GZipStream with ZipOutputStream, and change nothing else in your code.  You will need one additional call to the ZipOutputStream:  PutNextEntry() to define the name of the entry in the zip archive.

ZipFile is the easier metaphor, I think, if you are starting from nothing. 

In either case, streaming is used.
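
A rough sketch of the ZipOutputStream approach, written the way a GZipStream copy loop would typically look. The paths, the entry name, and the 16k buffer are assumptions for illustration; the only zip-specific addition is the PutNextEntry() call.

```csharp
using System.IO;
using Ionic.Zip;

class ZipOutputStreamExample
{
    static void Main()
    {
        // Hypothetical paths, for illustration only.
        string source = @"C:\data\app.mde";
        string target = @"\\server\share\backup\app.zip";

        using (FileStream input = File.OpenRead(source))
        using (FileStream output = File.Create(target))
        using (ZipOutputStream zos = new ZipOutputStream(output))
        {
            // The one extra call compared with GZipStream:
            // name the entry before writing its data.
            zos.PutNextEntry("app.mde");

            // Copy in chunks; only one buffer of input is held at a time.
            byte[] buffer = new byte[16 * 1024];
            int n;
            while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                zos.Write(buffer, 0, n);
            }
        }
    }
}
```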

 

Aug 24, 2011 at 3:40 PM

Thank you Cheeso, that is exactly what I was hoping for.

If I may, I have a couple follow-up questions:

  1. If I add a file and then change its name (in the entry) before calling Save() will that break its association with the file?
  2. On the flip side, when reading a file, does the zip file load when you call ZipFile.Read(), or not until you call Extract()?

----------

I am aware of ZipOutputStream and my questions were mainly to help me decide whether to use it.

My assumption is that the pump/buffer in ZipFile is much more efficient than any that I would write, so I was hoping to use that instead of mine. My concern was that it would buffer the entire file, but your reply has more than satisfied that concern.

Coordinator
Aug 24, 2011 at 4:14 PM
  • If I add a file and then change its name (in the entry) before calling Save() will that break its association with the file?

    No.  You can do this:

    ZipEntry e = zipfile.AddFile("MyResignationLetter.doc", "/");
    e.FileName = "ILoveYou.doc";

    ... and what happens is, MyResignationLetter.doc is the name used to read from the filesystem, but "ILoveYou.doc" is the name used within the archive for that entry.

    • On the flip side, when reading a file, does the zip file load when you call ZipFile.Read(), or not until you call Extract()?

    In the zipfile, there is a "directory" which usually contains a directory entry of between 50 and 80 bytes for each entry in the zip archive. ZipFile.Read() reads the directory. Suppose you have a zipfile that is 4 GB in length, and it has 3 entries in it. ZipFile.Read() seeks to the directory in the archive, and reads the 200 or so bytes for the directory. If you then call ZipEntry.Extract() on any of the entries from that zipfile, Extract() seeks in the zip archive file to the place where the compressed bytes are for that entry, reads and decompresses, and writes the decompressed bytes out to a filesystem file. It does this in chunks, so even if the original size of the file was 2 GB, DotNetZip holds in memory only about 32k of the uncompressed data at any one time as it extracts.

    Something similarly intelligent happens in the ZipOutputStream.
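
For the reading side, a short sketch of the Read()/Extract() sequence described above. The archive path, the entry name, and the restore directory are hypothetical, and the overwrite behavior is just one possible choice.

```csharp
using System;
using Ionic.Zip;

class ReadExample
{
    static void Main()
    {
        // Hypothetical archive path, for illustration only.
        using (ZipFile zip = ZipFile.Read(@"\\server\share\backup\app.zip"))
        {
            // At this point only the archive's directory has been read,
            // so listing the entries does not decompress anything.
            foreach (ZipEntry e in zip)
            {
                Console.WriteLine("{0} ({1} bytes uncompressed)", e.FileName, e.UncompressedSize);
            }

            // Hypothetical entry name; the indexer returns null if it is not found.
            ZipEntry entry = zip["app.mde"];
            if (entry != null)
            {
                // Seeks to the entry's data and decompresses it to disk in chunks.
                entry.Extract(@"C:\restore", ExtractExistingFileAction.OverwriteSilently);
            }
        }
    }
}
```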