Advice on how to zip a large number of files with minimal memory usage.

Feb 2, 2011 at 3:07 PM

Hi Cheeso,

I have a requirement to load a potentially large number of images from a database, put them in a zip file, and send it to the client through Response.OutputStream.

My concern is memory. My code loops to retrieve the images one at a time, puts each one in a byte array, and then calls zip.AddEntry. The byte array is re-initialized on each iteration, and it seems to work fine. What is unclear to me is where the images read from the database are kept until the call to zip.Save takes place. I am afraid that as the number of images retrieved grows, memory will run out. Please advise.

This is my code:

protected void GetZippedData()
{
    if (this.SelectedKeys.Count > 0)
    {
        Response.Clear();
        Response.ClearHeaders();
        Response.BufferOutput = false;

        string archiveName = String.Format("Batch-{0}.zip", DateTime.Now.ToString("yyyy-MMM-dd-HHmmss"));
        Response.ContentType = "application/zip";
        Response.AddHeader("content-disposition", "inline; filename=\"" + archiveName + "\"");

        using (ZipFile zip = new ZipFile())
        {
            foreach (DataKey K in SelectedKeys)
            {
                byte[] Image = RetrieveImageFromDatabase(K.Value.ToString());
                if (Image != null)
                {
                    zip.AddEntry(string.Format("Image_{0}.tiff", K.Value.ToString()), Image);
                }
            }
            zip.Save(Response.OutputStream);
        }
        Response.End();
    }
}

Thanks.

Coordinator
Feb 2, 2011 at 3:26 PM

Flamingo, thanks for opening a new thread.

You're right: if you add entries to a zip file by calling ZipFile.AddEntry() with byte arrays obtained from a database, then all of those byte arrays have to stay alive until you call ZipFile.Save(). When you pass a byte array to ZipFile.AddEntry(), the ZipFile class keeps a reference to it and does not release it until after the call to ZipFile.Save(). So I would recommend against the model you have now. It will work, but it will be inefficient in memory usage, just as you suspected. With a large number of entries, your memory usage will skyrocket.

What to do? 

I'd recommend that you use a different overload of ZipFile.AddEntry(). There are two in particular that will improve things for you. One of the overloads accepts an opener and a closer. These are delegates: methods in your code that DotNetZip calls to open and close the input stream for the ZipEntry in question.

Think of it this way: when ZipFile.Save() is called, DotNetZip reads from an input stream, compresses the data, and writes the entry to the output stream, which is the zip file or, in your case, Response.OutputStream. The overload of ZipFile.AddEntry() that accepts opener and closer delegates lets DotNetZip tell your app to open and close the stream just in time. These calls into your opener and closer happen within the context of ZipFile.Save().

So in your case you'd call RetrieveImageFromDatabase() within the opener, read the data into a MemoryStream (just a stream wrapper on a byte array), and return the MemoryStream to DotNetZip. Your opener gets the name of the ZipEntry, which is a string, so you'd need to extract the K.Value from that string in order to call RetrieveImageFromDatabase(). Also, be sure to set the Position on the MemoryStream to 0 (zero) before returning it.

DotNetZip will read from the stream your opener returns, do the zip thing, and then call your closer delegate. For a MemoryStream you don't need to do anything special in the closer. But if your opener had returned a FileStream, or some other stream, this would be the place where you'd Dispose() it. Then DotNetZip moves to the next entry and does the same thing, calling your opener and closer in turn. The result is that you never keep more than one byte array resident in memory at a time, even if you have tens of thousands of entries.
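A minimal sketch of the opener/closer approach described above, not a definitive implementation. It assumes the DotNetZip assembly (namespace Ionic.Zip) is referenced. The class LazyZipSketch, the helper KeyFromEntryName, and the stub body of RetrieveImageFromDatabase are hypothetical names made up for illustration; only the AddEntry(name, opener, closer) call and Save(stream) come from the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Ionic.Zip; // DotNetZip

static class LazyZipSketch
{
    const string Prefix = "Image_";

    // Hypothetical stand-in for the poster's RetrieveImageFromDatabase().
    // The bytes are placeholders, not a real TIFF image.
    static byte[] RetrieveImageFromDatabase(string key)
    {
        return new byte[] { 0x49, 0x49, 0x2A, 0x00 };
    }

    // Hypothetical helper: maps an entry name such as "Image_42.tiff"
    // back to the database key "42".
    static string KeyFromEntryName(string entryName)
    {
        string stem = Path.GetFileNameWithoutExtension(entryName);
        return stem.Substring(Prefix.Length);
    }

    public static void WriteZip(IEnumerable<string> keys, Stream output)
    {
        using (ZipFile zip = new ZipFile())
        {
            foreach (string key in keys)
            {
                // Nothing is read from the database here. Only the entry
                // name and the two delegates are registered.
                zip.AddEntry(
                    string.Format("{0}{1}.tiff", Prefix, key),
                    // Opener: runs inside zip.Save(), once per entry.
                    // The key is parsed from the entry name rather than
                    // captured from the loop variable.
                    entryName =>
                    {
                        byte[] image = RetrieveImageFromDatabase(KeyFromEntryName(entryName));
                        // Assumption: a missing image becomes an empty entry,
                        // because the entry cannot be skipped at this point.
                        // A MemoryStream built from an array starts at Position 0.
                        return new MemoryStream(image ?? new byte[0]);
                    },
                    // Closer: runs after DotNetZip has consumed the stream.
                    (entryName, stream) => stream.Dispose());
            }

            // Each opener/closer pair is invoked in turn during Save().
            zip.Save(output);
        }
    }
}
```

In the page code, the call would look like LazyZipSketch.WriteZip(keys, Response.OutputStream), where keys holds the K.Value strings. One difference from the byte-array version: the null check can no longer skip an entry, so if missing images are possible, filtering the key list before the loop may be the cleaner choice.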

This overload is documented at http://cheeso.members.winisp.net/DotNetZipHelp/html/88e02061-3787-c10b-8522-a9e045f0bd94.htm 

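The other thing you need to concern yourself with is the bit-3 encoding, which comes into play when the zip is saved to a non-seekable stream such as Response.OutputStream.  Consult this recent thread for some background on it: http://dotnetzip.codeplex.com/Thread/View.aspx?ThreadId=242980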

Good luck.

Feb 2, 2011 at 3:31 PM

Your reply is most appreciated.  I will try it.  Thanks.

Mar 7, 2011 at 2:20 PM

Hi Cheeso,

When I pass the K.Value as the entry name, I get a compressed file, but the entries have no associated file type.  However, when I add ".tiff" to the end of the entry name, I get an error: "Windows cannot open this folder.  The Compressed (zipped) Folder ... is invalid."

zip.AddEntry(String.Format("{0}",K.Value.ToString()), opener, (name, stream) => stream.Close());

vs.

zip.AddEntry(String.Format("{0}.Tiff",K.Value.ToString()), opener, (name, stream) => stream.Close());

Please advise.

Thanks.

Mar 7, 2011 at 2:48 PM

Cheeso,

Please ignore my previous entry.  I got it to work. 

Thanks.