Immediate output of streamed zip to Response.Output with buffered input

Jan 28, 2009 at 9:12 AM
Can the following be done with DotNetZip?
Usage: Browser client receives a zip file [of known size] [possibly a self extractor zip file] from ASP.Net and starts downloading. 
             Server meanwhile is filling the zipOutStream with data.
             When done, user can save zipfile (or self extractor file).
Functionality:
a. Server adds "filenames" and file sizes to be added (including compression methods)
b. Server streams a zipfile header to Response.Output
c. In a buffer filling loop:
   i. Server adds buffered input to input streams creating the "files".
   ii. Server streams the current information to Response.Output
d. When done filling info:
  i. Server streams the zip footer information
  ii. Client when done receiving sees the "Save / Open" dialog.

If so, could you point me to, or give me, a code snippet?
If not, would you like me to contribute? If so, what are your thoughts on the design?
Coordinator
Feb 2, 2009 at 6:19 PM
There's an example that does what I think you want,
here: 
http://code.msdn.microsoft.com/DotNetZip
Jun 29, 2009 at 12:49 PM

Hi Cheeso, and thanks for your answer, but the example is no good.

 

        using (ZipFile zip = new ZipFile())
        {
            foreach (var f in filesToInclude)
            {
                zip.AddFile(f, "files");
            }
            zip.AddFileFromString("Readme.txt", "", ReadmeText);
            zip.Save(Response.OutputStream);
        }
        Response.End();

Whereas I need something like the following (so that while I'm saving the zip, the user is already receiving it).
I can show you that this is what many of the posts where you advertised DotNetZip with the former code are actually requesting.
It can be done with SharpZipLib but not with DotNetZip.
    using (ZipFile zip = new ZipFile()) 
    { 
        foreach (var f in filesToInclude)
        { 
            zip.AddFile(f, "files");
            zip.WriteToOutput(Response.OutputStream); 
            Response.Flush(); 
        } 
        zip.AddFileFromString("Readme.txt", "", ReadmeText);
        zip.WriteToOutput(Response.OutputStream); 
        Response.Flush(); 
    } 
    Response.End(); 
    // In any case, thanks for the great tool!
     

 

Jun 29, 2009 at 12:54 PM

Keywords for this question:
  realtime zip,  real time zip,
  zip created on the fly,
  immediate zip download,
  streamed zip to output

Coordinator
Jun 29, 2009 at 4:24 PM
Edited Jul 2, 2009 at 6:35 PM

Yes, in fact you can do what you want, using what I have called the just-in-time stream provisioning mechanism in v1.8 of DotNetZip. 

Although, I would suggest that you be careful about insisting that the browser "must receive the file as the server sends it".  [EDIT: Of course you want the browser to receive the file as soon as the server sends it - this is what streaming is.   And as I've told you, and as I have explained in other places that you seem to have read, DotNetZip uses a streaming approach.   By saying the browser "must receive the file as soon as the server sends it,"  I think you are really implying "the browser must receive the file as soon as my app calls AddFile()."   I think you are concluding, incorrectly and without justification, that the compression is happening at the time of the call to AddFile. ]   It seems like an implementation detail that typically, you would not need to concern yourself with.  Is it performance that concerns you?  Which aspect?  Time-to-first-byte?  Time-to-last-byte?  Something else?   If performance is your concern, I suggest that you benchmark the easy approach with DotNetZip, before dismissing it as not appropriate or "no good".   It actually *does* stream the zip file to the browser. 

On the other hand I can see situations involving dynamically-created input streams, where the JIT stream provisioning would be ideal.  There may also be some benefit when the zip file is huge, with tens of thousands of entries.  But these are not mainstream scenarios.

You offered to contribute something, and I appreciate the offer.  But before we go forward, I need to better understand the problem we're trying to solve here.

 

Jul 2, 2009 at 4:10 PM

I need zip exactly for this reason, I'm putting together a group of very large files,  some of them media files, others part of an application (without compression).

Call this performance, or call it common sense:
a. Why should the user wait the whole time it takes to create this zipfile in memory, and only then start downloading it?
b. Why should the server's memory have a full copy of this giant file aggregation, when it could be simply streaming it off, a chunk at a time?

How do I use the JIT stream provisioning mechanism?  Is it documented?

My offer to contribute is this (I haven't seen your source code yet):
Add four methods: WriteEntry(OutputStream out, ZipEntry entry)
                  WriteHeader/Footer(OutputStream out, List<string> files)
                  CalculateZipfileSize(OutputStream out, List<string> files)
These methods would replace "Save", and between them do the following:
a. Throw exception if the method of compression is not STORE (ie UNCOMPRESSED) for all files.
b. Calculates the size of the zipfile, according to the files which will be stored in it
c. Writes an entry to an output (or stream).
d. Writes the zip header or zip footer 

Moshe

Jul 2, 2009 at 4:14 PM

Or maybe instead of WriteEntry  have Write(OutputStream out, Array buffer)
and I would have low level control on what is currently in the buffer...
so I could send chunks of information say 100K at a time.

Coordinator
Jul 2, 2009 at 6:26 PM
Edited Jul 2, 2009 at 6:41 PM

Thanks for your suggestions. I think you have a basic misconception about how the DotNetZip library works.

When you call zip.AddFile(foo...) , the call stores metadata about the entry you have added.   It does not produce a zip file in memory at that time.  It stores information about the entry that will eventually be part of the zipfile. 

What information?  Things like: the filename, the timestamp on the file, and whether encryption and compression should be used when the file is eventually put into the Zip archive.  No compression or encryption is performed at the time of the call to AddFile() or AddEntry() and their cousins. These operations (AddFile etc) are generally very fast, because they only collect and store metadata.

It is when you call Save() that the zipfile is then constructed, using the metadata stored in memory.  The compression and encryption are performed at that time.  If you call Save() and pass a stream, DotNetZip writes the zip content directly to the output stream.  Because it uses a streaming approach, the application never has "a full copy of this giant file aggregation" in memory, as you put it.  I appreciate your concern about that, because it would be silly to create the entire zip file in memory.  But it does not happen.

When you call Save() the possibly compressed, possibly encrypted content is sent to the stream, which does whatever it wants with it.   In particular, if you call ZipFile.Save(), passing Response.OutputStream as the stream, then DotNetZip will put the zipfile bytes directly into the Response.OutputStream, as they are constructed, and the browser is able to immediately begin downloading the bytes while the compression is happening.  Obviously this can and does happen before the zipfile is fully constructed.  In the general case the browser will be downloading bytes for an entry within the zipfile before even that particular entry is fully compressed. 

In no case is there "a giant file aggregation" stored in memory on the server.  Ever.  The only time you will have a giant aggregation in memory is if you call Save() to a MemoryStream(), which by definition stores data into memory.  If you for example call Save() with a filename, then the zipped data is streamed to the file, in a manner similar to the streaming to the browser.  Here again, there is no "giant aggregation." 
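
To make the two phases described above concrete, here is a minimal console sketch. It assumes the DotNetZip assembly is referenced and exposes the Ionic.Zip namespace; the class name AddThenSave, the output name out.zip, and taking the input paths from the command line are all made up for illustration. It times the AddFile() calls separately from Save(), so you can check on your own files where the work happens.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Ionic.Zip;   // assumption: DotNetZip is referenced under this namespace

class AddThenSave
{
    static void Main(string[] args)
    {
        // args: paths of the files to include (hypothetical inputs)
        using (Stream output = File.Create("out.zip"))
        using (ZipFile zip = new ZipFile())
        {
            // Phase 1: register entries. Per the explanation above, only
            // metadata is recorded here; file content is not read yet.
            Stopwatch sw = Stopwatch.StartNew();
            foreach (string path in args)
            {
                zip.AddFile(path, "files");
            }
            Console.WriteLine("AddFile calls: {0} ms", sw.ElapsedMilliseconds);

            // Phase 2: Save reads each file and writes the zip bytes to the
            // stream as it goes, rather than building the archive in memory.
            sw = Stopwatch.StartNew();
            zip.Save(output);
            Console.WriteLine("Save: {0} ms", sw.ElapsedMilliseconds);
        }
    }
}
```

Swapping the FileStream for any other writable stream should leave the pattern unchanged, which is the point made above about Response.OutputStream.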

All of this is documented in the helpfile.  http://cheeso.members.winisp.net/DotNetZipHelp  The JIT streaming mechanism is also fully documented and explained, although now it sounds like you don't need it.  

Regarding CalculateZipfileSize(), that's an interesting idea, and I can see the utility in it.  But getting the size is not possible, without actually performing the compression and encryption.  There's no way to predict, a priori, how well a particular data stream will compress.  Because you don't know the size until you actually do the compression, you cannot know the size of the zip file until it is created. 
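
For the special case proposed earlier in the thread, where every entry is stored without compression or encryption, the arithmetic can at least be sketched from the fixed record sizes in the ZIP format: 30 bytes plus the name for each local header, 46 bytes plus the name for each central directory record, and 22 bytes for the end-of-central-directory record. The following is a rough sketch under stated assumptions, not something DotNetZip provides: the class and method names are hypothetical, and it assumes no extra fields, no comments, no ZIP64 records, and names counted as UTF-8 bytes. A real library may emit extra fields (timestamps, for example), so treat the result as a lower bound.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of DotNetZip.
public static class StoredZipSize
{
    const int LocalHeaderFixed = 30;       // local file header, fixed part
    const int CentralHeaderFixed = 46;     // central directory record, fixed part
    const int EndOfCentralDirectory = 22;  // end-of-central-directory record, no comment
    const int DataDescriptor = 16;         // optional trailer, assuming the form with signature

    // entries: entry name (as stored in the archive) paired with file length in bytes.
    // withDataDescriptors: true if the writer appends a data descriptor to each
    // entry, as writers commonly do when the output stream is not seekable.
    public static long Estimate(IEnumerable<KeyValuePair<string, long>> entries,
                                bool withDataDescriptors)
    {
        long total = EndOfCentralDirectory;
        foreach (KeyValuePair<string, long> e in entries)
        {
            int nameBytes = Encoding.UTF8.GetByteCount(e.Key);
            total += LocalHeaderFixed + nameBytes + e.Value;   // header + stored data
            if (withDataDescriptors) total += DataDescriptor;
            total += CentralHeaderFixed + nameBytes;           // central directory record
        }
        return total;
    }
}
```

For a single 10-byte file stored as "a.txt" with no data descriptor, this gives 30 + 5 + 10 + 46 + 5 + 22 = 118 bytes. Whether that matches what a given library actually writes would need to be checked against its real output before relying on it for a Content-Length header.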

Your efforts to improve DotNetZip are admirable, but I suggest before you devote any more time to improving it, you first spend some time better understanding how it works.

 

Coordinator
Jul 2, 2009 at 6:39 PM
Edited Jul 2, 2009 at 6:39 PM

ps: regarding your desire to manage the chunks that get sent -  see the BufferSize property on the ZipFile. 

However, keep in mind that the stream is very fast, and there are multiple levels of buffering between a typical web server and browser.  As a result there is no guarantee that there will be a discrete "one chunk at a time" behavior.   The chunk size on the client (browser) will not be the same as the chunk size on the server.  The BufferSize property allows you to balance memory size versus compression overhead.  For many small files, a smaller buffer is warranted.  If you have large files, a buffer size of 1 MB might be better.  There are no hard and fast rules; you'll have to test it. 

And yes, ZipFile.BufferSize is documented, too.

http://cheeso.members.winisp.net/DotNetZipHelp

Jul 5, 2009 at 7:40 AM

Wow! OK!  So if I understand correctly, DotNetZip will do real-time streaming on the fly, by simply saving to OutputStream when done defining,
and no "write()" needs to be done during definition, since no processing is done at that stage anyway.

I'll check to see if it works OK for me (and starts streaming immediately), and if so I will also correct the posts on various forums (about SharpZipLib)
where I followed up on your entry. (I had written to say that your library seemed not to stream on the fly.)

 

Thanks! Moshe

Jul 5, 2009 at 12:29 PM

DotNetZip 1.8 - When trying it with small files, it works fine (seemingly).

When replacing the same files with large ones (same filenames so everything is OK "pathwise"),
in IE8 I do not get the "Download file" dialog, but rather I wait a while, with the IE progress bar advancing and it says the page is downloading,
and then finally it stops and I get: Internet Explorer cannot display the webpage. - Obviously a timeout.

How can I cause it to give me the "download file" dialog first, and only then continue streaming while creating the output?

BTW, CalculateZipfileSize() would be useful when ForceNoCompression is set, and that usage IS VERY common.

Thanks again, and will get my company to contribute ...

Moshe

Coordinator
Jul 5, 2009 at 1:45 PM

I'd have to see your server-side code to give you any insight into why zipping large files doesn't work with IE8.  Normally what you describe - getting the download file dialog first - is how it happens.  Have you tried on IE7 or Firefox?

 

Jul 6, 2009 at 9:39 AM

Thanks for the speedy response!!!
Firefox fails in the same way.  Please see remarks at the bottom of this posting.
The server side code is:

    protected void Page_Load(object sender, EventArgs e)
    {
        tryDotNetZip(new ArrayList(){"ErrorPage.htm", "temp1.tmp", "temp2.tmp", "ErrHandler.cs"});
    }

    private void tryDotNetZip(ArrayList filesToInclude)
    {
        Response.Clear();
        string zipName = String.Format("archive-{0}.zip", DateTime.Now.ToString("yyyy-MMM-dd-HHmmss"));
        //const string zipName = "moshe.zip";
        Response.ContentType = "application/zip";
        Response.AddHeader("content-disposition", "filename=" + zipName);
        //Response.Flush();
        using (ZipFile zip = new ZipFile())
        {
            //zip.BufferSize = 2000;
            zip.ForceNoCompression = true;

            foreach (string f in filesToInclude)
            {
                string filename = Path.Combine(Server.MapPath("."), f);
                zip.AddFile(filename, "files");
            }
            zip.Save(Response.OutputStream);
            //Response.Flush();
        }
        Response.End();
    }

_________________________
Please note that all files exist, and that the files temp1.tmp and temp2.tmp are replaced from small text files (with a rapid and successful zip) to large zipfiles,
using a simple "rename" call in DOS.
I also tried changing the names in the ArrayList of the called files to "temp1Big.tmp, temp2Big.tmp"  (with respective files in the directory) but to no avail.

Thanks again, and hope you enjoyed the holiday weekend
(I'm on the other side of the Atlantic, which is the reason for the strange hours),
Moshe

Jul 6, 2009 at 10:07 AM

Tried content type application/octet-stream: Did not help.
(In SharpZipLib when buffering, I got a "corrupt zip file", if I used Application/Zip, and only changing to octet helped me...)

Also tried adding the Response.Flush(); after the header setting (and before using zip). Did not help either.

Thanks, Moshe

Jul 6, 2009 at 10:15 AM

(I don't know how to edit an entry here, so posting another one with a correction)

I meant to say:

Please note that all files exist, and that the files temp1.tmp and temp2.tmp are replaced from small text files (which result in a rapid and successful zipfile output)
to renamed large zipfiles, using a simple "rename" call in DOS.  So now, temp1.tmp and temp2.tmp are very large files (247 MB each)
I also tried changing the names in the ArrayList of the called files to "temp1Big.tmp, temp2Big.tmp"  (with respective files in the directory) but to no avail.

Coordinator
Jul 6, 2009 at 3:26 PM

I read a little more about HTTP and streamed transfer, in RFC 2616, and also in the doc for HttpResponse.  What I learned was that in ASP.NET the HTTP response is, by default, buffered.   If you turn off buffering, you should get a chunked transfer-encoding, which means it will send data as soon as it is written on the server.  Use this:

        Response.BufferOutput = false;

..directly after Response.Clear() to turn off buffering.
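
Putting that together with the page code posted earlier in the thread, the handler might look something like the sketch below. It assumes an ASP.NET Web Forms code-behind with DotNetZip referenced as Ionic.Zip; the class name DownloadZip and the method name StreamZip are made up for illustration. Two things differ from the posted code: the BufferOutput line suggested above, and an "attachment;" prefix on the content-disposition header, which is an addition of this sketch and not something discussed in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Ionic.Zip;   // assumption: DotNetZip is referenced under this namespace

// Hypothetical code-behind class; the names are illustrative only.
public partial class DownloadZip : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        StreamZip(new List<string> { "temp1.tmp", "temp2.tmp" });
    }

    private void StreamZip(IEnumerable<string> filesToInclude)
    {
        Response.Clear();
        // Turn off ASP.NET response buffering so bytes go to the client
        // as they are written, instead of after the whole response is built.
        Response.BufferOutput = false;

        string zipName = String.Format("archive-{0}.zip",
            DateTime.Now.ToString("yyyy-MMM-dd-HHmmss"));
        Response.ContentType = "application/zip";
        // "attachment;" is an addition in this sketch (see the note above).
        Response.AddHeader("content-disposition", "attachment; filename=" + zipName);

        using (ZipFile zip = new ZipFile())
        {
            zip.ForceNoCompression = true;   // store only, as in the posted code

            foreach (string f in filesToInclude)
            {
                // AddFile only records the entry; nothing is read yet.
                zip.AddFile(Path.Combine(Server.MapPath("."), f), "files");
            }

            // Save reads the files and writes zip bytes to the response stream.
            zip.Save(Response.OutputStream);
        }
        Response.End();
    }
}
```

With buffering off, the headers must be set before the first byte is written, which is why they come before the call to Save().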

 

 

Jul 8, 2009 at 11:16 PM

Works!!!  Thanks!!!!!

Moshe

Coordinator
Jul 9, 2009 at 12:27 AM

Glad it helped!