Calling Save on Non-Seekable Stream vs. Seekable

Jan 24, 2011 at 1:11 AM

I'm wondering what the differences are when calling ZipFile.Save on a non-seekable stream vs. a seekable stream.

I'm working on a simple ASP.NET service that dynamically creates and streams a zip archive. The zip archive itself works fine under Windows: the files don't seem corrupted, the zip file seems fine, etc. However, when I tried to read the archive from another program as an application package, I ran into issues. The file was just not reading properly.

I've spent hours trying to figure out what the issue was. Initially I used ZipOutputStream and then ZipFile, but it just came down to a slight difference in the zip archive created by DotNetZip that was causing it to fail.

I finally figured out that outputting to a MemoryStream first and then sending it to Response.OutputStream completely fixed the problem.

So this works:

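// Requires: using Ionic.Zip; (DotNetZip)
using (var ms = new System.IO.MemoryStream())
{
    using (ZipFile zip = new ZipFile())
    {
        //Add some stuff to the archive.
        zip.Save(ms);          // save to an in-memory buffer first
    }

    ms.Position = 0;           // rewind before copying the buffer out
    ms.CopyTo(Response.OutputStream);
}

Response.Flush();
HttpContext.Current.ApplicationInstance.CompleteRequest();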

While this does not:

using (ZipFile zip = new ZipFile())
{
    //Add some stuff to the archive.
    zip.Save(Response.OutputStream);
}

Response.Flush();
HttpContext.Current.ApplicationInstance.CompleteRequest();

Obviously it's not ideal to load everything into memory, but it seems that something happens when the stream is seekable that doesn't happen when it isn't. So I'm wondering what the difference is, and whether there is something I can do to the output stream to fix it.

I know there is an open work item to expose CountingStream as a public type: http://dotnetzip.codeplex.com/workitem/12374

I'm not sure whether that solves my particular problem, though.

Coordinator
Jan 31, 2011 at 3:21 PM

Two things.

I think the call to ....CompleteRequest() might be incorrect. It may be more correct to call Response.Close() after Response.Flush(). Response.End() is also an option, but be aware that Response.End() stops execution of your page at that point: if you have code or logic that follows Response.End(), it will not be executed. In that case you need to use Response.Close(). The ....CompleteRequest() call was something I put into various samples a long time ago, but since then I've learned that it results in incorrect behavior some of the time. So I'd recommend against it, and if you see an example that still shows ...CompleteRequest(), either in the official DotNetZip documentation or on some website somewhere, let me know and I will try to get it changed.

Ok, independent of that, you ask an interesting question: what's the difference between saving to a seekable vs. a non-seekable stream? The original zip format had the unfortunate property that the entry header, which precedes the compressed data, holds the length (and CRC-32) of the compressed entry. An app or library that produces a zip file (like DotNetZip) needs to know the length of the compressed stream before it can emit the header information into the zip file. But you can't know that length until you have done the compression, and the header must appear before the compressed stream in the archive. To address this ordering issue, most apps and libraries emit dummy data for the header, then write the compressed stream, then seek back to the dummy data and overwrite it with valid header data. This works fine, but obviously the technique requires a seekable stream.
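To make the seek-back technique concrete, here is a minimal sketch using only System.IO and System.IO.Compression. It is not DotNetZip's implementation; the class and method names are hypothetical. It writes a single local file header with placeholder CRC and sizes, writes the deflated data, then seeks back and patches the placeholders. It omits the central directory and leaves the timestamp at zero, so its output is only the first part of a zip archive, not a complete one.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class SeekBackSketch
{
    // Standard CRC-32 (reflected polynomial 0xEDB88320), as used in zip headers.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    // Writes one local file header plus deflated data. CRC and sizes are first
    // written as zeros, then overwritten by seeking back, so output must be seekable.
    public static void WriteEntry(Stream output, string name, byte[] content)
    {
        if (!output.CanSeek)
            throw new NotSupportedException("Seeking back requires a seekable stream.");

        byte[] nameBytes = Encoding.ASCII.GetBytes(name);
        // BinaryWriter is always little-endian, matching the zip format; leave output open.
        using (var w = new BinaryWriter(output, Encoding.ASCII, true))
        {
            w.Write(0x04034b50u);        // local file header signature
            w.Write((ushort)20);         // version needed to extract (2.0)
            w.Write((ushort)0);          // general purpose bit flag: bit 3 NOT set
            w.Write((ushort)8);          // compression method 8 = deflate
            w.Write((ushort)0);          // last-modified time (left as 0 in this sketch)
            w.Write((ushort)0);          // last-modified date (left as 0 in this sketch)
            w.Flush();
            long fixupPos = output.Position;
            w.Write(0u);                 // CRC-32 placeholder ("dummy data")
            w.Write(0u);                 // compressed size placeholder
            w.Write(0u);                 // uncompressed size placeholder
            w.Write((ushort)nameBytes.Length);
            w.Write((ushort)0);          // extra field length
            w.Write(nameBytes);
            w.Flush();

            long dataStart = output.Position;
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
                deflate.Write(content, 0, content.Length);
            long dataEnd = output.Position;

            // The sizes are known only now: seek back and overwrite the placeholders.
            output.Position = fixupPos;
            w.Write(Crc32(content));
            w.Write((uint)(dataEnd - dataStart));
            w.Write((uint)content.Length);
            w.Flush();
            output.Position = dataEnd;   // continue writing after the entry data
        }
    }
}
```

Calling `SeekBackSketch.WriteEntry(new MemoryStream(), "a.txt", bytes)` works; passing a stream whose `CanSeek` is false throws, which is exactly the situation the bit 3 encoding was designed for.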

When a zip tool produces piped output, say when you want to zip something and then uuencode the result using | on the OS command line, seeking back is not possible. So PKWare updated the zip format to allow the length of the compressed stream for each entry (along with its CRC-32 and uncompressed size) to be encoded *after* the compressed stream in the zip archive, in a "data descriptor". This encoding is an alternative to the "traditional" encoding, and it is signaled by bit 3 (value 0x0008) of the general purpose bit flag in each entry's header, hence it is sometimes called "bit 3 encoding". Most zip files don't use bit 3 encoding, but some do. It works fine, but ... not all zip libraries and applications support this modification of the zip format. This change, by the way, was made in ... I can't remember, but I think it was 1994 or prior. So this is not a recent thing. Regardless, some modern tools still don't support bit 3 encoding. In particular, I understand that the built-in archive tool in current versions of Mac OS does not support reading or writing zips that use bit 3 encoding.
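To check which encoding a given archive uses, a short sketch like the following could read the general purpose bit flag from the first local file header (signature 0x04034b50, flag at byte offset 6). The class and method names are hypothetical, it inspects only the first entry, and `BitConverter` assumes a little-endian host, which matches the zip format on x86 and typical ARM machines.

```csharp
using System;
using System.IO;

public static class Bit3Check
{
    const uint LocalHeaderSignature = 0x04034b50;
    const ushort DataDescriptorFlag = 0x0008; // bit 3 of the general purpose bit flag

    // Returns true if the FIRST local file header has bit 3 set, i.e. that entry's
    // CRC-32 and sizes are stored in a data descriptor after the compressed data.
    // Other entries in the same archive are not examined.
    public static bool FirstEntryUsesBit3(Stream zip)
    {
        var header = new byte[30]; // fixed-size part of a local file header
        int read = 0;
        while (read < header.Length)
        {
            int n = zip.Read(header, read, header.Length - read);
            if (n == 0) throw new InvalidDataException("Stream too short for a local file header.");
            read += n;
        }
        if (BitConverter.ToUInt32(header, 0) != LocalHeaderSignature)
            throw new InvalidDataException("No local file header at the start of the stream.");
        ushort flags = BitConverter.ToUInt16(header, 6);
        return (flags & DataDescriptorFlag) != 0;
    }
}
```

For example, you could pass `File.OpenRead(...)` on each version of the XAP; going by the explanation above, the one saved directly to Response.OutputStream would be expected to report true.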

DotNetZip supports both the original encoding and bit 3 encoding, but uses bit 3 only when the zipfile is written to a non-seekable stream, in order to maximize interoperability. 
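To reproduce the non-seekable case locally without a web server, one option is a small wrapper that forwards writes to a MemoryStream but reports `CanSeek = false`. This is a hypothetical test helper, not part of DotNetZip; it only imitates how a non-seekable output such as Response.OutputStream presents itself.

```csharp
using System;
using System.IO;

// Hypothetical test helper: write-only, non-seekable view over an inner stream.
public sealed class NonSeekableStream : Stream
{
    private readonly Stream _inner;

    public NonSeekableStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;   // the property a writer checks before seeking
    public override bool CanWrite => true;

    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Writes and flushes are forwarded unchanged to the inner stream.
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
    public override void Flush() => _inner.Flush();

    // Everything that would require seeking or reading is rejected.
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

With DotNetZip, `zip.Save(new NonSeekableStream(ms))` would write into `ms` while the library sees a non-seekable stream; afterwards you can inspect bytes 6 and 7 of `ms.ToArray()` (the first entry's general purpose bit flag) and compare with a plain `zip.Save(ms)`.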

If you are producing a zip file on a web server, it's tempting to just write it out to Response.OutputStream for download to the browser machine. But this isn't foolproof, because of limitations in some operating systems and tools. If you're sure all clients will be Windows clients, it's no problem: Windows can read bit-3 encoded zip files (since at least Windows 2000, and maybe earlier). But if your client list includes other operating systems, bit-3 encoded zip files may introduce problems.

I hope that clarifies things.

 

Feb 1, 2011 at 1:51 AM

Thanks for the detailed response.

Actually, I was going back and forth between Response.End() and CompleteRequest() based on the discussions on StackOverflow. I'm not sure which is better at this point :-) But I guess Response.Close/End is probably clearer if it works just as well.

What I'm doing is creating a Silverlight XAP on the fly, so apparently Silverlight also requires this length information in the header.

Do you think there would be a way to write this length information manually to the zip stream in order to use the original encoding? What if you compressed each file beforehand and could determine the compressed length of each entry? Do you think this information could be written to the header before compressing the files to the output stream? I suppose I would then have to keep DotNetZip from writing the info at the end of the stream.

Coordinator
Feb 1, 2011 at 5:33 AM

The best way to avoid bit-3 encoding is to use a seekable stream.  A MemoryStream would work, as would a FileStream. 

I think it would be impractical to save to Response.OutputStream and then somehow doctor up the zip file, moving bits around to eliminate the bit-3 encoding. It's possible in principle, but when you write to Response.OutputStream, the data is GONE after you've written it, so there's no chance for your app to intercept and modify it. If you *could* do that, why not just use a seekable stream, like a MemoryStream, which would give you the same result?

 

Feb 1, 2011 at 6:23 AM

Ideally I would be compressing the stream on the server while the stream is being downloaded to the client so the browser loads the silverlight application as quickly as possible.

The idea would be to create a non-seekable wrapper stream around Response.OutputStream and give it to DotNetZip; the wrapper would write its own header while ignoring any header bytes (and trailing headers?) sent by DotNetZip.

However, now that I think about it, that's probably just me over-engineering as usual ;-) I think in the end my XAP files aren't going to be nearly as large as what I'm testing with (which exaggerates the latency), and most of the time the XAP will be cached on the server as well. So more than likely it's not an issue, at least nothing I need to worry about right now.

But thanks for the help and feedback.

Coordinator
Feb 2, 2011 at 2:52 PM

Hmm - if you are relying on an application-layer cache, then it could be simple: for the first request, when the cache is empty, just write the XAP file to the cache location on disk (or even into a MemoryStream), and then stream the contents of that cache location to Response.OutputStream. In this case you have a seekable stream for the initial DotNetZip output, either a FileStream or a MemoryStream, and so you avoid the bit-3 encoding problem.
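A minimal sketch of that application-layer cache, using only System.IO. The class name, method name, and the `buildArchive` callback are hypothetical; the callback is where the DotNetZip code would go. The lock only guards against concurrent first requests within one process.

```csharp
using System;
using System.IO;

public static class XapCache
{
    private static readonly object Gate = new object();

    // Builds the archive into a cache file on first use (a FileStream is seekable,
    // so the writer can seek back instead of using bit 3), then copies the cached
    // bytes to the output, which may be non-seekable.
    public static void ServeCached(string cachePath, Action<Stream> buildArchive, Stream output)
    {
        lock (Gate)
        {
            if (!File.Exists(cachePath))
            {
                string tempPath = cachePath + ".tmp";
                using (var fs = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite))
                    buildArchive(fs);
                File.Move(tempPath, cachePath); // publish only a fully written file
            }
        }

        using (var cached = File.OpenRead(cachePath))
            cached.CopyTo(output);
    }
}
```

In a handler, `buildArchive` could be something like `s => { using (var zip = new ZipFile()) { /* add entries */ zip.Save(s); } }` and `output` would be `Response.OutputStream`. Subsequent requests skip the build and only copy the file.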

If you are relying on the kernel-mode cache, then I would say do the same thing, but after streaming the content to Response.OutputStream, delete the temporary zip file.
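For the kernel-mode cache case, a sketch of the temp-file variant might look like this. Names are hypothetical; it assumes `buildArchive` leaves the stream open (the delegate receives the stream rather than owning it), and the `finally` block removes the temporary file even if building or copying fails.

```csharp
using System;
using System.IO;

public static class TempFileServe
{
    // Builds the archive into a seekable temporary file, copies it to the output,
    // and always deletes the temporary file afterwards.
    public static void ServeViaTempFile(Action<Stream> buildArchive, Stream output)
    {
        string tempPath = Path.GetTempFileName(); // creates an empty, uniquely named file
        try
        {
            using (var fs = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite))
            {
                buildArchive(fs);
                fs.Position = 0;      // rewind before copying to the output
                fs.CopyTo(output);
            }
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}
```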

 

Feb 6, 2011 at 10:45 PM

Hi Cheeso,

Just to chime in a bit here on the HttpApplication.CompleteRequest vs Response.End vs Response.Close issue...

From what I've read, Response.Close sends a connection-reset packet to the client browser, which tells the browser to stop receiving any more data on that connection. As a result, if the last few bits of the zip file get delayed while travelling across the network and the reset arrives at the browser first, the downloaded zip file will be truncated. There was a post here (http://dotnetzip.codeplex.com/workitem/12466) a while ago about some transatlantic downloads that were affected by this; it was fixed by switching from Response.Close to HttpApplication.CompleteRequest.

Response.End seems to be a bit more friendly, but it basically terminates the ASP.NET request at that point, so any clean-up code after it won't actually get executed. I don't know whether that applies to "finally" blocks, or how it deals with flushing any pending buffers.

Overall HttpApplication.CompleteRequest appears to be the cleanest and most controlled way to end the download request, but I could have missed something important.

There are some links in the thread http://dotnetzip.codeplex.com/Thread/View.aspx?ThreadId=238541 to other articles I found useful.

Hope this helps,

Mike

Coordinator
Feb 14, 2011 at 7:26 PM

Thanks, pointyMike, that's useful.