Problem with creating zip file from stream content

Aug 1, 2011 at 1:57 PM
Edited Aug 1, 2011 at 2:16 PM

Hello.

Consider this code

using (MemoryStream stream = new MemoryStream())
{
    using (BinaryWriter writer = new BinaryWriter(stream))
    {
        using (ZipFile zip = new ZipFile())
        {
            writer.Write("test-content");

            zip.AddEntry("test-bin", stream);

            zip.Save("test.zip");
        }
    }
}

The result is a zip file with a 'test-bin' entry, but that entry has zero length!

Now consider the following code

using (MemoryStream stream = new MemoryStream())
{
    using (BinaryWriter writer = new BinaryWriter(stream))
    {
        using (ZipFile zip = new ZipFile())
        {
            writer.Write("test-content");

            stream.Position = 1; //<<<<< changes

            zip.AddEntry("test-bin", stream);

            zip.Save("test.zip");
        }
    }
}

I set the stream's position to 1, and the 'test-bin' entry became correct, with a valid length.

In my opinion, AddEntry should determine the length of the stream and read it from the first byte. Do you agree?

Coordinator
Aug 1, 2011 at 8:41 PM

No, I do not agree.

In some cases people use streams that already have "something" at the beginning as inputs for a ZipEntry. For example, imagine a NetworkStream that the application begins reading from. At some point the application decides that it wants to insert further input from the stream into the ZipFile, as a ZipEntry. The current behavior of ZipFile.AddEntry() allows that. The same thing applies to a stream obtained from a database server. You can imagine other cases. In support of that scenario, I do not believe it is appropriate for AddEntry() to reset the position of a stream.

Secondly, there are some streams that do not support Seek(), and obviously AddEntry() cannot set the position of these streams. Using your proposed change, AddEntry() would Seek() when the stream supported it, and not Seek() when the stream did not support it. (Similarly, some streams support a Length property, and some do not.) This proposed behavior of AddEntry() would be arbitrary, not based on any real design principle. (Just a comment on my approach to building software: when I build these methods, I write the documentation at the same time. If I cannot clearly explain what the method does, and why, then I am inclined to reconsider the behavior of the method. If I can explain it succinctly, then the method is probably well-conceived.)
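
The point about seek support can be checked with a small sketch that uses only the .NET base class library. DeflateStream is chosen here purely as a convenient example of a non-seekable stream; it is an assumption for illustration and is not mentioned in the thread.

```csharp
using System;
using System.IO;
using System.IO.Compression;

class SeekSupportDemo
{
    static void Main()
    {
        using (MemoryStream memory = new MemoryStream())
        using (DeflateStream deflate = new DeflateStream(new MemoryStream(), CompressionMode.Compress))
        {
            // Streams advertise whether they can be repositioned.
            Console.WriteLine("MemoryStream.CanSeek:  {0}", memory.CanSeek);
            Console.WriteLine("DeflateStream.CanSeek: {0}", deflate.CanSeek);

            try
            {
                // A library that always rewound its input would fail here.
                deflate.Position = 0;
            }
            catch (NotSupportedException)
            {
                Console.WriteLine("DeflateStream does not allow setting Position.");
            }
        }
    }
}
```

A caller that knows which kind of stream it holds can decide whether to seek; a library method that receives an arbitrary Stream cannot make that decision consistently.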

So what I have done is define AddEntry() to simply read the stream. It is the caller's responsibility to provide the stream, and configure the stream correctly before ZipFile.Save() is called.  That means, open the stream, seek if necessary.   Simple to explain.  In fact I just checked the documentation on this method, and it says this:

The passed stream will be read from its current position. Callers should set the position in the stream before calling AddEntry().
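
A minimal sketch of what "read from its current position" means in practice, using only the base class library. Stream.CopyTo stands in for any consumer that reads from the current position; it is an illustrative substitute here, not a description of how DotNetZip is implemented internally.

```csharp
using System;
using System.IO;

class CurrentPositionDemo
{
    static void Main()
    {
        using (MemoryStream source = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(source))
        {
            writer.Write("test-content");
            writer.Flush();

            // After writing, the position sits at the end of the data,
            // so a consumer that reads from the current position gets nothing.
            using (MemoryStream copy = new MemoryStream())
            {
                source.CopyTo(copy);
                Console.WriteLine("Without rewinding: {0} bytes", copy.Length);
            }

            // Rewind first, as the documentation asks the caller to do.
            source.Position = 0;
            using (MemoryStream copy = new MemoryStream())
            {
                source.CopyTo(copy);
                Console.WriteLine("After Position = 0: {0} bytes", copy.Length);
            }
        }
    }
}
```

This should report 0 bytes for the first copy and 13 bytes for the second (one length-prefix byte plus twelve bytes of text). In the code from the question, the equivalent rewind would go just before zip.AddEntry("test-bin", stream).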

----

One note: I believe you want stream.Position = 0; not stream.Position = 1;

Aug 2, 2011 at 7:06 AM

Thanks for your answer. I now understand better how DotNetZip works.

I set stream.Position = 1 because there is an invalid symbol at stream.Position = 0. Look at the picture.

Coordinator
Aug 2, 2011 at 4:01 PM

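That thing is not part of your text; it is a length prefix.

According to the MSDN documentation, when constructing a BinaryWriter via the constructor you have used, the instance uses UTF-8 for encoding of the strings. BinaryWriter.Write(string) writes a length-prefixed string: first the byte length of the string as a 7-bit-encoded integer, then the encoded bytes. For "test-content" the prefix is a single byte with the value 12 (0x0C), which is the unprintable character you see at position 0. It is not a BOM; BinaryWriter does not emit one.

Backing up, when using a MemoryStream this way, as a source for an entry to be compressed into a zip file, you must reset the stream position like this:

stream.Position = 0;

or, like this:

stream.Seek(0,SeekOrigin.Begin);

This will include the length prefix in the stream to be compressed; it requires that the reading application correctly handle the prefix when reading the decompressed content, for example by reading it back with BinaryReader.ReadString().

To avoid emitting the prefix, do not use BinaryWriter.Write(string). Either write the encoded bytes yourself, for example with writer.Write(Encoding.UTF8.GetBytes("test-content")), or use a StreamWriter with an Encoding that does not emit a preamble. Either approach requires the reading application to "know" the Encoding used on compression. Seeking to Position=1 (thereby jumping over the prefix) is the wrong thing to do; it will lead to errors in non-trivial cases, because the prefix grows beyond one byte once the encoded string reaches 128 bytes. All of this has nothing to do with DotNetZip - it is strictly a matter of how BinaryWriter serializes strings. If you don't understand what I've written here, then you should read the documentation for BinaryWriter.Write(String).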
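
To see what actually ends up at the start of the stream, one can dump the bytes. The following is a minimal sketch using only the base class library; the class and variable names are made up for illustration. It compares BinaryWriter.Write(string) with writing the UTF-8 bytes of the same text directly.

```csharp
using System;
using System.IO;
using System.Text;

class WriterBytesDemo
{
    static void Main()
    {
        // BinaryWriter.Write(string): a length prefix, then the UTF-8 bytes.
        using (MemoryStream prefixed = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(prefixed))
        {
            writer.Write("test-content");
            writer.Flush();
            Console.WriteLine(BitConverter.ToString(prefixed.ToArray()));
        }

        // BinaryWriter.Write(byte[]): only the bytes that were passed in.
        using (MemoryStream raw = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(raw))
        {
            writer.Write(Encoding.UTF8.GetBytes("test-content"));
            writer.Flush();
            Console.WriteLine(BitConverter.ToString(raw.ToArray()));
        }
    }
}
```

The first line of output should begin with 0C (decimal 12, the byte length of the text) followed by the bytes of "test-content"; the second line should contain the text bytes only. Neither contains the UTF-8 byte order mark sequence EF-BB-BF.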

Aug 3, 2011 at 7:49 AM

Cheeso, many thanks. I understand clearly now.