Need more efficient handling of streams when saving ZipFiles

Mar 1, 2009 at 10:35 AM
Hi guys,

This library seems to do pretty much what I need but I'm wondering if the next thing is possible with it:

I have a hundred files in my database which I would like to stream into a zip file. But in the remarks I read that if I would like to add a file as a stream, my stream must remained open until the save method is called. This means I need to open 100 simultanious connections to my database for just one request actually. Not only my connection pool is drained out, but I'm getting table locks instead of row locks eventually.

Is it possible to stream contents to the zipfile immediately, without waiting for the Save method being called? Cause than I can do with just one connection.

Cheers,
Wes

Just to let you know: I have it working in this way with CSharpZipLib but I would like to have it ported to use the DotNetZip Lib because of the GPL license. Below is the sample of how I do this right now. As you can see I open a binarystream, write the contents to the ZipOutPutStream right away and the binarystream is closed directly.
/// <summary>
/// Adds the SPFile to the zip output stream.
/// </summary>
/// <param name="zipOutputStream">The zip output stream.</param>
/// <param name="file">The file.</param>
/// <param name="path">The path.</param>
public static void AddSPFile(this ZipOutputStream zipOutputStream, SPFile file, string path) {
    string filePath = Path.Combine(path, file.Name);

    ZipEntry entry = new ZipEntry(filePath);
    entry.DateTime = file.TimeCreated;
    zipOutputStream.PutNextEntry(entry);

    using (Stream fileStream = file.OpenBinaryStream()) {
        byte[] buffer = new byte[4096];
        int sourceBytes;
        do {
            sourceBytes = fileStream.Read(buffer, 0, buffer.Length);
            zipOutputStream.Write(buffer, 0, sourceBytes);
        } while (sourceBytes > 0);
    }
}
Coordinator
Mar 1, 2009 at 3:17 PM

Webbes,

I understand your situation.  I have to think about it a little.  Just at the outset, I have a couple ideas.

You CAN stream the contents of the zip entry immediately, as soon as you add the entry.  To do this, you would call Save() immediately after adding each file or stream.    This would cause the current stream being added to be read from the database, and then written to the zip archive on disk.  On subsequent calls to Save(), the same thing happens with the current entry, but the data for the previously-written entries is just transferred from one zip file to another.  This seems like it would work, avoiding the 100 simultaneous connections to the database, though of course, each file saved to the zip gets written and rewritten, (n-i) times.  The first one, 100 times. The second entry 99 times, etc.   Seems very inefficient.  But it would avoid the simultaneous db connections problem.

A real solution to your problem would eliminate the simultaneous db connections while at the same time not produce any redundant data writes.
There are a couple ways I could address this.

  • Key off the entry events, Saving_BeforeWriteEntry and Saving_AfterWriteEntry.   An app could register a progress handler, and when it receives this event, which includes the name of the entry about to be written, the app could then open the stream for that particular entry and hand it to the library.  Then, in the After event, the app could close the stream for that entry.  Currently, this WON'T REALLY WORK, because the event model does not allow the app to communicate back to the library a stream for the ZipEntry to be saved to the archive.  But it wouldn't be hard to introduce this new capability.
  • Take the approach you suggest:  write the data immediately.  This would be a higher-impact change, and it would take much longer to write and test, because it changes the model for how the ZipFile class works internally.   I see what you mean - it could be very convenient for your use case; but given the design of ZipFile right now, it would be harder to do.

There may be other ways, too.

What do you think of the first option?

Coordinator
Mar 1, 2009 at 3:18 PM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Mar 1, 2009 at 3:46 PM

The solution with events would work and I see you've created a workitem for it. Maybe I'll see if i can do something with it myself.

The first option is not really an option. I can't create 100 zipfiles before I send one. It would be to resource intensive. My solution works in a SharePoint enterprise portal and must be very well performing.

Cheers,

Wes

Coordinator
Mar 1, 2009 at 4:48 PM
Edited Mar 1, 2009 at 4:50 PM
ok, makes sense.  Well, have a look at the event code.
I haven't looked to see if this would work, but I am thinking: 
  • add a Stream param to the ProgressEventArgs type
  • in your own progress event handler, open the stream and set the stream param in the EventArgs type
  • check the stream in the ZipFile class upon completion of the Saving_BeforeSaveEntry event
  • if the stream is non-null, then copy that value to the ZipEntry._sourceStream private/internal member. (It needs to be marked internal for this to work, not sure if it already is)
  • then, in ZipEntry.Write(), the logic already uses the _sourceStream
  • also in your application code, close the stream. 

I think there would be some loose ends to clean up, but that would be the basic idea.

Mar 3, 2009 at 8:34 AM

 

I tried the reverse: delay opening the file like in 

 

zip.AddFileStream(fileName,

string.Empty, new LazyStream(() => catalog.Open(fileName)));

I expected the Save method to close the file, but it is not the case and now I can't close it because I have no Stream property on the ZipEntry

with the following LazyStream class (the delegate is the same as Func<Stream> but can be used with .net 2.0)


public

 

delegate Stream StreamOpener();

 

 

public class LazyStream : Stream

 

{

 

private StreamOpener Opener;

 

 

private Stream BaseStream;

 

 

public LazyStream(StreamOpener opener)

 

{

 

if (opener == null)

 

 

throw new ArgumentNullException();

 

Opener = opener;

}

 

public override bool CanRead

 

{

 

get { return true; }

 

}

 

public override bool CanSeek

 

{

 

get { return true; }

 

}

 

public override bool CanWrite

 

{

 

get { return false; }

 

}

 

public override void Flush()

 

{

 

throw new NotImplementedException();

 

}

 

public override long Length

 

{

 

get

 

{

 

if (BaseStream == null)

 

BaseStream = Opener();

 

return BaseStream.Length;

 

}

}

 

public override long Position

 

{

 

get

 

{

 

if (BaseStream == null)

 

 

return 0;

 

 

return BaseStream.Position;

 

}

 

set

 

{

 

if (value != 0)

 

 

throw new ArgumentOutOfRangeException();

 

 

if (BaseStream != null && BaseStream.Position != 0)

 

{

BaseStream.Close();

BaseStream =

null;

 

}

}

}

 

public override int Read(byte[] buffer, int offset, int count)

 

{

 

if (BaseStream == null)

 

BaseStream = Opener();

 

if (!BaseStream.CanRead)

 

 

throw new InvalidOperationException();

 

 

int n = 0;

 

 

int m;

 

 

while (count != 0 && (m = BaseStream.Read(buffer, offset, count)) != 0)

 

{

offset += m;

n += m;

count -= m;

}

 

return n;

 

}

 

public override long Seek(long offset, SeekOrigin origin)

 

{

 

if (origin == SeekOrigin.Current)

 

{

 

if (offset != 0)

 

 

throw new ArgumentOutOfRangeException();

 

 

if (BaseStream == null)

 

 

return 0;

 

 

return BaseStream.Position;

 

}

 

if (origin != SeekOrigin.Begin)

 

 

throw new ArgumentOutOfRangeException();

 

 

if (offset != 0)

 

 

throw new ArgumentOutOfRangeException();

 

 

if (BaseStream != null)

 

BaseStream.Close();

BaseStream =

null;

 

 

return 0;

 

}

 

public override void SetLength(long value)

 

{

 

throw new NotImplementedException();

 

}

 

public override void Write(byte[] buffer, int offset, int count)

 

{

 

throw new NotImplementedException();

 

}

 

public override void Close()

 

{

 

if (BaseStream != null)

 

BaseStream.Close();

}

}

Coordinator
Mar 3, 2009 at 1:44 PM

Hmm, interesting.
But yes, DotNetZip does not close streams that are presented to it from external sources.

I have to think about this some more.

Coordinator
Mar 4, 2009 at 7:17 PM
Edited Mar 4, 2009 at 7:24 PM

Hey Webbes,
I thought about this some more and came up with this idea. It introduces a new interface into DotNetZip - IStreamDispenser - which exposes just two methods:  Open and Return.   DotNetZip calls the Open method when it needs a stream and calls the Return when it is done. 
You use it this way:

void IStreamDispenser.Return(System.IO.Stream s, string name)
{
    Console.WriteLine("closing {0}", name);
    s.Close();
}


System.IO.Stream IStreamDispenser.Open(string name)
{
    Console.WriteLine("opening {0}", name);
    System.IO.Stream s = System.IO.File.OpenRead(System.IO.Path.Combine("fodder",name));
    return s;
}


public void Run()
{
    using (ZipFile zip = new ZipFile())
    {
	foreach (var f in System.IO.Directory.GetFiles(DirectoryToZip))
	    zip.AddFileStream(f, "", null);
	zip.StreamDispenser = this;
	zip.Save(ZipArchiveName);
    }
}

The StreamDispenser methods are called once for each entry, and only within the scope of the ZipFile.Save(), and only when the stream you have passed origially is null.  

Couple questions:

  1. Would this work for you?
  2. Can you see a simpler or more elegeant way to do this?  
    1. One option is to replace the interface with a simple delegate [ Stream GetStream(string name) ];  again it gets called just-in-time (when saving) when the input stream is null.  Then I'd modify DotNetZip to automatically call Close() and Dispose() on the stream within when it is finished writing that entry.  I don't like this option so much because in some cases the stream to be read is a single stream that contains multiple entries, and the application just seeks around to get to the right spot.  Closing that stream would be the wrong thing.   [Actually - not sure this is a valid concern.  If necessary the app could wrap the original stream with a shim that does nothing on Close()]
    2. Another option is to use a delegate as above, but also add a bool property "AutoClose" to make the Close/Dispose behavior optional and explicit. 

If you want to try out the StreamDispenser, grab v1.8.1.9 .
And I'm interested in your comments on the design.

Mar 5, 2009 at 8:54 AM
Edited Mar 5, 2009 at 9:23 AM

This could definately work for me, and I'm happy with this option already.

I would prefer events though. I would create something like this (if I had the time to really get into the library. I don't know if I can really access the underlying Stream of a ZipEntry for example) :

Additions to the zip class:

public event EventHandler<StreamSavingEventArgs> SavingStream;
public event EventHandler<StreamSavingEventArgs> SavedStream;

protected virtual void OnSavingStream(StreamSavingEventArgs eventArgs){
   EventHandler<StreamSavingEventArgs> temp = SavingStream;
   if(temp != null){
      temp(this, eventArgs);
   }
}

protected virtual void OnSavedStream(StreamSavingEventArgs eventArgs){
   EventHandler<StreamSavingEventArgs> temp = SavedStream;
   if(temp != null){
      temp(this, eventArgs);
   }
}

//I imagine there's something like a SaveStream method
private void SaveStream(ZipEntry zipEntry){
   StreamSavingEventArgs eventArgs = new StreamSavingEventArgs(zipEntry);
   OnSavingStream(eventArgs);
   
   //all other code that actually writes steam goes in here

   //note: reuse the eventargs so I can access my State object again
   OnSavedStream(eventArgs);
}

One extra class:

public class StreamSavingEventArgs : EventArgs {
   public StreamSavingEventArgs (ZipEntry zipEntry) {
       this.ZipEntry = zipEntry;
   }

   public object State { get; set; }

   public ZipEntry ZipEntry { get; private set; }
}

With this I could do whatever I want:

  • I could use only the SavingStream to open a stream:
  • I could use only the SavedStream to close a stream:
  • I could use the combination of them to open a stream on the beginning, add some object to the State property of my StreamSavingEventArgs, and then close the stream.
public void Run()
{
    using (ZipFile zip = new ZipFile())
    {
	//add streams here
	zip.SavedStream += (o,e)=>e.ZipEntry.Stream.Dispose();
	zip.Save(ZipArchiveName);
    }
}

The State property is optional but I can imagine a use where people would like to have an object in the end that they've created in the beginning.

You could even make it more generic by creating SavingEntry and SavedEntry events that get called for each ZipEntry that's about to get saved or has just been saved. An enumerator could be used to check the type of entry.

public void Run()
{
    using (ZipFile zip = new ZipFile())
    {
	//add streams here
	zip.SavedEntry += (o,e)=>{if(e.ZipEntry.EntryType == ZipEntryType.Stream){e.ZipEntry.Stream.Dispose();}}
	zip.Save(ZipArchiveName);
    }
}

That would probably be my choice. I don't know the library that well though and I don't know if it's hard to implement so I'm happy with your solution as is already.

Thanks,

Wes

Mar 5, 2009 at 11:25 AM
I would prefer something with delegates or events, but not an interface: using anonymous methods or lambdas is so easy.
Coordinator
Mar 5, 2009 at 3:08 PM
Edited Mar 5, 2009 at 3:11 PM

Hey, thanks for the input.
you know, I was halfway to implementing this feature with a matched pair of delegates, when I changed course and went to an interface. I had read something that said delegates are sort of an interface with a single method.  And it felt not quite right to introduce 2 new delegates for this one new feature.
But I can see your point.

There already is a SaveProgress event, and there's an emum that describes the "save state".  Two of the states are: BeforeWriteEntry  and AfterWriteEntry.  These are the points in time when the library would need a stream to be opened and closed.  At first I was thinking I could stuff this capability into that SaveProgress event, and even re-use those save states....but that didn't feel right either.  Overloading the progress event to enable opening and closing streams... it felt like I was hiding the feature. 

I could do this with a single new event and a new EventArgs type.  Maybe call it StreamDispenserEventArgs.  The point with that name is to distinguish it from the SaveProgressEventArgs that already exits.  (SavingStreamEventArgs feels to similar).   In the EventArgs type I could insert an enum that tells the app if the library is requesting a stream be opened, or closed (or whatever).  And this event would be called not for every entry, but only for the entries with null (unspecified) streams.   What do you think of that?

or do you really want two distinct events here?

None of this is hard to implement in any case - the external interface is what we're talking about, and the internal mechanics are already there.

 

Mar 5, 2009 at 4:24 PM
Cheeso,

I would really go for the way I described above with the two events. Simply because it's according to all framework design guidelines and gives me full flexibility.

Tmho every class that ends with EventArgs looks like other classes that end with EventArgs. But it's a design guideline which helps me a lot in finding the correct eventargs for my events and in explaining what they should be used for.

Same goes with the TWO events instead of one. It's simply a mather of design guidelines which I'm pretty happy with. It's symetric code and makes sense to most developers right away. As soon I find an event called 'Verbing' I know there might be an event called 'Verbed'. I know that the first one will be triggered on the start of the action and the latter when it's finished.

But again I'm happy with everything you provide. These opinions of mine are definately not sacred.

Cheers,

Wesley
Coordinator
Mar 5, 2009 at 4:45 PM

Wesley, what you say makes sense.
just one more thing -
the name of the events.  I get the rationale behind Verbing and Verbed. You proposed SavingStream and SavedStream, which makes sense to me if it is a geenral purpose event

But there is already a SaveProgress event, in which there is a BeforeSaveEntry and AfterSaveEntry.  That event is a general purpose notification event.

OTOH, There's a distinct and more specific purpose for this pair of events we're discussing - to obtain and later dispose streams on a just-in-time basis. And I'm suggesting that the name should more clearly communicate that specific purpose. Also, to go along with that I'm suggesting that the operation and behavior also reflect the purpose - these events would be called only when needing a stream to be opened just-in-time, rather than being called for every entry being saved.  In other words it's not really "SavingStream", it's "ObtainStream".   We already have a something equivalent to SavingStream - it is in the SaveProgress event. By the same token, it's not really "SavedStream", it's "AllDoneWithYourStream".

What do you think?

Those are not the right names, but maybe you get my point: Saving and Saved seem not quite right.

Coordinator
Mar 5, 2009 at 4:51 PM
As soon as I wrote that I considered another usage scenario, from another thread on this forum, where the application was removing the file-to-be-zipped before it was completely zipped up. 
So now I'm thinking that an additional pair of general purpose events, as you suggest, SavingStream and SavedStream, might be the right idea.  This would work for your scenario - where the app provides a stream just for as long as the library needs it - and it would also work for the other scenario, too, where someone wants to remove a file.
 
ok, that raises a new question - why not just overload the SaveProgress event with this?  If it is general purpose, then there are already events firing at those moments, which could be used for this purpose. This was my original idea, which I then moved away from because it felt incorrect to overload the SaveProgress event with the new capability. 

Mar 5, 2009 at 5:39 PM

While browsing a little bit though the help file it's all getting a little bit clearer for me. Actually the functionally is in there allready IF the stream property of a ZipEntry could be reached.

public static void SaveProgress(object sender, SaveProgressEventArgs e)
{
    if (e.EventType == ZipProgressEventType.Saving_BeforeWriteEntry)
    {
        e.CurrentEntry.Stream = //Open your stream;
    }
    else if (e.EventType == ZipProgressEventType.Saving_AfterSaveEntry)
    {
       e.CurrentEntry.Stream.Dispose();
    }
}

So maybe all you need to do is to make stream a reachable property.

Cheers,

Wes

Coordinator
Mar 5, 2009 at 6:58 PM

yes, exactly.
And by your making the suggestion, I am guessing you feel that such an extension does not overly complexify the SaveProgress event? 

 

Mar 5, 2009 at 9:10 PM
No, I could definately work with this. It's just that I didn't expect the ZipEntry to be accessible from the progress event...
Coordinator
Mar 5, 2009 at 9:16 PM
ok, give me a moment to run some tests and I will post the updated library with this change.
Coordinator
Mar 5, 2009 at 10:07 PM

Ok, the v1.8.1.11 library has the update where you can set the stream just-in-time.
To use it you will code a SaveProgress event like so:

        public static void SaveProgress(object sender, SaveProgressEventArgs e)
        {
            if (e.EventType == ZipProgressEventType.Saving_BeforeWriteEntry)
            {
                if (e.CurrentEntry.Source == Ionic.Zip.ZipEntry.EntrySource.Stream &&
                    e.CurrentEntry.InputStream == null)
                {
                    Console.WriteLine("opening {0}", e.CurrentEntry.FileName);
                    System.IO.Stream s = MyStreamOpener(e.CurrentEntry.FileName);
                    e.CurrentEntry.InputStream = s;
                }
            }
            else if (e.EventType == ZipProgressEventType.Saving_AfterWriteEntry)
            {
                if (e.CurrentEntry.InputStreamWasJitProvided)
                {
                    Console.WriteLine("closing {0}", e.CurrentEntry.FileName);
                    e.CurrentEntry.InputStream.Close();
                    e.CurrentEntry.InputStream.Dispose();
                }
            }
        }

I put some extra properties in there to make it more convenient to use. It may be that people want to create a zip file using input from mixed sources - some from the filesystem and some from streams. In that case the library can provide some additional help to the application to help the app decide (a) whether it needs to open a stream in the BeforeWriteEntry event, and (b) whether it needs to Close/Dispose in the AfterWriteEntry event.

Oct 2, 2009 at 2:02 PM

For those who would miss the ZipOutputStream of SharpZipLib, here is a simple code that makes it possible to use DotNetZip the "regular .NET stream way".

However, take note that it is inefficient compared to a real on-the-fly streaming solution like SharpZipLib does, as it uses an internal MemoryStream before actually calling the DotNetZip.Save() function.

But unfortunately, SharpZibLib does not sports EAS Encryption yet (and certainly never as the project seems abandonned now).

Let's hope that Cheeso will finally add this functionality in dotNetZip any time soon ? ;-)

 

    /// <summary>
    /// DotNetZip does not support streaming out-of-the-box up to version v1.8.
    /// This wrapper class helps doing so but unfortunately it has to use
    /// a temporary memory buffer internally which is quite inefficient
    /// (for instance, compared with the ZipOutputStream of SharpZibLib which has other drawbacks besides).
    /// </summary>
    public class DotNetZipOutputStream : Stream
    {
        public ZipFile ZipFile { get; private set; }

        private MemoryStream memStream = new MemoryStream();
        private String nextEntry = null;
        private Stream outputStream = null;
        private bool closed = false;

        public DotNetZipOutputStream(Stream baseOutputStream)
        {
            ZipFile = new ZipFile();
            outputStream = baseOutputStream;
        }

        public void PutNextEntry(String fileName)
        {
            memStream = new MemoryStream();
            nextEntry = fileName;
        }

        public override bool CanRead { get { return false; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return true; } }
        public override long Length { get { return memStream.Length; } }
        public override long Position
        {
            get { return memStream.Position; }
            set { memStream.Position = value; }
        }

        public override void Close()
        {
            if (closed) return;

            memStream.Position = 0;
            ZipFile.AddEntry(nextEntry, Path.GetDirectoryName(nextEntry), memStream);
            ZipFile.Save(outputStream);
            memStream.Close();
            closed = true;
        }

        public override void Flush()
        {
            memStream.Flush();
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException("Read");
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException("Seek");
        }

        public override void SetLength(long value)
        {
            memStream.SetLength(value);
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            memStream.Write(buffer, offset, count);
        }
    }

 

 

 

    /// <summary>
    /// DotNetZip does not support streaming out-of-the-box up to version v1.8.
    /// This wrapper class helps doing so but unfortunately it has to use
    /// a temporary memory buffer internally which is quite inefficient
    /// (for instance, compared with the ZipOutputStream of SharpZibLib which has other drawbacks besides).
    /// </summary>
    public class DotNetZipOutputStream : Stream
    {
        public ZipFile ZipFile { get; private set; }
        private MemoryStream memStream = new MemoryStream();
        private String nextEntry = null;
        private Stream outputStream = null;
        private bool closed = false;
        public DotNetZipOutputStream(Stream baseOutputStream)
        {
            ZipFile = new ZipFile();
            outputStream = baseOutputStream;
        }
        public void PutNextEntry(String fileName)
        {
            memStream = new MemoryStream();
            nextEntry = fileName;
        }
        public override bool CanRead { get { return false; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return true; } }
        public override long Length { get { return memStream.Length; } }
        public override long Position
        {
            get { return memStream.Position; }
            set { memStream.Position = value; }
        }
        public override void Close()
        {
            if (closed) return;
            memStream.Position = 0;
            ZipFile.AddEntry(nextEntry, Path.GetDirectoryName(nextEntry), memStream);
            ZipFile.Save(outputStream);
            memStream.Close();
            closed = true;
        }
        public override void Flush()
        {
            memStream.Flush();
        }
        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException("Read");
        }
        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException("Seek");
        }
        public override void SetLength(long value)
        {
            memStream.SetLength(value);
        }
        public override void Write(byte[] buffer, int offset, int count)
        {
            memStream.Write(buffer, offset, count);
        }
    
Coordinator
Oct 2, 2009 at 5:11 PM

> Let's hope that Cheeso will finally add this functionality in dotNetZip any time soon ? ;-)

What functionality is it, that you seek?

Are you asking for a different programming model?  You don't like it that DotNetZip must always Read the file or stream for content, in order save a zip file?

>  makes it possible to use DotNetZip the "regular .NET stream way".

I'm do not see a clear benefit from this approach.   It's not a "regular stream".  You have these extra methods - PutNextEntry() and so on, so a proposed DotNetZipOutputStream would not be usable in the same way any other stream is usable. 

One of the things I tried to do with DotNetZip even at the beginning is provide a metaphor that makes sense for zip files.  That is the ZipFile class.  Using a stream metaphor for the ZipFile seemed like a bad idea to me.  It made the programming model more complicated.  I still don't see the utility in it.

Can you explain it to me?

Oct 2, 2009 at 6:20 PM

I expect it's about an interface where the user can Write to a stream, not having the library Reading from a stream supplied by the user.

When the Save method is called, and when it is time to get the bytes for a given entry, it is also natural to let a callback just write the bytes (e.g. from a DataSet)

But I find very aggressive to speak of DotNetZip that way; it is great, and is more than maintained. I thank you Cheeso for that.

Coordinator
Oct 2, 2009 at 7:23 PM

Thanks GM.

Yes, I agree, the phrasing ("finally add this functionality") may seem a bit aggressive, but I didn't take it that way.   I do try to maintain the library, and I haven't seen any workitems covering this idea.  I know I am not dragging my feet on this.

What you say makes sense - allow the user to write to a stream that results in a ZipEntry in the zip file.    I would be happy to discuss the idea to figure out a way to satisfy.  In fact, just yesterday, someone came on the forums with this requirement.  He wanted to save a DataSet as a zipentry in a zipfile, but he did not want to save the dataset to a file, first.  The DataSet has a WriteXml() method, which writes to a stream.  But DotNetZip cannot zip up the contents of a stream provided this way.  My solution for that user was an AdapterStream.  It works, but is not optimal.  It requires the use of 2 threads, so the programming model is not so simple.

I think there may be a way to allow this, without modifying the rest of the DotNetZip interface too heavily.   I'll have to think about it.

My first idea is to have a callout from the library to the user's application, providing a stream to which the user must write.   It might mean extending ZipEntrySource to include another option.

Another idea is to introduce a new class in the way envisioned by the original poster.  This would provide a completely alternative interface to DotNetZip.  Everything would have to be duplicated - the events, all the ZipFile properties, and so on.

 

Coordinator
Oct 2, 2009 at 8:16 PM

I'm considering the programming model that would be enabled with this change.

DataSet dataset1 = new DataSet();
dataset1.Fill(...);
using (ZipOutputStream zipos = ZipFile.OpenOutputStream(stream))
{
    zipos.CompressionLevel = Ionic.Zlib.CompressionLevel.Best; 
    zipos.Encryption = Ionic.Zip.EncryptionAlgorithm.AES256;
    zipos.Password = "VerySecret!";

    zipos.PutNextEntry(...);     
    dataset1.WriteXml(zipos);  

    zipos.PutNextEntry(...); 
    using (Stream otherStream = OpenReadableStream())
    {
        int n;
        while ((n= otherStream.Read(buffer, 0, buffer.Length))>0)
        {
            zipos.Write(buffer,0,n);
        }
    }
}

Is that about right?   or...are you thinking something else?

I suppose within PutNextEntry one would want to be able to set the encryption, compression level, codepage, and so on? Or, would it be ok to keep all of those properties on the ZipOutputStream class?

I'm still trying to envision what it would look like.  My first reaction is: such a ZipOutputStream class looks very redundant with the existing ZipFile class.  I understand that it is similar to JarOutputStream in Java, and ZipOutputStream in SharpZipLib.  But that fact alone is not a good reason to implement it in DotNetZip.

 

 

Coordinator
Oct 2, 2009 at 9:06 PM

Or maybe a better interface is:

using (var zip = new ZipFile(output))
{
    zip.Encryption= EncryptionAlgorithm.WinZipAes256;
    zip.Password = "whatever";

    using (ZipOutputStream zipos = zip.OpenOutputStream())
    {
        zipos.PutNextEntry(...); // name, other attributes    
        dataset1.Write(zipos); 

        zipos.PutNextEntry(...); // name, other attributes    
        using (Stream otherStream = OpenReadableStream())
        {
            int n;
            while ((n= otherStream.Read(buffer, 0, buffer.Length))>0)
            {
                 zipos.Write(buffer,0,n);
            }
        }
    }
}

In this case the ZipFile instance can dispense a ZipOutputStream, which can then be used as a regular, writable stream. Each PutNextEntry() starts a new entry. There is no Save() necessary on the ZipFile instance, because it would be written as the ZipOutputStream is written, and the zip file would be terminated when the ZipOutputStream is closed. 

As an alternative I don't think a constructor like this: 

using (Stream x = new ZipOutputStream(new ZipFile(output)))
{
   ...
}

...which inverts the construction, would make sense.  The problem here is that the use of the ZipOutputStream changes the behavior    of the ZipFile.  Using ZipOutputStream means the zipfile is being written as calls to ZipOutputStream.Write() are made, which does not happen in the existing DotNetZip interface.  Therefore, constructing a ZipOutputStream based on a ZipFile seems wrong.  It would have to violate encapsulation in order to work properly.   On the other hand, it seems ok to me, to change the state of the ZipFile, when using a method on the ZipFile to dispense a ZipOutputStream, as in the first code snip.

Creating a ZipOutputStream is created from the ZipFile, and providing the expected semantics, would require some significant changes within the ZipFile class. While it wouldn't introduce redundancies with the existing interface, it would introduce some complexity and new pitfalls.

For example, intermixing calls to ZipFile.AddFile() with calls to ZipFile.OpenOutputStream() would introduce some twists. Using ZipFile.OpenOutputStream on a zipfile with entries in it, would cause all of the existing entries to be written/saved before the call returned.  And so on.  Nothing impossible to solve.

Also, is PutNextEntry really the right name for the method on the ZipOutputStream?  How about PrepareNextEntry or StartNextEntry?  PutNextEntry does not express what that method is doing.

 

Oct 3, 2009 at 12:50 PM

What would be convenient for me would be to have a constructor of ZipEntry (or a property on ZipEntry) to supply my own delagate of type Action<Stream> (or any other similar delegate if you don't want to depend on .Net 3.5).

This delegate would do writes on the supplied stream, at the time the ZipEntry must be processed (inside the processing of ZipFile.Save).

You have a Deflate implementation that would allow compression on the fly, and just have to pass to the delegate a Stream implementation that only allows Write (no reads, no seek) to the underlying file, and doing noting when closed/disposed.

Oct 3, 2009 at 1:34 PM

In fact I expect that internally you are reading a Stream only to write the bytes on a deflating stream(depending on CompressionMethod). Let us access that level. Reading comes from the use case of "copying" a file to the zip.

You could internally supply a delegate for the existing cases, or let the user supply its own. The model of providing streams on an event, with another event to close it, is finally not convenient to use.

Coordinator
Oct 3, 2009 at 7:52 PM

yes, GM, you're right about the design.  It reads the source stream only to write it to the DeflateStream.

I like the idea of a delegate, I'll have to think about modifying the design to allow that.

Up til now I have not depended on .NET 3.5.    I don't know if I want to change that for DotNetZip v1.9.

Coordinator
Oct 3, 2009 at 8:14 PM

> The model of providing streams on an event, with another event to close it, is finally not convenient to use.

You know, I thought about this for a while.  How to provide a way to provide the stream just-in-time.   I thought of providing 2 new delegate types - remember dotnetzip does not use .net 3.5, so I must define new types.  This would mean one delegate type to open the stream and one delegate to close it.  But then in the feedback I got, it seemed like too many new interfaces. 

I could change it, but I am still not really happy with the delegate option for opening and closing the stream. 

If I provide a delegate option for writing the stream ... then it makes sense to also provide a delegate to open+close the stream.   Those would be exclusive alternatives, of course. When the app provides a delegate to open the stream, it is DotNetZip calling Read on that stream (and never Write).   On the other hand, when the app provides a delegate to Write the zip output stream, then DotNetZip wouldn't ever Read from an application-provided stream.  Even though they are never used together, it would make sense to have them be consistent.

 

Coordinator
Oct 4, 2009 at 12:22 AM
Edited Oct 4, 2009 at 12:25 AM

Rather than providing a constructor, I think for consistency such an entry ought to be created with an overload of the AddEntry() method on the ZipFile class.

The new method should accept a delegate, which is defined like this:

public delegate void WriteDelegate(string entryName, System.IO.Stream stream);

Here's the programming model, if you want to save a DataSet to a zip file.

DataSet ds1 = new DataSet();
da.Fill(ds1, "Invoices");

using(Ionic.Zip.ZipFile zip = new Ionic.Zip.ZipFile())
{
    zip.AddEntry(zipEntryName, (name,stream) => ds1.WriteXml(stream) );
    zip.Save(zipFileName);
}

Or, for writing arbitrary data into the zip: 

using (FileStream input = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite ))
{
    using(Ionic.Zip.ZipFile zip = new Ionic.Zip.ZipFile())
    {
        zip.AddEntry(zipEntryName, (name,output) =>
            {
                byte[] buffer = new byte[BufferSize];
                int n;
                while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
                {
                    output.Write(buffer, 0, n);
                    // could update a progress bar here
                }
            });

        zip.Save(zipFileName);
    }
}


What do you think?

Coordinator
Oct 4, 2009 at 12:36 AM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Coordinator
Oct 4, 2009 at 6:52 AM

ok, I implemented this change, in changeset 44426.

download v1.9.0.12 to try it out.

 

Oct 5, 2009 at 2:47 PM

First, I've got to apologize. Re-reading myself, I was definitly rude !

Sorry. You make a wonderful job and I'm thankful for that.

To clarify my point, I'm trying to use your library to receive and send XML data through a WCF service. Depending on metadata, the exchanged XML data are either compressed as a single-file zip archive or a deflated stream or just plain text.

Using streams offers me to read and write the XML without worrying if it is compressed in a way or another or not at all. I'm just opening the correct chain of streams once at the beginning and then I can reuse the same path code that uses regular XmlTextReader/Writers to do the job. That's why the "stream" pattern suits me and why I'd like to use it with your library.

However, I'm facing two issues using streams this way:

- while writing, using the DotNetZipOutputStream, I'm running high on memory consumption as I need to hold the whole uncompressed data in memory before compressing.

- while reading, the MessageBobyStream received from the WCF interface is "forward-only" and the ZipFile.Read(stream) function is using the Seek() method which throws an UnsupportedException.

Unfortunately, your new AddEntry method with a delegate still does not fit with my needs, although it's great and a much better solution than the threaded StreamAdapter !.

Cheers :-)

 

Coordinator
Oct 5, 2009 at 3:15 PM

ok, that clarifies things.

Why can you not save the zip file to your output stream?   you have a stream, or a chain of streams, right?  And you write to the outer-most stream., yes? why can you not just call ZipFile.Save(stream) ??

I don't really see the utility in sending a single-file zip archive.  A deflated stream makes sense to me.   Why would you want a single-file zip archive?  It seems to offer no advantage over the DeflateStream approach.

 

Oct 5, 2009 at 3:49 PM

The idea behind the single-zip archive thing is to be able to use the password protection with AES encryption to enforce the security the XML data stored on the client side before it is sent to the server. This is not directly related with the security of WCF services but I've got to deal with it at the other end as it is sent "as is".

About the ZipFile.Save(stream), it would mean that depending of the type of the source stream, I will have to pass the ZipFile object to my business layer that will have to know about it and whether it has to be called or not when it finishes its work. I would prefer that the business layer to deal only with raw XML data as a pivot, being unaware of encryption, authentication and protocols logic and details.

 

 

Coordinator
Oct 5, 2009 at 5:04 PM

Ah, things are a little clearer.  What you want is a stream that does Aes encryption.  The ZipFile is not suitable for that purpose.

> to enforce the security the XML data stored on the client side before it is sent to the server. This is not directly related with the security of WCF services but I've got to deal with it at the other end as it is sent "as is".

I don't understand what this means.  In WCF there is the possibility to encrypt the transmitted data on the sending side, and decrypt on the receiving side.  Why not use the builtin capability?  

> I would prefer that the business layer to deal only with raw XML data as a pivot, being unaware of encryption, authentication and protocols logic and details.

But if you use the ZipOutputStream you provided above, your logic must call PutNextEntry() when writing to the zipOutputStream, right?  And you must set the Password and the Encryption properties on the ZipFile.  So is there knowledge of the ZipOutputStream in your code?  There must be.

 

Coordinator
Oct 5, 2009 at 7:06 PM

boutblock, I think you want to do something like this:

using (var input = File.Open(_inputFileName, FileMode.Open ))
{
    using (var raw = File.Open(_outputFileName, FileMode.Create, FileAccess.ReadWrite ))
    {
        using (var output= new DotNetZipOneEntryOutputStream(raw, _inputFileName))
        {
            output.Password = _password;

            byte[] buffer= new byte[2048];
            int n;
            while ((n= input.Read(buffer,0,buffer.Length)) > 0)
            {
                output.Write(buffer,0,n);
            }
        }
    }
}

is that right?  if so, here's a class that will help you.  It depends on the new feature added in v1.9.0.12.  There's no MemoryStream used to buffer the writes. Instead it uses a background thread to do the save.

    /// <summary>
    /// DotNetZip does not provide a ZipOutputStream, like SharpZipLib.
    /// This wrapper class provides one, which supports a single entry only.
    /// </summary>
    public class DotNetZipOneEntryOutputStream : Stream
    {
        private String _password;
        private String _entryName;
        private Stream _outputStream;
        private Stream _entryStream;
        private ZipEntry _zipEntry;
        private bool _closed;
        private System.Threading.ManualResetEvent _hasClosed;
        private System.Threading.ManualResetEvent _streamIsSet;
        private System.Threading.ManualResetEvent _doneSaving;

        /// <summary>
        /// Create a DotNetZipOneEntryOutputStream.
        /// </summary>
        ///
        /// <remarks>
        /// The resulting zip file will contain just one entry, with the specified
        /// name.  
        /// </remarks>
        ///
        /// <param name="s">
        /// The stream to wrap.
        /// </param>
        ///
        /// <param name="entryName">
        /// The name of the entry to create in the zip file.
        /// </param>
        public DotNetZipOneEntryOutputStream(Stream s, string entryName)
            : this (s, entryName, null) { }

        
        /// <summary>
        /// Create a password-protected DotNetZipOneEntryOutputStream.
        /// </summary>
        ///
        /// <remarks>
        /// This class uses WinZipAes256 encryption when the password is
        /// non-null. The resulting zip file will contain just one
        /// entry, with the specified name, and encrypted using AES
        /// using the specified password.
        /// </remarks>
        ///
        /// <param name="s">
        /// The stream to wrap.
        /// </param>
        ///
        /// <param name="entryName">
        /// The name of the entry to create in the zip file.
        /// </param>
        ///
        /// <param name="password">
        /// The password to use to encrypt the entry in the zip file.
        /// </param>
        public DotNetZipOneEntryOutputStream(Stream s, string entryName, string password)
        {
            _hasClosed = new System.Threading.ManualResetEvent(false);
            _streamIsSet = new System.Threading.ManualResetEvent(false);
            _doneSaving = new System.Threading.ManualResetEvent(false);

            _entryName = entryName;
            _outputStream = s;
            _password = password;
            
            ThreadPool.QueueUserWorkItem(new WaitCallback(SaveWhenDoneWriting), null);
            _streamIsSet.WaitOne();
        }

        /// <summary>
        /// The password to use on the entry in the zip file.
        /// </summary>
        public string Password
        {
            set
            {
                _password = value; 
                _zipEntry.Password = _password;
                _zipEntry.Encryption = (_password == null)
                    ? EncryptionAlgorithm.None
                    : EncryptionAlgorithm.WinZipAes256;
            }
        }

        /// <summary>
        /// The name of the entry in the zip file.
        /// </summary>
        public string EntryName
        {
            get 
            {
                return _entryName;
            }
            set
            {
                if (value == null)
                    throw new ArgumentException("EntryName");
                _entryName = value;
                _zipEntry.FileName = _entryName;
            }
        }

        
        private void SaveWhenDoneWriting(object o)
        {
            using (var zipFile = new ZipFile())
            {
                _zipEntry = zipFile.AddEntry(EntryName, (name, stream) =>
                    {
                        _entryStream = stream;
                        _streamIsSet.Set();
                        _hasClosed.WaitOne();
                    }
                                             );
                if (_password != null)
                {
                    _zipEntry.Password = _password;
                    _zipEntry.Encryption = EncryptionAlgorithm.WinZipAes256;
                }
                
                zipFile.Save(_outputStream);
            }
            _doneSaving.Set();
        }

        /// <summary>
        /// Write the data from the buffer to the stream.
        /// </summary>
        public override void Write(byte[] buffer, int offset, int count)
        {
            if (_closed) throw new System.InvalidOperationException();
            _entryStream.Write(buffer,offset,count);
        }

        
        /// <summary>
        /// Close the stream. 
        /// </summary>
        public override void Close()
        {
            if (_closed) return;
            _hasClosed.Set();
            _closed= true;
            _doneSaving.WaitOne();
        }


        public override bool CanRead  { get { return false; } }
        public override bool CanSeek  { get { return false; } }
        public override bool CanWrite { get { return true; } }
        public override long Length   { get { throw new NotSupportedException(); }}

        public override long Position
        {
            get { throw new NotSupportedException();}
            set { throw new NotSupportedException();}
        }

        public override void Flush() { }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException("Read");
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException("Seek");
        }

        public override void SetLength(long value)
        {
            throw new NotSupportedException();
        }
    }


Oct 5, 2009 at 9:45 PM

the XML data are persisted  on the client side as zip archives to handle the case when the server is unavailable.

Therefore, the password and encryption for security reasons. WCF adds its own security over the streams exchanged.

And yes, I have to set the password, NextEntry and so on, but I do that during initialization of the chain of streams.

My business layer only accept an open abstract Stream as a parameter (as unfortunately IStream does not exist in the framework...).

I will try your new DotNetZipOneEntryOutputStream as soon as ... I ll be back to work !

Don't you ever sleep or at least have a quick nap !! lol


Coordinator
Oct 6, 2009 at 5:15 AM

No, I don't sleep!!

I put a ZipOutputStream into DotNetZip - try it out. You need v1.9.0.13.

Oct 6, 2009 at 9:05 AM

Ok, I'm gonna try it now, thx a lot.

Btw, would be abusing to ask for a ZipInputStream that deals with forward-only streams ? ;-)

Oct 6, 2009 at 9:52 AM
Edited Oct 6, 2009 at 1:48 PM

Great ! I've replaced my own implementation based on a memory stream and it's working perfectly :-)

Now, If there is a way to workaround the forward-only input stream issue without using a temporary memory stream as I did, this would allow no limit in size of data exchanged by using streams in both directions of my WCF interface :-)

PS : my current version of Winrar 2.62 cannot open the zip archive with 256 bits AES encryption enabled (but hey, winrar is made for RAR not ZIP format ;-). However 7Zip 4.65 does perfectly.

!   H:\v_xps_ifsv2\TPI\XmlFileTransfer.WCF.Net\ClientTest\bin\Debug\Download\FinancialTransactions_20080101_20110101.xml.zip: Unknown method in FinancialTransactions_20080101_20110101.xml
!   H:\v_xps_ifsv2\TPI\XmlFileTransfer.WCF.Net\ClientTest\bin\Debug\Download\FinancialTransactions_20080101_20110101.xml.zip: No files to extrac

 

 

Coordinator
Oct 7, 2009 at 4:13 AM
boutblock wrote:

Now, If there is a way to workaround the forward-only input stream issue without using a temporary memory stream as I did, this would allow no limit in size of data exchanged by using streams in both directions of my WCF interface :-)

!   H:\v_xps_ifsv2\TPI\XmlFileTransfer.WCF.Net\ClientTest\bin\Debug\Download\FinancialTransactions_20080101_20110101.xml.zip: No files to extrac

Are you referring to the fact that it is not possible to use PKZIP encryption when using a ZipOutputStream?   If you are...

Is this a big problem?

I don't know if you read the doc I wrote on this topic.  I tried to explain it there.  In short, the reason PKZIP encryption doesn't work with a ZipOutputStream is that the initial Encryption "seed" depends on the CRC of the data to be encrypted.  And of course the library can only learn the CRC after all the data has been streamed.  

The WinZip AES encryption does not have this problem.

I thought about how to get around this problem.  The library could cache all the data that is streamed, then compute the CRC, then write the data through the encryptor.  This will work for small data sizes, not for large ones.  You'll get a memory error.  I think you saw that.

The other option is to invent a programming model where the application must stream the data twice.  First to compute the CRC, then again to actually perform the encryption.  This would be really ugly, I think.  It would break the feel of the "stream" interface. 

If you are talking about something else, please explain.

-Cheeso

Coordinator
Oct 7, 2009 at 4:15 AM

oh, wait, I think you are asking for a ZipInputStream, is that right?

Yeah, I figured that would be the next step.   I'll see what I can do.

 

Coordinator
Oct 7, 2009 at 4:16 AM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Oct 7, 2009 at 9:15 AM

Yeah, that's what I figured out about the legacy PKZIP encryption, that it wouldn't be compatible with a forward-only writing stream.

But that's fine with me as the AES Encryption was what I needed in the first place.

I don't think you should bother with this old and weak PKZIP encryption while dealing with streams anyway.

 

Regarding the ZipInputStream, well yes, that would make DotNetZip the most complete Zip library for .NET !

And I already have to thank you a lot for your responsiveness. You beat most of the commercial supports out there !