Memory usage pattern question

Mar 23, 2010 at 7:54 PM
Edited Mar 23, 2010 at 8:31 PM

I am having a hard time getting a handle on the memory usage of a ZipFile instance, before and after save, specifically when using MemoryStream input streams exclusively.

My intention, if possible, is to stash a reference to a ZipFile instance in an ASP.NET session.

Can you give some guidance before I break out the profiler?

Coordinator
Mar 23, 2010 at 9:58 PM
Edited Mar 23, 2010 at 9:59 PM

First, the ZipFile class is not serializable, so I think that means it cannot be put into Session state.  It is not a simple matter to make it serializable, as it stores and holds file streams and cursor positions within files.  A System.IO.FileStream is not serializable, and therefore any type that has a FileStream as a member is also not serializable.

If you want to store the MemoryStream into Session state, that's a different thing. If that's the case, then you just need to measure the length of the memory stream.

BUT, having said all that, what are you trying to do?  Why do you want a ZipFile in session state?  There may be a better way of accomplishing what you want.

 

Mar 24, 2010 at 1:40 AM
Edited Mar 24, 2010 at 1:44 AM

Well, its not being serializable leads to my first issue: not being able to easily measure the size of a ZipFile instance before and after a save. Which is why I am going straight to the source before breaking out the big guns.

But being non-serializable poses no issue for my intended purpose, as in-proc session state does not require serialization, and there does not seem to be a problem stuffing a ZipFile instance into Session. I haven't written a lot of tests against it yet, but I have been able to put and get without obvious error.

I am building a web-based toolset, and I already have a sandboxed, session-based file system implemented using a ZipFile as the container.

Basically, I can allow uploads unrestricted in terms of file type or viral potential, because the files never actually hit the disk as files. And when the session ends, the zip is deleted.

This strategy works fairly well so far, and I am not concerned with scale AT ALL. Since Session access is gated, if I keep my handlers single-threaded I don't have to compensate or code for concurrency.

My desire to store the ZipFile instance in session is to keep the file open and the 'map' created by the entries in memory.

I can tolerate/justify up to a few MB per instance IF the ZipFile/ZipEntries can be made to flush to disk; otherwise I will just have to settle for opening, reading, processing the request, and saving the zip on each handler request.

This works, but the memory cost of keeping the live instance in Session would pay for itself many times over in performance if the size is manageable.

These are my thoughts.

Yours?

 

PS: what I do NOT want to do is store the MemoryStream in session, just the map presented by the ZipFile, if you get my drift.

Coordinator
Mar 24, 2010 at 5:44 PM

OK,

let me tell you how the ZipFile is implemented.  First, a better name for this class might be ZipFileManager.  The ZipFile class does not faithfully model a zip file: it does not keep and preserve all the compressed entries within the instance.  Instead, the ZipFile class keeps a list of entries and their sources (filesystem file, MemoryStream, string, zip file, etc.), and then is able to create a zip file with those entries, into a stream or into the filesystem directly.

If you call ZipFile.AddEntry() three times, with three different sets of information, there is no zip file yet.  The zip file is created - that is to say, the compression is performed and the compressed data stream is arranged and decorated in a way that is compliant with the ZIP specification - only upon calling ZipFile.Save().  If you call ZipFile.Save() and specify a MemoryStream, then after the Save completes, the MemoryStream contains the byte stream that represents a zip file.  At that point, the ZipFile instance contains... only a list of entries.
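A minimal sketch of that lifecycle (the entry names and contents here are made up for illustration):

```csharp
using System.IO;
using Ionic.Zip;  // DotNetZip

using (var zip = new ZipFile())
{
    // Three AddEntry calls just record entries and their sources.
    // No compression has happened; there is no zip file yet.
    zip.AddEntry("readme.txt", "hello");
    zip.AddEntry("data/a.bin", new byte[] { 1, 2, 3 });
    zip.AddEntry("notes.txt", "more text");

    // Only now is compression performed and the ZIP byte stream
    // produced, directly into the MemoryStream.
    var ms = new MemoryStream();
    zip.Save(ms);

    // ms.Length is the size of the zip as a byte stream; the
    // ZipFile instance itself still holds just the entry list.
}
```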

Here's another example: suppose you instantiate a ZipFile, then call AddEntry() twice, specifying 2 distinct streams.  These streams are intended to provide the uncompressed data, which, upon Save(), will be compressed and then stored in the final medium (either a stream or a filesystem file).  The ZipFile instance keeps those streams open and un-read unless and until you call ZipFile.Save().  If at some point between calling ZipFile.AddEntry() and ZipFile.Save() you close and dispose those streams, then any future call to ZipFile.Save() will fail, because the streams are unreadable.
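In code, that failure mode looks something like this (a sketch; the stream names and contents are illustrative):

```csharp
using System.IO;
using Ionic.Zip;  // DotNetZip

var s1 = new MemoryStream(new byte[] { 1, 2, 3 });
var s2 = new MemoryStream(new byte[] { 4, 5, 6 });

var zip = new ZipFile();
// The streams are held open and un-read at this point;
// AddEntry only records them as the sources for the entries.
zip.AddEntry("one.bin", s1);
zip.AddEntry("two.bin", s2);

// WRONG: disposing the source streams here...
//   s1.Dispose(); s2.Dispose();
// ...would make the Save below fail, because Save() is when
// the streams are actually read and compressed.

zip.Save("archive.zip");  // s1 and s2 are read and compressed now
```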

Does that make sense?

From what I understand of your scenario, it seems like you need to provide a backing store for the zip file - either a filesystem file or a MemoryStream.  (You could also use a database stream, etc., but you get my point.)
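One possible shape of that, as a sketch - assuming the session holds the saved bytes rather than the ZipFile itself, and using a made-up session key for illustration:

```csharp
using System.IO;
using Ionic.Zip;  // DotNetZip

// Per request: rehydrate a ZipFile from the backing store...
byte[] backing = (byte[])Session["zipBytes"];   // hypothetical session key
using (var input = new MemoryStream(backing))
using (var zip = ZipFile.Read(input))
{
    // ...do whatever the request needs against the entry map...
    zip.AddEntry("uploaded.txt", "new content");

    // ...then save back to the backing store.
    var output = new MemoryStream();
    zip.Save(output);
    Session["zipBytes"] = output.ToArray();
}
```

This trades the per-request open/read/save cost you mentioned for a small, easily measured memory footprint: the session holds only the byte array, and its Length tells you exactly what you are paying.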