Where's ZipFile.Read(byte[]) in v1.9.1.6?

Jul 26, 2011 at 1:39 AM

I've updated to 1.9.1.6 to take advantage of some of the fixes but have found ZipFile.Read(byte[]) is gone. Does this mean I have to open my byte[] with a MemoryStream and manage the stream's cleanup or is reading a byte[] something you have yet to put back into 1.9.1.6?

Thanks in advance.

Coordinator
Jul 26, 2011 at 10:55 PM

You do not need to "manage the cleanup" of a MemoryStream - just stop using it, it will go away.

like this:

using (var zip = ZipFile.Read(new MemoryStream(myArray)))
{
   // use zip here
}

The MemoryStream is IDisposable but there are no OS resources held by the thing, which means you don't need to worry about Dispose() and finalization etc.  Just let the garbage collector do its normal thing, you're good.
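If you miss the old byte[] overload, a small wrapper gets you the same call shape. This is only a sketch: `ZipBytes` and `ReadFromBytes` are made-up names, not part of DotNetZip, and the only library call it relies on is `ZipFile.Read(Stream)`.

```csharp
using System.IO;
using Ionic.Zip;

// Hypothetical helper, NOT part of DotNetZip: stands in for the
// removed ZipFile.Read(byte[]) overload.
public static class ZipBytes
{
    public static ZipFile ReadFromBytes(byte[] buffer)
    {
        // Wrap the array in a read-only MemoryStream. No copy is made and
        // no OS resources are held by the stream.
        var stream = new MemoryStream(buffer, false);
        return ZipFile.Read(stream);
    }
}
```

Call it as `using (var zip = ZipBytes.ReadFromBytes(myArray)) { ... }`, where `myArray` is your byte[] as in the snippet above.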

 

Jul 29, 2011 at 2:29 AM

Thanks Cheeso, though I've noticed something strange. When I read a zip from a MemoryStream and then destroy the ZipFile, it appears to leave a chunk of memory about the size of the decompressed files in use. If I write the MemoryStream to a zip file on the hdd, read the zip from the hdd, and then destroy the ZipFile, the memory returns to its original size. I'm calling the following to make sure the GC has cleaned up the memory as much as possible; I've even tried calling it multiple times, but it makes no difference compared to calling it once. The only thing I can think of is that the ZipFile class, or something related, is leaving a stream or something open somewhere when reading from a stream, as it works fine when reading from the hdd.

GC.GetTotalMemory(true);
GC.WaitForPendingFinalizers();

I'm experiencing a similar problem when extracting to memory then destroying the memoryobjects and the ZipFile. I have yet to test the difference when extracting to the hdd.

Coordinator
Jul 29, 2011 at 6:20 AM

I think that you are assuming some behavior by the GC, without a good basis.

I am not a GC expert, but I believe the GC does not guarantee that it returns memory to the pool immediately.  I also think that manually invoking GC methods is bad form - it's not something you want your application to get involved with. Debugging the GC from the application perspective is not a good idea.

If you believe there is a memory leak, you should run your app for a long time, with repeated cycles, and never ever call GC.Anything(). Monitor the memory in the continuous operation condition. This is how you can determine if the GC is working properly. If you see steady memory growth over time (over an hour or more of running) then you have shown likely evidence of a leak. If the memory usage does not grow, but remains at a fixed level, then there is no leak.
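A rough sketch of that kind of test, for what it's worth. `LeakProbe`, `Run`, and the parameter names are invented for illustration. The code never calls into the GC; it samples the process's private bytes through `System.Diagnostics.Process`, which is one assumption about what "monitor the memory" should mean here (a performance counter or Task Manager would do as well).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical harness: runs the same workload over and over and records
// the process's private memory at a fixed interval. It never touches the GC.
public static class LeakProbe
{
    public static long[] Run(Action workload, int cycles, int sampleEvery)
    {
        var samples = new List<long>();
        for (int i = 1; i <= cycles; i++)
        {
            // One cycle, for example: read a zip from a MemoryStream,
            // use it, and let it go out of scope.
            workload();

            if (i % sampleEvery == 0)
            {
                // A fresh Process object gives a fresh snapshot of the counters.
                using (var proc = Process.GetCurrentProcess())
                {
                    samples.Add(proc.PrivateMemorySize64);
                }
            }
        }
        return samples.ToArray();
    }
}
```

Pass the zip-reading code as the workload, for example `LeakProbe.Run(() => { using (var zip = ZipFile.Read(new MemoryStream(myArray))) { /* use zip */ } }, 1000000, 10000)`, and look at the trend of the samples over a long run. Individual samples will bounce around; only a steady climb is interesting.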

It's quite possible that there is a bug in DotNetZip regarding handling of streams. It's also possible there's a bug in your code. It's more likely, in my estimation, that there is a bug in your reasoning and assumptions.

 

Jul 29, 2011 at 6:28 AM

Ok, I'll set up a loop without any GC calls to test it out, hopefully next week. For the moment I've changed it to work with files on the hdd rather than streams.

Sorry if I came across a bit brash, it wasn't my intention. I really do like the DotNetZip libraries quite a lot. We used to use Dynazip and that was painful. Your response time is fantastic, thanks again.

Coordinator
Jul 29, 2011 at 3:45 PM

No worries on the potential bug report. I didn't take it personally, and if my response gave you the impression I was put off by your message, it wasn't my intention.

glad you like the library.  Worth every penny, eh?    

Just to elaborate a little on what I said - it's possible for the GC to retain internal tables of memory blocks that are candidates for cleanup, but if there is no memory pressure, in other words if there is not a steady stream of new allocations, then the GC may simply decide not to clean up at all. It keeps the "to do" list of cleanups but never acts on it until necessary. There's a great deal of engineering time that's gone into the GC, well beyond my understanding, and it's designed to "just run". So when you call GC.Something(), your app is sort of intervening in an area of responsibility that is really outside its own scope. I think the best advice is to just let the GC do its thing, and trust that it works.

If you accept that advice, then the way to demonstrate or measure a memory leak is to show growth over time in a steady-state app: an app that makes new allocations (instantiations) and allows variables to go out of scope, at the same steady rate. That rate could be quite high, but it's got to be balanced, over time, and it's got to be *over a good deal of time*. Given that, attempting to judge the performance of the GC, or to conclude that a leak is occurring, based on a handful of allocations or a handful of method invocations, is not a valid approach.