ZipOutputStream ► ParallelDeflateThreshold C#

Aug 2, 2011 at 8:59 PM

I encountered an out-of-memory issue when ParallelDeflateThreshold is set to 0: after every write, some bytes remain on the memory heap, and I can't clean them up unless I fully destroy the ZipOutputStream. There is no such problem when ParallelDeflateThreshold is set to -1, and no memory leakage. This is only visible with very large archives, since small ones don't reach the memory limits. I was trying to investigate the issue with a Windows resource-monitoring tool. Any help will be appreciated.

Coordinator
Aug 2, 2011 at 10:38 PM

What version of the library are you using?  What can you tell me about the machine on which you are running the application?

The code in prior versions of the library tries to create a number of buffers that is proportional to the number of cores on the machine.  This is sometimes ok, but sometimes it may create too many memory buffers and can result in an out-of-memory condition.

Regarding the phenomenon of "cleaning memory" only if you destroy ZipOutputStream, this is expected behavior.  It is not "leakage" if the memory is reclaimed when you destroy the ZipOutputStream instance. 

In DotNetZip version v1.9.1.6006, limiting the memory buffers used by parallel deflation is simpler.  You may want to try that and see if it works for you. Be sure to read the documentation on the ParallelDeflateOutputStream. 
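For reference, a minimal sketch of what that might look like. The ParallelDeflateMaxBufferPairs property name is taken from the newer DotNetZip documentation; check the docs for the exact version you install:

```csharp
using System;
using Ionic.Zip;

// Sketch: cap the memory used by parallel deflation.
// ParallelDeflateThreshold and ParallelDeflateMaxBufferPairs are the
// property names documented for recent DotNetZip releases; verify
// against the documentation for your version.
using (var output = new ZipOutputStream("archive.zip"))
{
    output.ParallelDeflateThreshold = 0;      // use parallel deflate for every entry
    output.ParallelDeflateMaxBufferPairs = 4; // limit buffer pairs (default scales with core count)

    output.PutNextEntry("data.bin");
    byte[] buffer = new byte[32 * 1024];
    // ... fill buffer from your input source, then:
    output.Write(buffer, 0, buffer.Length);
}
```

Lowering the number of buffer pairs trades some parallel-compression throughput for a smaller, predictable memory ceiling.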

If that does not work for you, and if you have a small test case that reproduces the problem reliably, I would like to see it.

 

Aug 6, 2011 at 6:15 PM

DotNetZipLib-DevKit-v1.9.1.5

4-core AMD at 3 GHz, ~40 Mb/s HDD, 4 GB RAM. I'm trying to build a queue in memory; each element holds ~32000 bytes from the zip input stream, and I schedule the zip output stream to run as soon as only ~700 MB of RAM is left, since reading is slower in my case and I don't want to run out of queue elements too fast. And you are right, I do destroy the output and input streams assigned to each zip file I have. Is there any indicator that tells me when I can destroy the output or input streams safely? Or does Windows do it on its own (I mean garbage-collector cleanup)? It's hard to say whether the memory is reclaimed or not; I tried pausing the application after it had almost reached the out-of-memory state to see if memory would be freed, but it stays at the same level.

 

    Is there an "optimal" buffer size to reach the fastest ZipOutputStream write speed, or is it just limited by disk write speed?

 

Thank you,

Coordinator
Aug 6, 2011 at 8:23 PM

ah, ok.

Well - high memory consumption in a .NET process does not indicate a leak.  The GC may just be deferring reclamation of memory.  It's actually not hard to tell whether there is a leak: a leak will be evident over time, exhibited as steady growth in memory usage in the process as new/destroy is called repeatedly.  I suggest you read up on diagnosing .NET leaks if you like; but high memory usage, by itself, is not evidence of a leak.
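One crude way to tell deferred reclamation apart from a real leak is to force a full collection between repeated create/destroy cycles and watch whether the baseline grows. A rough sketch using only the BCL; DoOneZipCycle is a hypothetical placeholder for whatever creates, uses, and disposes your ZipOutputStream:

```csharp
using System;

// Rough leak check: if memory after a forced full GC keeps climbing
// across iterations, you may have a genuine leak; if it stays flat,
// the GC was just deferring reclamation.
for (int i = 0; i < 10; i++)
{
    DoOneZipCycle(); // hypothetical: create, use, and dispose a ZipOutputStream

    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    long after = GC.GetTotalMemory(forceFullCollection: true);
    Console.WriteLine("iteration {0}: {1:N0} bytes", i, after);
}
```

A memory profiler gives a more precise answer, but this quick test is often enough to rule a leak in or out.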

As for optimizing the write speed of a ZipOutputStream, there is no hard and fast rule. It depends on many things, including your machine configuration, other processes contending for resources, the design of your application, and the throughput available in your I/O channel.  The only way to optimize is to test in your real-world scenario.

I am not sure what you are attempting to accomplish with "not running out of queue elements" and only scheduling the ZipOutputStream after a certain point.  My impression is that you are attempting to do too much within your application regarding concurrency management and memory management.  These are jobs best left to the .NET CLR.

You may want to look into QueueUserWorkItem - a .NET method that lets you run background tasks on the threadpool.  In .NET 4.0 you get a better facility with Tasks.  Using those means less application-level management of concurrency; instead, you let .NET manage the concurrency for you. This is probably the best approach in most cases.  In .NET 4.0 you can also pipeline Tasks, so that you need not worry about whether input or output is faster or slower; the Task scheduler balances things automatically.
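For example, a bounded producer/consumer pipeline on .NET 4.0 might look like this. This is only a sketch; ReadChunk and WriteToZipOutput are hypothetical stand-ins for reading ~32000-byte chunks from the zip input stream and writing them to the ZipOutputStream:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch of a bounded producer/consumer pipeline on .NET 4.0.
// The bounded capacity (e.g. 64 chunks of ~32 KB) caps memory use
// automatically: Add() blocks when the queue is full, so you never
// need to watch free RAM yourself.
var queue = new BlockingCollection<byte[]>(boundedCapacity: 64);

Task producer = Task.Factory.StartNew(() =>
{
    byte[] chunk;
    while ((chunk = ReadChunk()) != null) // hypothetical: read from the zip input stream
        queue.Add(chunk);                 // blocks when the queue is full
    queue.CompleteAdding();
});

Task consumer = Task.Factory.StartNew(() =>
{
    foreach (byte[] chunk in queue.GetConsumingEnumerable())
        WriteToZipOutput(chunk);          // hypothetical: write to the ZipOutputStream
});

Task.WaitAll(producer, consumer);
```

The bounded queue replaces the manual "~700 MB of RAM left" check: back-pressure keeps the faster side from outrunning the slower one.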

As for this question:

> Is there any indicator to specify when I could destroy output or input streams safely?

I don't understand.  Destroy the object when you are finished with it.  When it goes out of the using scope, it will be disposed; let the GC handle memory reclamation.  You generally do not need to worry about it.
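In C# that usually just means wrapping the stream in a using block, so Dispose runs deterministically when you leave the scope (a minimal sketch; the file name and entry name are placeholders):

```csharp
using System.IO;
using Ionic.Zip;

using (var zos = new ZipOutputStream(File.Create("out.zip")))
{
    zos.PutNextEntry("entry.dat");
    // ... write the entry's data ...
} // Dispose() runs here; the GC reclaims the memory later, on its own schedule
```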