
Closed

Allow a way to specify upper limit of # threads used for compressing

description

Cheeso,
 
I do love this library! It took very little time to get a utility up and running. By the way, is the DLL threadsafe? If I call it a dozen times from my own threads, will it work correctly? I assume so.
 
While I appreciate the use of multiple threads, it actually becomes a problem when used on servers. I am currently using it on a SQL Server machine with quad cores, and it maxes out all four cores. Obviously this is ... an issue. I am about to upgrade this server to one of those 8 or 12 core AMD procs and I truly do NOT want DotNetZip grabbing all 12 cores for compression. If you think about it, moving from 2 to 3 cores gives you a 50% speed bump (all else equal). Moving from 7 to 8 cores gives you only about a 14% speed bump. Much less noticeable or useful.
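The diminishing-returns argument can be checked with a few lines of arithmetic. This is only a sketch under the "all else equal" assumption of perfectly linear scaling; `CoreScaling` and `MarginalGainPercent` are hypothetical names for illustration, not part of DotNetZip.

```csharp
using System;

// Hypothetical helper for illustration only; not part of DotNetZip.
public static class CoreScaling
{
    // Ideal throughput gain, in percent, from adding one core to `cores`
    // existing cores, assuming perfectly linear scaling: 100 / cores.
    public static double MarginalGainPercent(int cores)
    {
        if (cores < 1)
            throw new ArgumentOutOfRangeException(nameof(cores), "cores must be at least 1.");
        return 100.0 / cores;
    }
}
```

Under that assumption, `CoreScaling.MarginalGainPercent(2)` gives the 50% figure for going from 2 to 3 cores, and the value keeps shrinking as the starting core count grows.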
 
My suggestion would be to give us a property to set the max threads. Then I could decide how many cores I want used for the zip operation.
 
By the way, I spent the weekend zipping up 200 gigs of stuff using the utility we wrote (using DotNetZip) and for my stuff got about an 80% compression. LOTS of room saved. Further we integrated DotNetZip into a process I run that exports huge SQL Server tables to CSV and imports back in the processed CSVs. These table processes leave behind gigabytes of text files that I need to archive (can't just delete) and now I am getting about 80% compression on those as well.
 
Again though, Kudos on what you have done! Really cool!
Closed Jun 15, 2011 at 7:14 PM by Cheeso
This is fixed in changeset 79174. The first binary with this fix is v1.9.1.6.

comments

Cheeso wrote Aug 13, 2010 at 2:25 AM

I very much like the idea of allowing a way to specify a limit on the number of threads used for compression.
Any proposals on how this should be done?

Currently the number of threads spawned is linearly related to the number of cores on the machine. If you have 2 cores, 8 threads. If you have 4 cores, 16 threads, and so on. (I think 4x is the multiplier, but it may not be so.) I could just make a hardcoded limit: never go above 16 threads. Or, I could expose an explicit property, something like MaximumCompressionThreads.

Or it could be a combination.
Suggestions welcomed.

Pointy wrote Dec 10, 2010 at 6:46 PM

If you want to control which processors DotNetZip runs on at the moment you can set the processor affinity from the calling process like this:

// ProcessorAffinity is an IntPtr, so the numeric mask needs an explicit cast.
System.Diagnostics.Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)<bitmask>;
Ionic.Zip.ZipFile zipArchive = new Ionic.Zip.ZipFile();
zipArchive.AddFile("<somefile>");
zipArchive.Save("<someotherfile>");

where <bitmask> is a bitmask for the IDs of processors to run on (so, e.g. "5", binary 101, = processors 1 and 3, counting from 1). That will affect anything else running in the same process though, so caveat emptor.
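To make the bitmask arithmetic concrete, here is a minimal sketch that builds such a mask from processor indices. `AffinityMask` and `FromProcessorIndices` are hypothetical names, and the indices here are zero-based (so the "processors 1 and 3" case is indices 0 and 2).

```csharp
using System;

// Hypothetical helper for illustration only; not part of DotNetZip or .NET.
public static class AffinityMask
{
    // Builds an affinity bitmask from zero-based processor indices:
    // bit i is set when processor i is allowed.
    public static long FromProcessorIndices(params int[] indices)
    {
        long mask = 0;
        foreach (int index in indices)
        {
            if (index < 0 || index > 63)
                throw new ArgumentOutOfRangeException(nameof(indices), "Each index must be between 0 and 63.");
            mask |= 1L << index;
        }
        return mask;
    }
}
```

`AffinityMask.FromProcessorIndices(0, 2)` yields 5. The result could then be cast with `(IntPtr)` and assigned to `Process.GetCurrentProcess().ProcessorAffinity`, with the same process-wide side effects noted above.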

The DotNetZip code that creates the worker threads could query this and create n workitems per active processor:

private void _InitializePoolOfWorkItems()
{
    // Size the pool from the processors this process is allowed to run on.
    int affinity = System.Diagnostics.Process.GetCurrentProcess().ProcessorAffinity.ToInt32();
    int processors = ParallelDeflateOutputStream.CountSetBitsKernighan(affinity);
    int poolSize = processors * this.BuffersPerCore;
    ...
}
/// <summary>
/// Count the number of bits which have a 1 value, using Kernighan's method.
/// From http://www-graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan.
/// </summary>
/// <param name="mask">The bitmask to examine.</param>
/// <returns>The number of set bits in the mask.</returns>
private static int CountSetBitsKernighan(Int32 mask)
{
    int count;
    for (count = 0; mask != 0; count++)
    {
        mask &= mask - 1; // clears the least significant set bit
    }
    return count;
}

GetCurrentProcess needs full trust though, so I think just having a general MaxThreadPool property to replace ParallelDeflateThreshold would be simpler (and would allow parallel decompression, encryption, etc. later without the need for more new properties). Setting this to 1 could use the non-parallel versions, and setting it to less than 1 would throw an ArgumentOutOfRangeException.
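A rough sketch of how such a property might behave, purely to illustrate the proposal: `CompressorSettings` and `UseParallel` are hypothetical names, and the default of one thread per logical processor is an assumption, not something specified in this thread.

```csharp
using System;

// Hypothetical sketch of the proposed property; none of this is DotNetZip's API.
public class CompressorSettings
{
    // Assumed default: one thread per logical processor.
    private int _maxThreadPool = Environment.ProcessorCount;

    public int MaxThreadPool
    {
        get { return _maxThreadPool; }
        set
        {
            // Values below 1 are rejected, as proposed above.
            if (value < 1)
                throw new ArgumentOutOfRangeException(nameof(value), "MaxThreadPool must be at least 1.");
            _maxThreadPool = value;
        }
    }

    // A limit of 1 selects the non-parallel code path.
    public bool UseParallel
    {
        get { return _maxThreadPool > 1; }
    }
}
```

Keeping the parallel/non-parallel decision in one place means later features (decompression, encryption) could consult the same setting.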

Cheeso wrote Jun 14, 2011 at 1:09 AM

Thanks Pointy, I like the idea of a MaxZipThreads property. Also I agree that simpler is better. I'll do some further figuring on this and I hope to come up with something usable.

Cheeso wrote Jun 14, 2011 at 1:10 AM

I think one way to do it is for the ZipFile to set DefaultMaxZipThreads when it starts up - maybe it's a static private readonly field. And then each ZipFile instance can expose a public MaxZipThreads, which gets the default value in the ctor, but then it can be set to anything the application wants, subsequently.
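The shape of that idea, as a minimal sketch: `ZipFileSketch` is a hypothetical stand-in for ZipFile, and using the logical processor count as the default value is an assumption made only for illustration.

```csharp
using System;

// Hypothetical stand-in for ZipFile; illustrates the default-plus-override pattern only.
public class ZipFileSketch
{
    // Computed once per process. The actual default value is an assumption here.
    private static readonly int DefaultMaxZipThreads = Environment.ProcessorCount;

    // Per-instance limit; the application may change it after construction.
    public int MaxZipThreads { get; set; }

    public ZipFileSketch()
    {
        MaxZipThreads = DefaultMaxZipThreads;
    }
}
```

Each instance starts from the shared default, and changing `MaxZipThreads` on one instance leaves other instances untouched.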

Cheeso wrote Jun 15, 2011 at 7:12 PM

Ok, here's what I did. I exposed a new property MaxBufferPairs, on the ParallelDeflateOutputStream. This thing puts an upper limit on the number of buffer pairs that can be used by the parallel compressor; it also equivalently puts a limit on the number of background threads that will be used. The stream uses 4 buffer pairs per cpu core, by default, up to the limit specified by this property.

There is also a companion "pass-through" property on ZipFile and ZipOutputStream to do the same thing.

The ParallelDeflateThreshold remains; it has a meaning that is independent of the number of buffers.
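The sizing rule described in this comment (4 buffer pairs per core, up to the MaxBufferPairs limit) can be sketched as below. This is not the library's actual code: `BufferPairMath` and `EffectiveBufferPairs` are hypothetical names, and the argument validation is an assumption.

```csharp
using System;

// Hypothetical illustration of the sizing rule; not DotNetZip's implementation.
public static class BufferPairMath
{
    // Default multiplier stated in the comment above.
    public const int BuffersPerCore = 4;

    // Number of buffer pairs (and, equivalently, the ceiling on background
    // threads) for a given core count and MaxBufferPairs setting.
    public static int EffectiveBufferPairs(int cores, int maxBufferPairs)
    {
        if (cores < 1)
            throw new ArgumentOutOfRangeException(nameof(cores), "cores must be at least 1.");
        if (maxBufferPairs < 1)
            throw new ArgumentOutOfRangeException(nameof(maxBufferPairs), "maxBufferPairs must be at least 1.");
        return Math.Min(cores * BuffersPerCore, maxBufferPairs);
    }
}
```

With a limit of 16, a 12-core machine would be held to 16 buffer pairs instead of 48, while a 2-core machine would still use its default of 8.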