thread safety question

Jan 15, 2010 at 2:53 PM

Is it possible to have different threads calling ZipEntry.OpenReader and reading the data without synchronization?

My concern is that interleaved Seek/Read calls from different threads would corrupt the position of the stream the ZipFile uses to access the file.

The same question applies to the Extract operations.

Coordinator
Jan 16, 2010 at 4:22 PM

ZipFile, and DotNetZip in general, is not thread-safe. If you want multiple threads to use a zip file, open multiple instances of ZipFile.
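
The "one instance per thread" workaround can be sketched with Python's standard zipfile module as an analogue (this is not DotNetZip's API). Each task opens its own ZipFile, so each has its own file handle and position and nothing is shared between threads. The names read_entry and read_entries_in_parallel are hypothetical.

```python
import os
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def read_entry(zip_path, name):
    """Read one entry through a ZipFile instance owned by this call.

    Every call opens its own ZipFile (and therefore its own file handle),
    so no Seek/Read position is shared between threads.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(name)


def read_entries_in_parallel(zip_path, names, workers=4):
    """Read several entries concurrently, one private ZipFile per task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: read_entry(zip_path, n), names))


# Usage: build a small archive, then read all of its entries from a pool.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.zip")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(8):
            zf.writestr(f"entry{i}.txt", f"payload {i}\n" * 100)
    names = [f"entry{i}.txt" for i in range(8)]
    results = read_entries_in_parallel(path, names)
```

The cost of this design is that the central directory is parsed once per instance, which is exactly what the follow-up question asks to avoid.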

Jan 18, 2010 at 9:14 AM

Could there be a way to avoid re-parsing the zip file and rebuilding the ZipFile object for each thread?

The assumption would be that the file is opened read-only, so no changes could be made to it, with sharing allowed for other readers.

Something like ZipEntry.OpenReaderOnOwnStream()?

Coordinator
Jan 18, 2010 at 12:38 PM

What problem are we trying to solve?

I think it's reasonable to request that the DotNetZip classes be modified to be thread-safe. It's reasonable to ask, but unreasonable to expect that I will do the work. Most use of DNZ is single-threaded, and changes to support multiple threads would be far-reaching and not performance-neutral. I don't see a good payback there, so I wouldn't do that work. Also, the workaround is simple: open multiple instances of ZipFile.

On the other hand, I'm not sure what you're asking for here. What problem are you trying to solve?

Based on my understanding, ZipEntry.OpenReaderOnOwnStream() would be a halfway step toward multi-thread support: it would support multiple threads, but only for reading entries. Its special-case nature seems inelegant from the start.

Second, it's not so easy to build this. Opening a reader is simple. But once such a reading stream is dispensed, the zip file could not be updated at all while any "other thread" readers were outstanding. The library would need a mechanism to release that lock. This makes the API considerably more complicated just to support the multi-threaded scenario, and the internal implementation much more complicated. All of this is hard to get right, hard to maintain, and hard to explain to the large majority of users who just don't care about second-thread readers.

To me, you identified the base issue in your original question: the file stream pointer. In fact, the situation with a ZipFile is exactly analogous to the situation with the FileStream type. It's a fairly simple object, and it is not thread-safe. To read or write a file from multiple threads, you must open multiple FileStream instances. The same applies to a zip file.
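
The FileStream analogy can be seen with plain Python file objects, used here only as a stand-in for .NET's FileStream: two independent opens of the same file each keep their own position, so seeking one does not disturb reads on the other.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(16)))

    # Two independent opens of the same file, like two FileStream instances:
    # each object has its own position, so seeking one cannot disturb the other.
    with open(path, "rb") as a, open(path, "rb") as b:
        a.seek(10)
        from_b = b.read(4)      # still reads from offset 0
        from_a = a.read(2)      # reads from offset 10
        pos_a, pos_b = a.tell(), b.tell()
```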

Jan 19, 2010 at 10:41 AM

You are right: a single FileStream is mostly unusable from multiple threads, except perhaps for logging data to a trace file.
But the OS supplies options for sharing files, e.g. opening a file read-only while allowing other readers to share it.
I am evaluating a scenario for extracting data from a zip file on the fly, when a thread needs access to a given entry, without extracting all entries in advance (most entries are never accessed by the application, and each thread would bear the CPU cost of decompressing its own entries). It would be like accessing a database where you make read requests with WHERE clauses, touching only a small part of the whole data. Of course, the data has to be constant while the application is running (e.g. the application would first download the zip from a server into a temp file).

The ZipFile would be hidden inside a dedicated class whose only allowed operations are read-only ones, including getting a (decompressed) Stream for an entry.
For each reader, I would have to create a new FileStream on the zip file and find a way to hack the reader returned by OpenReader into using that FileStream instead of the one owned by the ZipFile (disposing the FileStream when the reader is closed or disposed). The ZipFile would be instantiated in the new class's constructor, so no multithreading problems could occur during that phase.
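
The idea of parsing the archive once in a constructor and then giving every reader its own file handle can be sketched in Python as an analogue; DotNetZip's OpenReader internals are not involved. ReadOnlyZip and its methods are hypothetical names. The sketch assumes unencrypted entries that are either stored or deflated, and it relies on the zip format's 30-byte fixed local file header, which is followed by the file name and extra field before the entry data.

```python
import os
import struct
import tempfile
import zipfile
import zlib
from concurrent.futures import ThreadPoolExecutor

# Fixed part of a local file header: signature, 5 shorts, 3 longs, 2 shorts.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


class ReadOnlyZip:
    """Hypothetical read-only wrapper: the directory is parsed once, then
    every read opens its own file handle (an 'OpenReaderOnOwnStream' analogue)."""

    def __init__(self, path):
        self._path = path
        # Parse the central directory a single time, before any worker uses us.
        with zipfile.ZipFile(path) as zf:
            self._infos = {info.filename: info for info in zf.infolist()}

    def names(self):
        return list(self._infos)

    def read(self, name):
        info = self._infos[name]
        # A private handle for this call: no position is shared with other threads.
        with open(self._path, "rb") as f:
            f.seek(info.header_offset)
            fields = LOCAL_HEADER.unpack(f.read(LOCAL_HEADER.size))
            if fields[0] != b"PK\x03\x04":
                raise ValueError("bad local file header")
            name_len, extra_len = fields[-2], fields[-1]
            f.seek(name_len + extra_len, os.SEEK_CUR)
            raw = f.read(info.compress_size)
        if info.compress_type == zipfile.ZIP_STORED:
            data = raw
        elif info.compress_type == zipfile.ZIP_DEFLATED:
            data = zlib.decompress(raw, -15)  # raw deflate stream, no zlib header
        else:
            raise NotImplementedError("only stored/deflated entries in this sketch")
        if zlib.crc32(data) != info.CRC:
            raise ValueError("CRC mismatch")
        return data


# Usage: one parse, then concurrent reads that each use their own handle.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.zip")
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("index.xml", "<index/>", compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("raw.txt", "stored bytes", compress_type=zipfile.ZIP_STORED)
    archive = ReadOnlyZip(path)
    with ThreadPoolExecutor(max_workers=4) as pool:
        contents = dict(zip(archive.names(), pool.map(archive.read, archive.names())))
```

Because the object never writes to the archive, the locking problem raised earlier (readers outstanding while the zip is updated) does not arise in this read-only design.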

I am using files that have an index.xml entry; I could expose specific methods to access their content.

Coordinator
Jan 19, 2010 at 1:15 PM

It sounds to me like you have a special case; you might want to hack up a special class, or a custom modification of DotNetZip to handle this.

I don't think your case is a mainstream one, as I explained above.

Jan 19, 2010 at 2:04 PM

I agree. But parallelism in ExtractAll would (perhaps) be nice.
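
Parallel extraction can be approximated without library changes by giving each worker its own archive instance. This is a Python sketch using the standard zipfile module as a stand-in for DotNetZip; parallel_extract_all and its helpers are hypothetical. The entry list is read once and split into one chunk per worker, and each worker opens its own ZipFile for its chunk.

```python
import os
import shutil
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def _target_path(dest, name):
    # Refuse entry names that would escape the destination directory.
    target = os.path.normpath(os.path.join(dest, name))
    if os.path.commonpath([dest, target]) != dest:
        raise ValueError(f"unsafe entry name: {name!r}")
    return target


def _extract_chunk(zip_path, names, dest):
    # Each worker opens its own ZipFile: one handle, one directory parse.
    with zipfile.ZipFile(zip_path) as zf:
        for name in names:
            target = _target_path(dest, name)
            if name.endswith("/"):  # directory entry
                os.makedirs(target, exist_ok=True)
                continue
            # exist_ok=True tolerates another worker creating the same folder.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(name) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
    return len(names)


def parallel_extract_all(zip_path, dest, workers=4):
    """Hypothetical parallel ExtractAll: entries are split across workers."""
    dest = os.path.abspath(dest)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    chunks = [names[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_extract_chunk, zip_path, c, dest) for c in chunks if c]
        return sum(f.result() for f in futures)


# Usage: extract a small archive with files spread over three folders.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.zip")
    with zipfile.ZipFile(src, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(10):
            zf.writestr(f"dir{i % 3}/file{i}.txt", f"content {i}")
    out_dir = os.path.join(tmp, "out")
    extracted = parallel_extract_all(src, out_dir)
    with open(os.path.join(out_dir, "dir1", "file4.txt"), "rb") as f:
        sample = f.read()
```

Splitting into chunks means each worker parses the directory once rather than once per entry; whether the parallelism pays off depends on whether decompression or disk I/O is the bottleneck.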

Jan 19, 2010 at 2:19 PM

I just found that .NET 4.0 will have a ThreadLocal<T> class,
so it will be possible to have something like field = new ThreadLocal<FileStream>(() => new FileStream(filename, ...));
and a property returning the field's Value property.

The file would be automatically reopened for each thread!
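
The lazy, once-per-thread opening that ThreadLocal<T> with a factory delegate provides has a direct analogue in Python's threading.local. A sketch, with PerThreadFile, read_at and handle_count as hypothetical names: the first access from a thread opens a private read-only handle, and later accesses from that thread reuse it.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadFile:
    """Hypothetical helper: one lazily opened, read-only file object per
    thread, in the spirit of new ThreadLocal<FileStream>(() => ...)."""

    def __init__(self, path):
        self._path = path
        self._local = threading.local()  # per-thread attribute storage
        self._lock = threading.Lock()    # guards the list of opened handles
        self._opened = []

    @property
    def value(self):
        f = getattr(self._local, "f", None)
        if f is None:
            # First access from this thread: open a private handle.
            f = open(self._path, "rb")
            self._local.f = f
            with self._lock:
                self._opened.append(f)
        return f

    def read_at(self, offset, size):
        f = self.value  # this thread's handle; no position is shared
        f.seek(offset)
        return f.read(size)

    def handle_count(self):
        with self._lock:
            return len(self._opened)

    def close(self):
        # Closes every thread's handle; the object must not be used afterwards.
        with self._lock:
            for f in self._opened:
                f.close()
            self._opened.clear()


# Usage: several pool threads read slices of one file through their own handles.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(256)) * 4)
    shared = PerThreadFile(path)
    with ThreadPoolExecutor(max_workers=3) as pool:
        chunks = list(pool.map(lambda off: shared.read_at(off, 8), range(0, 1024, 64)))
    handles_opened = shared.handle_count()
    shared.close()
```

At most one handle is opened per pool thread, however many reads are issued; the explicit close() stands in for disposing the per-thread FileStreams.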

Coordinator
Jan 19, 2010 at 3:10 PM

You can already use thread-local storage in .NET.

http://msdn.microsoft.com/en-us/library/system.threading.thread.allocatenameddataslot.aspx

The new class will make it easier, but you can already do this today.

Jan 20, 2010 at 7:14 AM

I missed this information about the FileStream class (present since version 2.0; it is not in the 1.1 version of this page): http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx

Detection of stream position changes

When a FileStream object does not have an exclusive hold on its handle, another thread could access the file handle concurrently and change the position of the operating system's file pointer that is associated with the file handle. In this case, the cached position in the FileStream object and the cached data in the buffer could be compromised. The FileStream object routinely performs checks on methods that access the cached buffer to assure that the operating system's handle position is the same as the cached position used by the FileStream object.

If an unexpected change in the handle position is detected in a call to the Read method, the .NET Framework discards the contents of the buffer and reads the stream from the file again. This can affect performance, depending on the size of the file and any other processes that could affect the position of the file stream.

If an unexpected change in the handle position is detected in a call to the Write method, the contents of the buffer are discarded and an IOException is thrown.

A FileStream object will not have an exclusive hold on its handle when either the SafeFileHandle property is accessed to expose the handle or the FileStream object is given the SafeFileHandle property in its constructor.
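
The hazard described above comes from two users of one OS handle sharing a single file pointer. A small Python sketch shows it with os.dup (a duplicated descriptor shares the file offset), together with a toy reader that imitates the documented check by caching its position and re-seeking when the OS pointer has moved. PositionCheckingReader is hypothetical and is not how FileStream is implemented; its check-then-read sequence is itself not atomic, so it only illustrates the idea in a single thread.

```python
import os
import tempfile


class PositionCheckingReader:
    """Hypothetical toy reader that caches its position and re-seeks when
    the OS file pointer it shares has been moved by someone else."""

    def __init__(self, fd):
        self._fd = fd
        self._pos = os.lseek(fd, 0, os.SEEK_CUR)  # cached position
        self.resyncs = 0

    def read(self, n):
        actual = os.lseek(self._fd, 0, os.SEEK_CUR)
        if actual != self._pos:
            # Pointer moved behind our back: seek back to the cached position.
            # A buffered reader would also discard its buffer here.
            # Check-then-read is not atomic, so this is not thread-safe.
            os.lseek(self._fd, self._pos, os.SEEK_SET)
            self.resyncs += 1
        data = os.read(self._fd, n)
        self._pos += len(data)
        return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"0123456789abcdef")
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    other = os.dup(fd)  # second descriptor, same OS file pointer
    try:
        reader = PositionCheckingReader(fd)
        first = reader.read(4)
        os.lseek(other, 12, os.SEEK_SET)         # "another user" moves the pointer
        moved_to = os.lseek(fd, 0, os.SEEK_CUR)  # the move is visible through fd
        second = reader.read(4)                  # resyncs to offset 4 first
    finally:
        os.close(other)
        os.close(fd)
```

Without the check, the second read would have returned bytes from offset 12 instead of continuing where the reader left off.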