remote zip?

Jan 18, 2009 at 6:13 AM
There is one using SharpZipLib here: http://www.codeproject.com/KB/cs/remotezip.aspx

I was wondering whether it could be done with DotNetZip?
Coordinator
Jan 18, 2009 at 3:16 PM
Yes, it could be done. When DotNetZip reads a zip file, it essentially does what the article describes: it scans the tail of the file to find the Central Directory, then seeks into the file to read the data and metadata for the individual entries. It would theoretically be possible to replace all the Seek()/Read() logic with HTTP Range requests.
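
To make the "scan the tail, find the Central Directory" step concrete, here is a minimal sketch in Python (not DotNetZip's actual code). It assumes the last bytes of the archive have already been fetched, for example with a single ranged GET. The name find_central_directory is hypothetical. Zip64 archives, and comments that happen to contain the signature bytes, are not handled.

```python
import struct

# End of Central Directory (EOCD) record layout from the zip format:
# signature, disk number, disk with CD, entries on this disk,
# total entries, CD size, CD offset, comment length.
EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
# The EOCD may be followed by a comment of at most 65535 bytes,
# so this many trailing bytes are always enough to contain it.
MAX_TAIL = EOCD_SIZE + 0xFFFF


def find_central_directory(tail):
    """Return (offset, size, entry_count) of the Central Directory.

    `tail` is the last part of the archive (up to MAX_TAIL bytes).
    """
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        raise ValueError("End of Central Directory record not found")
    fields = struct.unpack_from(EOCD_FORMAT, tail, pos)
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]
    return cd_offset, cd_size, total_entries
```

With the offset and size in hand, one more ranged read of exactly cd_size bytes starting at cd_offset would return the whole Central Directory.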

Thinking about it, it feels like it would make the most sense to encapsulate the HTTP RANGE request behavior inside a subclass of System.IO.Stream. If you could successfully do that, you could totally compartmentalize the extension to DotNetZip to use a remote HTTP Server as the "source" for the zip archive.  Maybe it's called HttpRangeStream.
 
It would be a read-only stream, so every call to Write() on that stream would fail. But Read() would send an HTTP GET with a Range header whose value is computed from the current position in the stream. Then a Seek() on that stream would change the position, so that the next Read() would result in a different Range header being sent.
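
A rough sketch of that idea, written in Python rather than as a System.IO.Stream subclass, might look like the following. It is only an illustration under some assumptions: the total length is known up front (for example from a HEAD request), the server honours Range headers, and the helper http_fetch and the counter names are made up for this example.

```python
import io
import urllib.request


def http_fetch(url):
    """Return a function that fetches bytes [start, end] (inclusive)."""
    def fetch(start, end):
        request = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(request) as response:
            if response.status != 206:  # 206 = Partial Content
                raise OSError("server did not honour the Range header")
            return response.read()
    return fetch


class HttpRangeStream(io.RawIOBase):
    """Read-only, seekable stream; every read becomes one ranged fetch."""

    def __init__(self, fetch, length):
        super().__init__()
        self._fetch = fetch
        self._length = length
        self._pos = 0
        self.total_read_times = 0   # debugging counters only
        self.total_read_bytes = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def writable(self):
        return False

    def write(self, data):
        raise io.UnsupportedOperation("stream is read-only")

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        # Seeking only moves the position; no request is sent here.
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: self._length}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def readinto(self, buffer):
        count = min(len(buffer), self._length - self._pos)
        if count <= 0:
            return 0  # at or past end of stream
        data = self._fetch(self._pos, self._pos + count - 1)[:count]
        buffer[:len(data)] = data
        self._pos += len(data)
        self.total_read_times += 1
        self.total_read_bytes += len(data)
        return len(data)
```

Because the stream takes the fetch function as a parameter, it can be exercised against an in-memory byte string as well as a real server. Any zip reader that accepts a seekable stream, such as Python's zipfile.ZipFile, can then be pointed at it unchanged.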


Jan 19, 2009 at 1:37 AM
Many thanks for your very logical and detailed explanation. I will see whether I can do that.
Jan 21, 2009 at 9:51 AM
Edited Jan 21, 2009 at 10:15 AM
I spent a few minutes implementing the HTTPRangeStream. You can download it here:

http://files.cnblogs.com/unruledboy/DotNetZipRemoteTest.zip

The package contains only the HTTPRangeStream and a sample class. I also added TotalReadTimes and TotalReadBytes, for debugging purposes only. Hope you like it.

And if you are interested, please feel free to include it in your project :)

By the way, your design accounts for 99% of the HTTPRangeStream :)
Coordinator
Jan 21, 2009 at 5:05 PM
Very nice!

I ran your code. Very simple, and it works perfectly! I noticed that it does something like 30 reads - 30 HTTP requests - just to enumerate the entries of a zip file with only 6 or 7 entries. Obviously that is not very efficient. It would be nice to be able to cache some of the content to eliminate the need for so many requests; I'm sure it would make a big difference in performance. But that would add additional complexity to your implementation.
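
One possible shape for such a cache, sketched in Python and not taken from the posted implementation: wrap whatever function performs a single ranged request so that it always fetches whole fixed-size blocks and remembers them. The fetch(start, end) callable (inclusive byte offsets), the class name, and the 64 KB default block size are all assumptions for illustration. The cache is unbounded, which is fine for a sketch but not for very large archives.

```python
class BlockCachingFetcher:
    """Serve byte ranges from cached fixed-size blocks.

    `fetch(start, end)` is assumed to perform one ranged request and
    return the bytes from `start` to `end`, inclusive.
    """

    def __init__(self, fetch, length, block_size=64 * 1024):
        self._fetch = fetch
        self._length = length
        self._block_size = block_size
        self._blocks = {}     # block index -> bytes
        self.requests = 0     # number of real fetches issued

    def _block(self, index):
        # Fetch a block only the first time it is needed.
        if index not in self._blocks:
            start = index * self._block_size
            end = min(start + self._block_size, self._length) - 1
            self._blocks[index] = self._fetch(start, end)
            self.requests += 1
        return self._blocks[index]

    def __call__(self, start, end):
        end = min(end, self._length - 1)
        if start > end:
            return b""
        first = start // self._block_size
        last = end // self._block_size
        # Join every block the range touches, then trim to the range.
        data = b"".join(self._block(i) for i in range(first, last + 1))
        offset = start - first * self._block_size
        return data[offset:offset + (end - start + 1)]
```

Many small reads that land in the same region, which is the typical pattern while walking the Central Directory, would then share one request per block instead of issuing one request per read.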

Beyond that, I think the HTTPRangeStream is of general interest, not only for users of the DotNetZip library.
Have you considered publishing it as a standalone project or article? Maybe on www.codeproject.com.
It will get interest and re-use.

If you decide to publish, consider getting it reviewed by a couple people you respect, before you publish it, to make sure you're producing something good.

Jan 22, 2009 at 1:58 AM
Right, I noticed the performance issue. The approach works very well for large zip files, but not for small ones. Originally I used only one request, but I found that calling "AddRange" more than once does not work: the Range header then carries multiple values, but only the first one is used. So I have to create a new request for each read. I will take a closer look at how to improve the performance later, and then publish it on CodeProject, as RemoteZip did.