Simple Browsing of ZIP Entries

Apr 4, 2010 at 12:17 PM

Hi there

Given that ZIP files are inherently hierarchical, I was wondering how to easily browse the contents of a ZIP file (e.g. getting a directory's child directories/files, or getting an entry's parent). At first glance, it appears that no such functionality is present, but I might have overlooked it. In case DNZ doesn't support browsing, is there anything to consider with the following strategy:

  • Iterate through all entries
  • Create the hierarchy in an independent data structure (probably some sort of tree) based on path information
  • Recreate the tree whenever the ZIP file's contents have changed
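The strategy above can be sketched with Python's standard-library `zipfile` module. This is a language-neutral illustration, not DotNetZip's API; `Node` and `build_tree` are hypothetical names. The sketch assumes entry names use `/` as the separator (as the ZIP format specifies) and creates nodes for directories that are only implied by deeper paths.

```python
import zipfile
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    """One file or directory in the reconstructed hierarchy (hypothetical type)."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)  # repr=False avoids parent/child recursion
    children: Dict[str, "Node"] = field(default_factory=dict)
    info: Optional[zipfile.ZipInfo] = None  # None for directories implied only by deeper paths

    @property
    def path(self) -> str:
        """Rebuild the full entry path by walking up the parent links."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def build_tree(zf: zipfile.ZipFile) -> Node:
    """Walk every entry once and hang it under its (possibly implied) parent directories."""
    root = Node("")
    for info in zf.infolist():
        node = root
        for part in info.filename.split("/"):
            if not part:  # skip the empty piece after a trailing '/' on directory entries
                continue
            child = node.children.get(part)
            if child is None:
                child = Node(part, parent=node)
                node.children[part] = child
            node = child
        node.info = info
    return root
```

Recreating the tree after a change is simply another call to `build_tree` on a freshly opened archive.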

Cheers

Philipp

Coordinator
Apr 5, 2010 at 8:44 PM

Hi Philipp,

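It's not the case that zip files are inherently hierarchical. Filesystems are, but zip files definitely are not! A zip file stores a flat list of entries; any hierarchy is only implied by the slash-separated paths in the entry names.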

Reflecting that, there is no support within the ZipFile type in DotNetZip for navigating that hierarchy.   On the other hand, you could implement this fairly easily with the FileSelector class, or with a LINQ query on the FileName property of the ZipEntry instances contained within a ZipFile.
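A flat query over the entry names is often enough, with no tree at all. As a rough Python analogue of the LINQ-over-`FileName` idea (using `zipfile.ZipFile.namelist()` in place of DotNetZip's `ZipEntry.FileName`), the hypothetical helpers below return the immediate children of a directory and the parent of an entry:

```python
import zipfile
from typing import List


def list_children(zf: zipfile.ZipFile, directory: str = "") -> List[str]:
    """Immediate children of `directory` inside the archive (hypothetical helper).

    Subdirectories come back with a trailing '/', including directories that
    only exist implicitly because a deeper entry's name passes through them.
    """
    d = directory.strip("/")
    prefix = d + "/" if d else ""
    children = set()
    for name in zf.namelist():
        if not name.startswith(prefix) or name == prefix:
            continue
        head, sep, _rest = name[len(prefix):].partition("/")
        children.add(head + sep)  # sep is '/' only when head is a directory
    return sorted(children)


def parent_of(name: str) -> str:
    """Parent directory of an entry name ('' for top-level entries)."""
    return name.rstrip("/").rpartition("/")[0]
```

Each call rescans the name list, which is cheap for modest archives; for repeated browsing of large archives, building a tree once is the better trade-off.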

As for creating a hierarchy and re-creating that tree when the zip file's contents have been changed - I don't know exactly what you mean.  There is a way to produce a tree.  One of the example applications in the source distribution is a C#/WinForms app that displays the contents of the zip file in a TreeView.  http://dotnetzip.codeplex.com/SourceControl/changeset/view/57130#1041966 

As for updating the tree when the content of the zipfile changes - I suppose you'd have to just re-create the tree. As I described above, there's no state retained in the ZipFile class that models the hierarchy you described.  You'd have to maintain that yourself.  There is also no "ZipFile.Changed"  event, to notify you that a ZipFile has changed.

I hope this helps...

 

 

Apr 5, 2010 at 9:05 PM

Dino,

I ended up building a simple repository of linked nodes (every node has at most one parent and a collection of child nodes), which I create from the entries' file paths, pretty similar to your sample. As my provider manages write access to the ZIP file, it just invokes a Refresh method on the repository as soon as something changes.
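A minimal sketch of that refresh-on-write arrangement, in Python with `zipfile` and an in-memory stream. `ZipProvider`, `add` and `refresh` are hypothetical names; the point is only that the object performing the writes is also the one that knows when to rebuild the index, since the archive itself raises no change notifications. Python's `zipfile` can append entries (mode `"a"`) but not replace them, so the sketch only covers additions.

```python
import io
import zipfile
from collections import defaultdict
from typing import BinaryIO, Dict, Set


class ZipProvider:
    """Hypothetical single writer for one archive; rebuilds its index after every write."""

    def __init__(self, stream: BinaryIO):
        self._stream = stream
        self.children: Dict[str, Set[str]] = {}  # parent path -> names of its children
        stream.seek(0, io.SEEK_END)
        if stream.tell() == 0:  # brand-new stream: start with an empty archive
            with zipfile.ZipFile(stream, "w"):
                pass
        self.refresh()

    def refresh(self) -> None:
        """Re-read the central directory and rebuild the parent -> children map."""
        index: Dict[str, Set[str]] = defaultdict(set)
        with zipfile.ZipFile(self._stream) as zf:
            for name in zf.namelist():
                parts = [p for p in name.split("/") if p]
                for depth, part in enumerate(parts):
                    index["/".join(parts[:depth])].add(part)
        self.children = dict(index)

    def add(self, name: str, data: bytes) -> None:
        """Append an entry, then refresh; callers never write to the stream directly."""
        with zipfile.ZipFile(self._stream, "a") as zf:
            zf.writestr(name, data)
        self.refresh()
```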

One thing I haven't investigated yet: let's assume you have a huge ZIP file (e.g. several GB) that contains just a few directories and big files. How much file I/O does it cost to get the entries from a ZipFile instance?

Thanks for your advice

Philipp

Coordinator
Apr 6, 2010 at 12:34 PM

The zip directory (the "central directory") is normally stored at the end of the zip file. For a large file, let's say 1 GB, DotNetZip will seek to 64 bytes before the end of the file and look for the end-of-directory marker. Then it seeks to the directory, which usually immediately precedes that marker, and reads it. There's one record in the directory for each zip entry in the file. Each record is a fixed 46-byte header plus the filename and any extra field or comment, so it is typically around 50 to 70 bytes. If there are 8 files, figure roughly 400 to 550 bytes of I/O to read the directory and get the ZipEntries; the size of the files themselves barely matters. It's pretty fast if your media supports fast seeking.
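The lookup described above can be sketched with Python's `struct` module. This reads the End of Central Directory record (22 bytes, signature `PK\x05\x06`) from the tail of a seekable stream and returns where the directory starts and how big it is. It first tries a 64-byte window, then widens to the largest window the format allows (22 bytes plus a 65,535-byte archive comment). `locate_central_directory` is a hypothetical helper, not DotNetZip code; Zip64 archives, which keep the real values in a separate record, are not handled.

```python
import io
import struct
from typing import BinaryIO, Tuple

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4s4H2LH"                  # End of Central Directory record layout
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
MAX_COMMENT = 0xFFFF                      # archive comment may be up to 65,535 bytes


def locate_central_directory(f: BinaryIO) -> Tuple[int, int, int]:
    """Return (entry_count, directory_offset, directory_size), reading only the tail of f."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    # Try a small tail first; widen only if a long archive comment pushes the record back.
    for window in (64, EOCD_SIZE + MAX_COMMENT):
        window = min(window, file_size)
        f.seek(file_size - window)
        tail = f.read(window)
        pos = tail.rfind(EOCD_SIGNATURE)
        if pos != -1 and pos + EOCD_SIZE <= len(tail):
            (_sig, _disk, _cd_disk, _entries_here, entries_total,
             cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, tail, pos)
            return entries_total, cd_offset, cd_size
    raise ValueError("End of Central Directory record not found")
```

For an archive of 8 entries named like `dir/file0.bin`, the directory works out to 8 × (46 + 13) = 472 bytes, however large the files are.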

But try it and you'll see.

 

Apr 7, 2010 at 7:21 AM

I'm more or less thinking about scenarios that may involve streaming ZIP files from a remote location. But given that a non-seekable stream needs to be read in full in that case, I guess I need to expose the metadata differently.

Thanks for the insight Dino :)

Coordinator
Apr 9, 2010 at 1:10 PM

You can design streams that support seeking over HTTP and FTP (over HTTP, for example, by sending Range requests). See, for example:

http://cheeso.members.winisp.net/srcview.aspx?dir=streams
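To make the Range idea concrete, here is a Python sketch of a read-only, seekable file object whose reads are served by a `fetch(start, end)` callback. `RangeReader`, `http_range_fetcher` and `open_remote_zip` are hypothetical names; the HTTP fetcher assumes the server honours `Range: bytes=start-end` requests and that the total size is already known (for example from a `Content-Length` header). Because `zipfile` only needs `seek`, `tell` and `read`, it can list and read entries while fetching just the tail of the archive and the entries it actually touches.

```python
import io
import urllib.request
import zipfile
from typing import Callable

Fetch = Callable[[int, int], bytes]  # fetch(start, end) -> bytes start..end inclusive


class RangeReader(io.RawIOBase):
    """Seekable, read-only file object backed by byte-range fetches."""

    def __init__(self, fetch: Fetch, size: int):
        self._fetch = fetch
        self._size = size
        self._pos = 0
        self.requests = 0       # number of range fetches issued
        self.bytes_fetched = 0  # total payload transferred

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}
        if whence not in base:
            raise ValueError(f"invalid whence: {whence}")
        pos = base[whence] + offset
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos  # seeking is free; nothing is fetched until a read
        return pos

    def readinto(self, buf) -> int:
        n = min(len(buf), self._size - self._pos)
        if n <= 0:
            return 0  # at or past the end of the resource
        data = self._fetch(self._pos, self._pos + n - 1)
        if len(data) != n:
            raise OSError(f"expected {n} bytes, got {len(data)} (Range ignored?)")
        buf[:n] = data
        self._pos += n
        self.requests += 1
        self.bytes_fetched += n
        return n


def http_range_fetcher(url: str) -> Fetch:
    """Hypothetical helper: fetch an inclusive byte range via an HTTP Range header."""
    def fetch(start: int, end: int) -> bytes:
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return fetch


def open_remote_zip(url: str, size: int) -> zipfile.ZipFile:
    """Hypothetical helper: browse a remote archive without downloading all of it."""
    return zipfile.ZipFile(RangeReader(http_range_fetcher(url), size))
```

Every read here is a separate round trip, so a real implementation would probably add some caching of fetched ranges.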

 

Apr 10, 2010 at 9:11 AM

Nice one, Dino. Didn't know about the Range option :)