ZIP Entry Aliasing?

Oct 21, 2010 at 4:00 PM

I'm in the unusual situation of wanting "hard links" within a ZIP file (to use Unix terminology).

Specifically, I have a set of files that will be identical:

lib/armeabi/libmscorlib.dll.so

lib/armeabi-v7a/libmscorlib.dll.so

lib/x86/libmscorlib.dll.so

I would like to store these files only once so that the .zip file doesn't balloon in size. A cursory reading of the ZIP spec suggests this might be possible by having the central directory file header for each filename refer to the same location in the .zip package. Each directory entry contains a "Relative offset of local file header" [0], so it should be possible to set this offset in each entry so that all three refer to the same stored data.
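To make the "relative offset" field concrete, here is a minimal Python sketch (not DotNetZip code) that builds a small archive in memory and lists the local header offset recorded for each entry. The payload bytes and the helper names `build_sample_zip` and `local_header_offsets` are made up for illustration; `ZipInfo.header_offset` is the standard library's name for this field.

```python
import io
import zipfile

# The three paths from the question; the payload below is a stand-in.
NAMES = [
    "lib/armeabi/libmscorlib.dll.so",
    "lib/armeabi-v7a/libmscorlib.dll.so",
    "lib/x86/libmscorlib.dll.so",
]


def build_sample_zip(payload: bytes = b"\x00" * 1024) -> bytes:
    """Write the same payload under each name, as an ordinary zip would."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in NAMES:
            zf.writestr(name, payload)
    return buf.getvalue()


def local_header_offsets(raw: bytes) -> dict:
    """Map each central directory filename to its local header offset."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return {info.filename: info.header_offset for info in zf.infolist()}


if __name__ == "__main__":
    for name, offset in local_header_offsets(build_sample_zip()).items():
        print(f"{offset:6d}  {name}")
```

In an ordinary archive each entry gets its own offset, because the payload is stored three times. The aliasing idea amounts to making all three directory entries carry the same offset.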

Thus, the question: is this really possible? If so, would it be possible to add this support to DotNetZip? I'd be willing to write and contribute this support; the difficulty is in understanding enough of the code to see whether it is really possible and would be acceptable.

Thanks,

 - Jon

[0] http://en.wikipedia.org/wiki/ZIP_(file_format)

Coordinator
Oct 23, 2010 at 10:09 PM

Theoretically it should be possible, except that the filename is stored redundantly for each entry. The filename is stored in the directory entry (maybe 80 bytes max for each entry, typically stored at the end of the zip file), as well as in the zip entry data, which is found at the "Relative Offset of Local Header" that you referenced.

DotNetZip does a consistency check for each entry, to verify that the filename in the "local header" section is the same as the filename in the "directory entry" section.  If this check fails, the zip is declared invalid or corrupt.  I'd guess that other zip tools and libraries do something similar.
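As a rough illustration of that kind of check, here is a Python sketch that reads the filename out of the local header and compares it with the name from the directory entry. This is not DotNetZip's code; the helper names are hypothetical, and the 30-byte layout is the fixed part of the local file header as described in the ZIP spec.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header: signature, version, flags,
# method, time, date, CRC-32, compressed size, uncompressed size,
# filename length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def build_sample() -> bytes:
    """Build a small in-memory zip with three identical entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for abi in ("armeabi", "armeabi-v7a", "x86"):
            zf.writestr(f"lib/{abi}/libmscorlib.dll.so", b"\x00" * 1024)
    return buf.getvalue()


def central_entries(raw: bytes) -> list:
    """Return (filename, local header offset) pairs from the central directory."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return [(info.filename, info.header_offset) for info in zf.infolist()]


def local_header_name(raw: bytes, offset: int) -> str:
    """Read the filename stored in the local header at the given offset."""
    fields = LOCAL_HEADER.unpack_from(raw, offset)
    signature, flags, name_len = fields[0], fields[2], fields[9]
    if signature != b"PK\x03\x04":
        raise ValueError(f"no local file header at offset {offset}")
    start = offset + LOCAL_HEADER.size
    # Bit 11 of the flags marks a UTF-8 name; otherwise the legacy code page.
    encoding = "utf-8" if flags & 0x800 else "cp437"
    return raw[start:start + name_len].decode(encoding)


def names_agree(raw: bytes, central_name: str, offset: int) -> bool:
    """True when the directory entry and the local header carry the same name."""
    return local_header_name(raw, offset) == central_name
```

If the `lib/x86/...` directory entry were pointed at the offset of the `lib/armeabi/...` local header, `names_agree` would return False, which is the mismatch described above.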

An alternative for you might be to use a compressed tar archive, rather than a zip.  The tar format supports archival of links, and if you have the right "untar" logic, you should be able to unpack an archive that contains 3 entries that all refer to the same filesystem file.
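For what the tar approach looks like in practice, here is a small sketch using Python's standard `tarfile` module rather than the .NET library mentioned below. It stores the payload once and adds the remaining names as hard-link entries. The function name `build_tar_with_hardlinks` and the payload are assumptions for illustration.

```python
import io
import tarfile


def build_tar_with_hardlinks(payload: bytes, names) -> bytes:
    """Store the payload once under names[0]; add the rest as hard links to it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        first = tarfile.TarInfo(names[0])
        first.size = len(payload)
        tar.addfile(first, io.BytesIO(payload))
        for name in names[1:]:
            link = tarfile.TarInfo(name)
            link.type = tarfile.LNKTYPE   # hard-link entry, carries no data
            link.linkname = names[0]      # path of the entry it points to
            tar.addfile(link)
    return buf.getvalue()


if __name__ == "__main__":
    names = [
        "lib/armeabi/libmscorlib.dll.so",
        "lib/armeabi-v7a/libmscorlib.dll.so",
        "lib/x86/libmscorlib.dll.so",
    ]
    archive = build_tar_with_hardlinks(b"\x00" * 1024, names)
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            print(member.name, "->", member.linkname or "(data)")
```

Whether the links come back as real hard links on extraction depends on the extracting tool and the target filesystem.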

I wrote a tar library/utility for .NET. It's available at http://cheeso.members.winisp.net/examples.aspx#Tar . I was trying to remember whether I implemented support for hardlinks; I recall researching how to programmatically create hardlinks and junctions from within .NET, and learning that it was not straightforward. I just looked in Tar.cs and see that I did not add the logic for inserting hardlinks or symlinks (junctions) into tar archives, or for extracting them. Sorry :<   You should be able to modify the code to do what you want if you're fairly proficient in C#. I could do this for you, too, for a fee.

The other option is to use the zip file and add a post-extract step that massages the extracted filesystem to insert links where you want them. Obviously this is not as clean as supporting hardlinks in the archive format directly.
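A post-extract step along those lines could look roughly like the Python sketch below. It assumes the archive contains only one real copy plus small placeholders (or full copies) at the other paths, and that the target filesystem supports hard links. The function name `link_duplicates` is hypothetical.

```python
import hashlib
import os
from pathlib import Path


def _digest(path: Path) -> str:
    """SHA-256 of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def link_duplicates(root, relative_paths) -> list:
    """Replace byte-identical copies with hard links to the first path.

    Returns the relative paths that were turned into links.
    """
    root = Path(root)
    original = root / relative_paths[0]
    wanted = _digest(original)
    linked = []
    for rel in relative_paths[1:]:
        target = root / rel
        if not target.is_file() or _digest(target) != wanted:
            continue  # leave files that differ (or are missing) alone
        # Create the link under a temporary name, then swap it into place,
        # so the target path never disappears partway through.
        tmp = target.with_name(target.name + ".tmplink")
        os.link(original, tmp)
        os.replace(tmp, target)
        linked.append(rel)
    return linked
```

This only saves space on disk after extraction. To shrink the archive itself, the duplicate paths would have to be left out of the zip and created by a step that knows which names to link.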

Good luck!