Empty directories

Feb 11, 2010 at 7:30 PM

Looks like ZipIt doesn't add empty directories in version 1.9.

With ZipIt 1.7, it does.

With ZipIt 1.8 and 1.9, it doesn't.

I'm curious whether that was intentional or whether I am misunderstanding a command-line parameter.

- Mark Holt

Feb 11, 2010 at 8:41 PM

I didn't understand whether the PKZIP specification allows explicit declaration of directories. I'll wait to find out.

Coordinator
Feb 11, 2010 at 11:15 PM

Mark - between 1.7 and 1.8, there was a change in behavior regarding directories. It was an unintended, though predictable, side effect of rationalizing the way directories are added and stored. Even with this change, though, it seems right that empty directories should still be stored in the zip. I just tried this with v1.9 and found what you reported: an empty directory that was present in the filesystem was not stored in the zip. I think this is broken.

Gilles - in response to your side question, the PKZIP spec allows entries in the ZIP file that are directories.  But, the directory entry is not a container object.  There is no relationship in the zip data format between a directory entry named "Foo/" and a file entry named "Foo/File1.txt".    It is possible to have a file entry by that name, whether or not the corresponding directory entry is included in the zip. 
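To make the point about independent entries concrete, here is a small sketch. It uses Python's standard `zipfile` module purely as a stand-in (DotNetZip is a .NET library and its API is not shown here). The helper names `build_zip` and `list_entries`, along with the entry names, are illustrative assumptions. The sketch builds three in-memory archives: one with a file entry under `Foo/` but no `Foo/` entry, one with both, and one with only a directory entry. That last case is how an empty directory survives in a zip.

```python
import io
import zipfile

def build_zip(names):
    """Build an in-memory zip. Names ending in "/" become directory entries
    with no data; every other name becomes a small file entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"" if name.endswith("/") else b"hello\n")
    return buf.getvalue()

def list_entries(data):
    """Return (entry name, is-directory flag) for each entry in the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, info.is_dir()) for info in zf.infolist()]

# A file entry whose name starts with "Foo/", but no "Foo/" directory entry.
only_file = list_entries(build_zip(["Foo/File1.txt"]))
# Both entries present; the format still records no link between them.
dir_and_file = list_entries(build_zip(["Foo/", "Foo/File1.txt"]))
# A lone directory entry: the only way an empty directory is kept in a zip.
only_dir = list_entries(build_zip(["Empty/"]))

print(only_file)     # [('Foo/File1.txt', False)]
print(dir_and_file)  # [('Foo/', True), ('Foo/File1.txt', False)]
print(only_dir)      # [('Empty/', True)]
```

In this sketch, a directory entry is nothing more than an entry whose name ends in "/". Nothing in the archive ties it to the entries whose names happen to share that prefix.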

 

Coordinator
Feb 11, 2010 at 11:48 PM
This discussion has been copied to a work item.
Coordinator
Feb 12, 2010 at 12:11 AM
Edited Feb 12, 2010 at 2:14 AM

Suppose the current directory contains a directory hierarchy that looks like this:

  markus\
     empty\
     other\
        one.txt
        two.txt

And suppose that, from within the current directory, I run this command:

  zipit.exe M.zip -r+  markus 

Using DotNetZip v1.9, the tool interprets the name "markus" as a directory name, because a directory by that name exists in the current directory. The tool then expands that into a file selection criterion, formally "name = .\markus\*". Translated, this means "select any filesystem object with a name that begins with .\markus and ends with anything."

The current behavior is that the created zip contains only 2 entries, one for each of the text files. In other words, the tool interprets "filesystem object" to mean "file".

As for the expected behavior, I think the created zip should also contain entries for the 2 directories. In other words, "filesystem object" should mean "file or directory."
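The difference between the current and proposed behavior can be sketched as a walk over the example tree above. This is not how ZipIt or DotNetZip is implemented. It is a minimal Python illustration, and the function `select_under` and its `include_dirs` flag are hypothetical. Paths are returned zip-style, with "/" separators and a trailing "/" on directories.

```python
import os
import tempfile

def select_under(root, include_dirs):
    """Hypothetical selector for "name = <root>\\*": every object below root.
    Directories are reported with a trailing "/" when include_dirs is True."""
    base = os.path.dirname(root)
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), base)
            selected.append(rel.replace(os.sep, "/"))
        if include_dirs:
            for name in dirnames:
                rel = os.path.relpath(os.path.join(dirpath, name), base)
                selected.append(rel.replace(os.sep, "/") + "/")
    return sorted(selected)

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the example tree: markus\empty\ and markus\other\{one,two}.txt
    root = os.path.join(tmp, "markus")
    os.makedirs(os.path.join(root, "empty"))
    os.makedirs(os.path.join(root, "other"))
    for fname in ("one.txt", "two.txt"):
        with open(os.path.join(root, "other", fname), "w") as f:
            f.write("text\n")

    current = select_under(root, include_dirs=False)   # files only (v1.9 behavior)
    proposed = select_under(root, include_dirs=True)   # files and directories

print(current)   # ['markus/other/one.txt', 'markus/other/two.txt']
print(proposed)  # ['markus/empty/', 'markus/other/', 'markus/other/one.txt', 'markus/other/two.txt']
```

Only the proposed selection contains `markus/empty/`. Without that entry, the empty directory cannot be restored on extraction.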

Mark, do you agree? Does that make sense to you? I think you will agree; I don't think this is too controversial.

A more interesting question comes up when the file selection criterion is more complicated than a simple directory name. The rest of this post assumes that you're familiar with the selection criteria used by ZipIt (and by the DotNetZip library in general).

Suppose the criterion is "size > 100kb". This will be true for some files, but never for directories. Likewise for file attributes (hidden, system, readonly, archive). Under the behavior I am proposing here, the criteria string "size > 100kb" would produce a zip with entries for files above 100k in size, but no entries for any directories at all, even if those directories contained files above 100k. The files themselves would be included, but the created zip would contain no entries for their parent directories. Would that behavior make sense to you?
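Here is a minimal model of the "size > 100kb" case. It assumes filesystem objects can be represented as simple records, and it takes 1 kb as 1024 bytes (an assumption, since the thread does not say). The `FsObject` record, the `size_greater_than` helper, and the sample tree are hypothetical. The predicate rejects directories outright, so the large files are selected while neither of their parent directories is.

```python
from dataclasses import dataclass

@dataclass
class FsObject:
    path: str     # zip-style path; directories end in "/"
    is_dir: bool
    size: int     # bytes; a directory has no file data of its own

def size_greater_than(limit):
    """Hypothetical predicate for "size > N". Directories never satisfy it."""
    return lambda obj: not obj.is_dir and obj.size > limit

# A made-up tree in which both large files live inside directories.
tree = [
    FsObject("data/", True, 0),
    FsObject("data/big.bin", False, 250 * 1024),
    FsObject("data/small.txt", False, 10),
    FsObject("data/nested/", True, 0),
    FsObject("data/nested/huge.iso", False, 4 * 1024 * 1024),
]

criterion = size_greater_than(100 * 1024)   # "size > 100kb", taking 1 kb = 1024 bytes
entries = [obj.path for obj in tree if criterion(obj)]
print(entries)  # ['data/big.bin', 'data/nested/huge.iso']
```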

An interesting twist occurs with the timestamp criteria (ctime, atime, mtime). An example might be "mtime < 2010-01-20", meaning: select any filesystem object with a modified time before midnight on 20 January 2010. Directories have all these timestamps, as do files. Should files be selected independently of directories? For instance, a directory might have a last-modified time (mtime) of 22 January 2010 and would therefore fall outside the criterion, while a file within that directory might have an mtime of 17 January 2010, clearly within it. The logical behavior would be that the file is selected but the parent directory is not. If both the directory and the file fall within the criterion (for example, with "mtime < 2010-01-24"), both would be included in the zip file. The weird part is that extracting the zip created in either situation would produce the same result in the filesystem, because extracting the file entry recreates its parent directory anyway.
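The "same result on extraction" observation can be checked with a short sketch. Again, Python's `zipfile` stands in for DotNetZip. The paths `docs/` and `docs/report.txt`, their timestamps, and the helpers `build_zip` and `extracted_tree` are all made up for illustration. Applying the two criteria produces two different entry lists, yet extracting either archive yields the same directory structure. Directory timestamps are not compared here.

```python
import io
import os
import tempfile
import zipfile
from datetime import datetime

# Hypothetical filesystem objects: (zip path, mtime). Directories end in "/".
objects = [
    ("docs/", datetime(2010, 1, 22)),            # directory modified after the cutoff
    ("docs/report.txt", datetime(2010, 1, 17)),  # file modified before the cutoff
]

def build_zip(names):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"" if name.endswith("/") else b"report\n")
    return buf

def extracted_tree(buf):
    """Extract into a temporary directory and list what was created."""
    with tempfile.TemporaryDirectory() as out:
        with zipfile.ZipFile(buf) as zf:
            zf.extractall(out)
        found = []
        for dirpath, dirnames, filenames in os.walk(out):
            for name in dirnames + filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), out)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)

# "mtime < 2010-01-20": the file matches, its parent directory does not.
strict = [name for name, mtime in objects if mtime < datetime(2010, 1, 20)]
# "mtime < 2010-01-24": both match.
loose = [name for name, mtime in objects if mtime < datetime(2010, 1, 24)]

print(strict)  # ['docs/report.txt']
print(loose)   # ['docs/', 'docs/report.txt']
# Extraction recreates the parent directory either way.
print(extracted_tree(build_zip(strict)) == extracted_tree(build_zip(loose)))  # True
```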

Thinking about it further, it would make sense to add a way to explicitly select files or directories. The current types of criteria are: filename, size, ctime/atime/mtime, and attributes. For the file attributes, the possible values are: hidden, readonly, system, archive, and indexed (for search). It would make sense to add "directory" and "file" as additional attribute values. So to select only files, I could use "attr = F", and to select only directories, "attr = D". Under this proposal, a criterion string of "attr = D" would yield a zip containing only a directory hierarchy and no files at all.

Before I implement all these changes, I'd like some feedback on this!

 

Feb 12, 2010 at 8:22 PM

An interesting concept to hash out.

If you want to match PKZip 2.04g or WinZip functionality then you would include the directories.  If you want to end up with the smallest .zip file then you would want to skip the directories.  If you want to preserve and restore a folder structure (empty folders and all) then you would want to include the directories.  I would think the default action would be to include the directories (as you suggested).

Would you default selection criteria to "attr = F" if no "attr" is given?  I would assume so.

I like the "attr = F" and "attr = D" idea.

What would a directory "size" mean?  The total size of files in the directory including sub directories?  If you wanted to eliminate empty directories then you would say "(attr = D) and (size > 0)"?

 

Coordinator
Feb 12, 2010 at 11:15 PM

Thanks for the feedback. 

Yes, I propose to include the directories, which would be a change from the current behavior.

I looked into "attrs = F" and "attrs = D" and decided that attributes are the wrong "noun". The attributes are concrete bits of metadata on each filesystem file, and there are well-defined bits for things like hidden (H), system (S), readonly (R), and so on. There are no bits in that structure for object type (File or Directory). So I think extending the meaning of the attributes "noun", which the selection criteria string already supports, is potentially confusing. I think something similar can be done with a new "noun", such as "type". So instead of "attrs = F", it would be "type = F". The default for the "type" criterion would be "any". In other words, if you don't specify "type = F" or "type = D", then any filesystem object would be selected: all files and directories. If you don't want to include directories in the selection, you'd specify "type = F".
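The proposed "type" noun can be sketched as a predicate factory. This is only a model of the proposal in this post, not DotNetZip's actual selector code. `FsObject`, `type_criterion`, and `select` are hypothetical names. The tree holds the objects that "name = .\markus\*" matches in the earlier example. Leaving `type` unspecified behaves like "any".

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FsObject:
    path: str     # zip-style path; directories end in "/"
    is_dir: bool

def type_criterion(value: str = "any") -> Callable[[FsObject], bool]:
    """Hypothetical "type" noun: "F" selects files, "D" selects directories,
    and "any" (the proposed default) selects both."""
    if value == "F":
        return lambda obj: not obj.is_dir
    if value == "D":
        return lambda obj: obj.is_dir
    if value == "any":
        return lambda obj: True
    raise ValueError(f"unknown type value: {value!r}")

# Objects matched by "name = .\markus\*" in the earlier example tree.
tree: List[FsObject] = [
    FsObject("markus/empty/", True),
    FsObject("markus/other/", True),
    FsObject("markus/other/one.txt", False),
    FsObject("markus/other/two.txt", False),
]

def select(criterion: Callable[[FsObject], bool]) -> List[str]:
    return [obj.path for obj in tree if criterion(obj)]

everything = select(type_criterion())      # no "type" given: files and directories
files_only = select(type_criterion("F"))   # "type = F"
dirs_only = select(type_criterion("D"))    # "type = D"

print(files_only)  # ['markus/other/one.txt', 'markus/other/two.txt']
print(dirs_only)   # ['markus/empty/', 'markus/other/']
```

Keeping "type" separate from the attribute bits means a criterion such as "attrs = H" keeps its existing meaning, while "type" narrows the selection independently.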

Regarding size, I propose to define the size of a directory as zero. In other words, you wouldn't be able to select a directory based on the aggregate size of all its files. So if you were to provide a criteria string of "size > 100k and type = D", you would get NO matches, always. That is my proposal. I could be convinced that it would be useful to do as you suggest, but right now I'm not seeing a lot of utility in it.
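The two candidate definitions of directory size can be contrasted in a few lines. This sketch models the proposal (directories have size zero) against the aggregate alternative suggested earlier in the thread. The tree, the `FsObject` record, and the helper names are hypothetical, and 100k is taken as 100 * 1024 bytes. Under the proposal, "size > 100k and type = D" never matches. Under the aggregate definition, it matches `photos/`.

```python
from dataclasses import dataclass

@dataclass
class FsObject:
    path: str      # zip-style path; directories end in "/"
    is_dir: bool
    size: int = 0  # bytes of file data; 0 for directories

tree = [
    FsObject("photos/", True),
    FsObject("photos/a.jpg", False, 80 * 1024),
    FsObject("photos/b.jpg", False, 60 * 1024),
    FsObject("empty/", True),
]

def proposed_size(obj):
    """Proposal: a directory's size is always zero."""
    return 0 if obj.is_dir else obj.size

def aggregate_size(obj):
    """Alternative raised in the thread: total size of all files beneath a directory."""
    if not obj.is_dir:
        return obj.size
    return sum(o.size for o in tree if not o.is_dir and o.path.startswith(obj.path))

def big_directories(size_of, limit=100 * 1024):
    """Evaluate "size > limit and type = D" using the given size definition."""
    return [o.path for o in tree if o.is_dir and size_of(o) > limit]

print(big_directories(proposed_size))   # [] (never matches)
print(big_directories(aggregate_size))  # ['photos/']
```

With the aggregate definition, `big_directories(aggregate_size, limit=0)` models "(attr = D) and (size > 0)" from the earlier reply and drops `empty/`. With the proposed definition, the same call returns nothing.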