Manipulating the "Extra fields"?

Apr 13, 2010 at 3:25 PM
Edited Apr 14, 2010 at 12:49 AM

Hi. I'd like to know whether it's possible to manipulate the extra fields on zip entries and, if so, how. Specifically, I need to make one have a length of 0.

The situation:

I'm writing an ePub creation tool for fun and personal use, using DotNetZip to build the zip container. When I run the "epubcheck" validation tool, it always reports an error like "extra field length for first filename must be 0, but was 36". This first file is a special file called "mimetype" that contains the MIME type for ePub. The code in epubcheck that (most likely) produces this error message is:

    FileInputStream epubIn = new FileInputStream(epubFile);

    byte[] header = new byte[58];

    if (epubIn.read(header) != header.length) {
        report.error(null, 0, "cannot read header");
    } else {
        int fnsize = getIntFromBytes(header, 26);
        int extsize = getIntFromBytes(header, 28);

        if (header[0] != 'P' && header[1] != 'K') {
            report.error(null, 0, "corrupted ZIP header");
        } else if (fnsize != 8) {
            report.error(null, 0, "length of first filename in archive must be 8, but was " + fnsize);
        } else if (extsize != 0) {
            report.error(null, 0, "extra field length for first filename must be 0, but was " + extsize);
        } else if (!CheckUtil.checkString(header, 30, "mimetype")) {
            report.error(null, 0, "mimetype entry missing or not the first in archive");
        } else if (!CheckUtil.checkString(header, 38, "application/epub+zip")) {
            report.error(null, 0, "mimetype contains wrong type (application/epub+zip expected)");
        }
    }

The code I use to write the zip file is:

    ZipFile output = new ZipFile(Encoding.UTF8); // use utf-8 file names
    output.UseUnicodeAsNecessary = true; // ensure the file names use unicode names?
    output.Encryption = EncryptionAlgorithm.None; // the file cannot be encrypted
    output.CompressionLevel = CompressionLevel.None; // the file cannot be compressed
    output.AddEntry("mimetype", data.Mimetype, Encoding.ASCII); // file in ascii (ansi) encoding

Any hints or suggestions? Thank you.

Coordinator
Apr 13, 2010 at 5:11 PM

Yes, you can work around this.

The "extra field" in a zip entry can hold high-resolution timestamp information, encryption information, and other optional metadata.

By default, DotNetZip produces zip files that include NTFS timestamp information in that extra field. That block is 36 bytes long: a 4-byte header (ID and data size), 4 reserved bytes, a 4-byte attribute tag and size, and 8 bytes each for the modified, accessed, and created times. To exclude it, set ZipFile.EmitTimesInWindowsFormatWhenSaving to false, in the same way as (and right after) you set UseUnicodeAsNecessary to true. You will still get the low-resolution timestamp on the zip entry. This should give you an "extra field" of zero bytes in length.
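
To see where the 36 comes from, here is a rough Java sketch (Java only because the epubcheck snippet above is Java). It builds an NTFS extra-field block as laid out in the ZIP specification and then walks it the way a reader would. The class and method names are made up for illustration; nothing here is DotNetZip or epubcheck code, and the timestamps are just placeholder zeros.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of DotNetZip or epubcheck.
class ExtraFieldSketch {

    /** Lists "id:dataSize" for each block in a ZIP extra field (values are little-endian). */
    static List<String> listBlocks(byte[] extra) {
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        List<String> blocks = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int id = buf.getShort() & 0xFFFF;     // 2-byte header ID
            int size = buf.getShort() & 0xFFFF;   // 2-byte size of the data that follows
            if (size > buf.remaining()) {
                break;                            // truncated block: stop walking
            }
            buf.position(buf.position() + size);  // skip over the block's data
            blocks.add(String.format("0x%04X:%d", id, size));
        }
        return blocks;
    }

    /** Builds an NTFS (0x000A) extra-field block holding three 64-bit timestamps. */
    static byte[] ntfsExtra(long mtime, long atime, long ctime) {
        ByteBuffer buf = ByteBuffer.allocate(36).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0x000A);  // header ID: NTFS
        buf.putShort((short) 32);      // data size: everything after these 4 bytes
        buf.putInt(0);                 // reserved
        buf.putShort((short) 0x0001);  // attribute tag 1: file times
        buf.putShort((short) 24);      // attribute size: 3 x 8 bytes
        buf.putLong(mtime).putLong(atime).putLong(ctime);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] extra = ntfsExtra(0L, 0L, 0L);  // placeholder timestamps
        System.out.println(extra.length + " bytes: " + listBlocks(extra));
    }
}
```

Running it prints "36 bytes: [0x000A:32]", i.e. one NTFS block with 32 bytes of data plus its own 4-byte header, which is the 36 that epubcheck complains about.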

PS: It seems unnecessarily rigid for ePub to require that the extra field length for the entry be zero. The extra field is optional; it need not be handled by the reading application. If the app doesn't want or need the extra field, it can ignore it. No need to report an error.

Apr 14, 2010 at 12:46 AM

Thank you so much! I can finally output a "valid" (according to epubcheck) epub.

Cheeso wrote:

PS: It seems unnecessarily rigid for ePub to require that the extra field length for the entry be zero. The extra field is optional; it need not be handled by the reading application. If the app doesn't want or need the extra field, it can ignore it. No need to report an error.

I agree that the requirements for the frankly useless "mimetype" file are overly strict and a bit difficult to follow. I'd imagine most reading systems skip it anyway. My Sony Reader read my "invalid" epub just fine. But... since you can never be sure who will enforce what, we have to try to pass the only 'validation' tool available for epub.

Coordinator
Apr 21, 2010 at 5:14 PM

Glad it worked for you!

Aug 27, 2010 at 4:46 PM

My thanks too!

For what it's worth, the point behind this restriction is that the MIME type sits at an exact binary offset from the start of the file. This makes it easier for low-powered devices to check what sort of file it is and so on. However, I do agree that this seems like overkill these days.
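
To make the offset point concrete, here is a small Java sketch. It assumes a stored (uncompressed) first entry and builds only the local file header plus data, so the result is not a complete archive (there is no central directory). All names are hypothetical and none of this is taken from epubcheck. The fixed part of the local header is 30 bytes, "mimetype" is 8 bytes, so with no extra field the content starts at offset 38, and 38 + 20 bytes of "application/epub+zip" gives the 58-byte header buffer that epubcheck reads.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not taken from epubcheck.
class MimetypeOffsetSketch {

    static final int FIXED_HEADER = 30;  // fixed part of a ZIP local file header

    /** Builds the start of an archive: one stored entry with the given extra field. */
    static byte[] firstEntry(String name, String content, byte[] extra) {
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        byte[] c = content.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(c);
        ByteBuffer buf = ByteBuffer
                .allocate(FIXED_HEADER + n.length + extra.length + c.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0x04034b50);              // offset 0: signature "PK\3\4"
        buf.putShort((short) 10);            // offset 4: version needed to extract
        buf.putShort((short) 0);             // offset 6: general purpose flags
        buf.putShort((short) 0);             // offset 8: method 0 = stored
        buf.putInt(0);                       // offset 10: DOS time and date (zero here)
        buf.putInt((int) crc.getValue());    // offset 14: CRC-32 of the content
        buf.putInt(c.length);                // offset 18: compressed size
        buf.putInt(c.length);                // offset 22: uncompressed size
        buf.putShort((short) n.length);      // offset 26: file name length
        buf.putShort((short) extra.length);  // offset 28: extra field length
        buf.put(n).put(extra).put(c);        // name, extra field, then the data
        return buf.array();
    }

    /** Offset at which the first entry's data starts. */
    static int dataOffset(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        int nameLen = buf.getShort(26) & 0xFFFF;
        int extraLen = buf.getShort(28) & 0xFFFF;
        return FIXED_HEADER + nameLen + extraLen;
    }

    /** True if the ASCII text appears at exactly this offset, as a fixed-offset check expects. */
    static boolean hasTextAt(byte[] archive, int offset, String text) {
        byte[] t = text.getBytes(StandardCharsets.US_ASCII);
        if (offset + t.length > archive.length) {
            return false;
        }
        for (int i = 0; i < t.length; i++) {
            if (archive[offset + i] != t[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String type = "application/epub+zip";
        byte[] plain = firstEntry("mimetype", type, new byte[0]);
        byte[] withExtra = firstEntry("mimetype", type, new byte[36]);
        System.out.println(dataOffset(plain) + " " + hasTextAt(plain, 38, type));
        System.out.println(dataOffset(withExtra) + " " + hasTextAt(withExtra, 38, type));
    }
}
```

This prints "38 true" and then "74 false": with a 36-byte extra field the content moves to offset 74, so anything that looks for the MIME type at a fixed offset of 38 no longer finds it.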

Now for the rest of the errors :(

Iain