Headerless GZip File/Stream

Aug 12, 2011 at 5:29 AM

Please excuse me if there is an obvious answer to this problem, but I've checked the documentation and discussion page here on CodePlex.

I have a headerless file containing only the result of a GZip compression.  It was produced by a very popular third-party program to store data for its own use, and because it was not intended for use by any other application, the file itself is not a proper GZip archive: it has neither the .gz extension nor a header of any kind, only the binary data.  My application is designed to be one of many third-party editors, so I must read and write data in the format this application expects.  If I decompress the file using 7-Zip and look at the result in a hex editor, I get exactly what I expect based on the author's specification for the format of the data (that is, the format of the binary data that has been compressed).  The popularity of this application makes it easy to confirm that the binary data is GZipped, just as the author states.

I would use System.IO.Compression.GZipStream, as I only need to decompress binary data, but I have heard in numerous places that it is flawed and can only read or write streams up to a certain length.  I was hoping to use DotNetZip's Ionic.Zlib.GZipStream, but I get an exception of type Ionic.Zlib.ZlibException ("Bad GZIP header."), because of course this is not a GZip archive, only GZipped binary data.

Is there a way to compress and decompress a GZip stream without performing the file header validation?  I noticed from the stack trace that the validation is performed by a class called Ionic.Zlib.ZlibBaseStream, which sounds like what I might need, but I make it a practice not to dig around in code not part of an assembly's public interface :)

Any suggestions?

Coordinator
Aug 14, 2011 at 12:52 AM

Try ZlibStream or DeflateStream. 

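They work like GZipStream but use different decorating metadata: ZlibStream reads and writes the zlib format (RFC 1950), which has a short header and a checksum trailer, while DeflateStream handles raw DEFLATE data (RFC 1951) with no header or trailer at all.

From your description, I don't know what sort of compressed file you have, but one of those streams might read it.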

Aug 14, 2011 at 2:23 AM
Edited Aug 14, 2011 at 2:25 AM

Hold on... I think I must have made the mistake of thinking GZip is a compression algorithm, but it's an archive format, isn't it?

Coordinator
Aug 14, 2011 at 1:42 PM

GZip is a data format as described in RFC 1952.  Think of it as a compressed stream of bytes surrounded by a header, which describes those bytes, and a trailer, which provides a CRC and the length of the uncompressed stream.

For compression, GZIP uses DEFLATE primarily, but other compression algorithms can theoretically be employed. 

If you have a raw compressed stream, you might want to try the DeflateStream. 
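To make the difference between the three wrappers concrete, here is a minimal sketch. It uses Python's standard `zlib` module rather than DotNetZip, purely as an illustration; the helper names `deflate` and `sniff` are made up for this example, and the 10-byte header length assumes a GZip header with no optional fields, which is what `zlib` writes by default.

```python
import struct
import zlib

def deflate(data: bytes, wbits: int) -> bytes:
    """Compress with DEFLATE; wbits selects the wrapper around the bytes."""
    c = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

def sniff(blob: bytes) -> str:
    """Guess the wrapper from the first two bytes (a heuristic, not proof)."""
    if blob[:2] == b"\x1f\x8b":          # GZip magic number (RFC 1952)
        return "gzip"
    if (len(blob) >= 2 and blob[0] & 0x0F == 8
            and int.from_bytes(blob[:2], "big") % 31 == 0):
        return "zlib"                     # RFC 1950 header check
    return "no recognizable header (possibly raw deflate)"

payload = b"example payload " * 64

gz = deflate(payload, 31)     # GZip wrapper: header + DEFLATE + trailer
zl = deflate(payload, 15)     # zlib wrapper: 2-byte header + Adler-32
raw = deflate(payload, -15)   # raw DEFLATE: no header, no trailer

# The GZip trailer is CRC-32 then uncompressed length, both little-endian.
crc, size = struct.unpack("<II", gz[-8:])

# Stripping the 10-byte header and 8-byte trailer leaves raw DEFLATE data.
restored = zlib.decompress(gz[10:-8], -15)

# A GZip reader rejects raw DEFLATE data because the magic number is missing.
try:
    zlib.decompress(raw, 31)
    raw_reads_as_gzip = True
except zlib.error:
    raw_reads_as_gzip = False

print(sniff(gz), "|", sniff(zl), "|", sniff(raw))
```

In DotNetZip terms, the three cases correspond to GZipStream, ZlibStream, and DeflateStream respectively, so checking the first two bytes of a mystery file in a hex editor is a quick way to decide which one to try first.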

Aug 15, 2011 at 8:49 PM

It appears that my problems with the GZip header are entirely due to my own stupidity.  I was completely new to working with streams, and I was using a BinaryReader under the assumption that it copied the stream and then read from the copy.  In fact it reads from, and advances, the underlying stream, and I didn't even think of resetting that stream's position before wrapping it in a GZipStream.

When I tried DeflateStream and it worked, I tried GZipStream again and realized that I had already fixed my mistake, because I have learned quite a bit about streams in the last couple of days while working with decompressed data files.
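For anyone who lands here with the same "Bad GZIP header" error, the mistake can be reproduced in a few lines. This is a sketch in Python rather than the original C#, and `make_gzip` and `try_gunzip` are hypothetical helpers written for this example; the point is only that reading from a stream moves its position, so a decompressor wrapped around it afterwards starts past the header.

```python
import gzip
import io
import zlib

def make_gzip(data: bytes) -> bytes:
    """Build a small GZip blob in memory (wbits=31 selects the GZip wrapper)."""
    c = zlib.compressobj(6, zlib.DEFLATED, 31)
    return c.compress(data) + c.flush()

def try_gunzip(stream):
    """Decompress from the stream's CURRENT position; None if the header is bad."""
    try:
        return gzip.GzipFile(fileobj=stream).read()
    except OSError:
        return None

stream = io.BytesIO(make_gzip(b"hello stream"))

# An earlier reader peeks at a few bytes; this advances the shared position.
stream.read(4)
first_attempt = try_gunzip(stream)    # starts at offset 4, misses the magic bytes

# Rewinding the underlying stream before wrapping it fixes the problem.
stream.seek(0)
second_attempt = try_gunzip(stream)

print(first_attempt, second_attempt)
```

The equivalent fix in .NET would be to reset the underlying stream's position (or reopen the file) before handing it to the decompression stream.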

Thanks for your responses, and keep up the good work!  GZipStream is great!