Reading zip files that do not start at the beginning of the stream

Apr 16, 2010 at 9:56 PM

The following code throws an exception when I write some data in front of the zip file. It works fine for start=0.

What am I doing wrong?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Ionic.Zip;
using System.Diagnostics;

namespace ZipTest
{
	class Program
	{
		const int start = 64;
		static long size;
		static void Save()
		{
			using (FileStream f = File.Create("test.zip"))
			{
				f.Write(new byte[start], 0, start);
				ZipFile zip = new ZipFile(Encoding.UTF8);
				zip.AddEntry("a.txt", "abc");
				zip.Save(f);
				size = f.Length;
			}
		}

		static void Load()
		{
			using (FileStream f = File.OpenRead("test.zip"))
			{
				f.Seek(start, SeekOrigin.Begin);
				ZipFile zip = ZipFile.Read(f, Encoding.UTF8);//throws
				/*An unhandled exception of type 'Ionic.Zip.BadReadException' occurred in Ionic.Zip.Reduced.dll
				Additional information:   ZipEntry::ReadHeader(): Bad signature (0xCADDAD9C) at position  0x00000074*/
			}
		}

		static void Main(string[] args)
		{
			Save();
			Load();
		}
	}
}

Apr 19, 2010 at 7:40 AM
Edited Apr 19, 2010 at 7:48 AM

Edit: Sorry, I didn't see that you were also starting to write at that position - my original answer was wrong. It appears the issue lies in either how the stream is written or how it is parsed.

Apr 19, 2010 at 5:04 PM
Edited Apr 19, 2010 at 5:06 PM

Hi MasterOfChaos,

I think the problem lies in the way ZipFile calculates offsets to data blocks within the zip stream. Some of it uses absolute file positions (including the padding), while other parts use positions relative to the start of the zip data - the two don't quite marry up between the read and write operations.

You could work around this by zipping the data to a temporary file (or memory) and then embedding the output from that into your file stream. This will force all offsets to be relative to the start of the zip data and you can do the reverse later to unzip them again.

Be aware that if you do this you'll lose the ability to extract data directly from the composite file using WinZip (which currently works with your test.zip), so it depends on what you're trying to achieve. I've put some sample code below - if you're creating large zip files then memory may not be the best place to store the temporary zip data.

Cheers,

Mike


using System;
using System.IO;
using System.Text;
using Ionic.Zip;

namespace Discussion_209615
{
    static class Program
    {

        const int start = 64;
        const string fileName = "test-pad.zip";

        static long size;
        
        static void Save()
        {
            using (FileStream f = File.Create(fileName))
            {
                f.Write(new byte[start], 0, start);
                using (MemoryStream m = new MemoryStream())
                {
                    // zip into a memory stream
                    ZipFile zip = new ZipFile(Encoding.UTF8);
                    zip.AddEntry("a.txt", "abc");
                    zip.Save(m);
                    m.Flush();
                    m.Seek(0, SeekOrigin.Begin);
                    // read the data to a byte array
                    byte[] data = new byte[m.Length];
                    if (m.Read(data, 0, data.Length) != data.Length) throw new System.InvalidOperationException();
                    // write to the file
                    f.Write(data, 0, data.Length);
                }
                size = f.Length;
            }
        }

        static void Load()
        {
            using (FileStream f = File.OpenRead(fileName))
            {
                f.Seek(start, SeekOrigin.Begin);
                // read from the file
                byte[] data = new byte[f.Length - f.Position];
                if (f.Read(data, 0, data.Length) != data.Length) throw new System.InvalidOperationException();
                using (MemoryStream m = new MemoryStream())
                {
                    // write to a memory stream
                    m.Write(data, 0, data.Length);
                    m.Flush();
                    m.Seek(0, SeekOrigin.Begin);
                    // read the zip data
                    ZipFile zip = ZipFile.Read(m, Encoding.UTF8); // works - the zip data starts at position 0 of the memory stream
                    zip.ExtractAll("C:\\temp\\DotNetZip\\debug\\Discussion_209615\\temp");
                }
            }
        }

        static void Main(string[] args)
        {
            Save();
            Load();
        }

    }

}
Apr 19, 2010 at 6:36 PM

As an alternative, here's a ChunkStream implementation I use to expose only a part of an original stream - it transparently masks the underlying FileStream from the ZIP implementation, so you could wrap your file stream directly in this one:

using System;
using System.IO;


namespace Vfs.Transfer.Util
{
  /// <summary>
  /// A stream that provides streamed access to a given range within
  /// another stream, but does not allow reading or writing outside
  /// this block's boundaries.
  /// </summary>
  public class ChunkStream : Stream
  {
    /// <summary>
    /// The stream from which the data is read.
    /// </summary>
    public Stream DecoratedStream { get; private set; }

    /// <summary>
    /// The offset (starting point) of the chunk in the
    /// decorated stream.
    /// </summary>
    public long Offset { get; private set; }

    /// <summary>
    /// The size of the chunk. This is this stream's length.
    /// </summary>
    public long ChunkSize { get; set; }

    private long position;

    /// <summary>
    /// The virtual position within the chunk. The current position
    /// within the <see cref="DecoratedStream"/> is
    /// <see cref="Offset"/> + <see cref="Position"/>. 
    /// </summary>
    public override long Position
    {
      get { return position; }
      set
      {
        if(value < 0 || value > ChunkSize)
        {
          string msg = "Cannot set Position property to '{0}': Position cannot be negative or bigger than the chunk size of [{1}] bytes.";
          msg = String.Format(msg, value, ChunkSize);
          throw new ArgumentOutOfRangeException("value", msg);
        }
        position = value;

        //not a problem if we set the position above the stream length
        //common implementations just correct it
        DecoratedStream.Position = Offset + value;
      }
    }

    /// <summary>
    /// Initializes a new instance of the <see cref="ChunkStream"/> class.
    /// </summary>
    public ChunkStream(Stream decoratedStream, long chunkSize, long offset, bool initStreamPosition)
    {
      DecoratedStream = decoratedStream;
      ChunkSize = chunkSize;
      Offset = offset;

      if (initStreamPosition)
      {
        decoratedStream.Position = Offset;
      }
    }

    /// <summary>
    /// When overridden in a derived class, reads a sequence of bytes from the current stream and advances the position within the stream by the number of bytes read.
    /// </summary>
    /// <returns>
    /// The total number of bytes read into the buffer. This can be less than the number of bytes requested if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
    /// </returns>
    /// <param name="buffer">An array of bytes. When this method returns, the buffer contains the specified byte array with the values between <paramref name="offset"/> and (<paramref name="offset"/> + <paramref name="count"/> - 1) replaced by the bytes read from the current source. 
    /// </param><param name="offset">The zero-based byte offset in <paramref name="buffer"/> at which to begin storing the data read from the current stream. 
    /// </param><param name="count">The maximum number of bytes to be read from the current stream. 
    /// </param><exception cref="T:System.ArgumentException">The sum of <paramref name="offset"/> and <paramref name="count"/> is larger than the buffer length. 
    /// </exception><exception cref="T:System.ArgumentNullException"><paramref name="buffer"/> is null. 
    /// </exception><exception cref="T:System.ArgumentOutOfRangeException"><paramref name="offset"/> or <paramref name="count"/> is negative. 
    /// </exception><exception cref="T:System.IO.IOException">An I/O error occurs. 
    /// </exception><exception cref="T:System.NotSupportedException">The stream does not support reading. 
    /// </exception><exception cref="T:System.ObjectDisposedException">Methods were called after the stream was closed. 
    /// </exception><filterpriority>1</filterpriority>
    public override int Read(byte[] buffer, int offset, int count)
    {
      ValidateReadWriteParams(buffer, offset, count);

      //do not read further than the chunk
      int bytesToRead = (int)Math.Min(count, ChunkSize - Position);

      //if there is nothing to read, don't call Read at all - a zero-byte read can
      //close some underlying stream implementations
      if(bytesToRead == 0) return 0;
      
      int receivedBytes = DecoratedStream.Read(buffer, offset, bytesToRead);
      
      //advance the virtual position by the number of bytes actually read
      position += receivedBytes;

      return receivedBytes;
    }


    /// <summary>
    /// When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written.
    /// </summary>
    /// <param name="buffer">An array of bytes. This method copies <paramref name="count"/> bytes from <paramref name="buffer"/> to the current stream. 
    /// </param><param name="offset">The zero-based byte offset in <paramref name="buffer"/> at which to begin copying bytes to the current stream. 
    /// </param><param name="count">The number of bytes to be written to the current stream. 
    /// </param><exception cref="T:System.ArgumentException">The sum of <paramref name="offset"/> and <paramref name="count"/> is greater than the buffer length. 
    /// </exception><exception cref="T:System.ArgumentNullException"><paramref name="buffer"/> is null. 
    /// </exception><exception cref="T:System.ArgumentOutOfRangeException"><paramref name="offset"/> or <paramref name="count"/> is negative. 
    /// </exception><exception cref="T:System.IO.IOException">An I/O error occurs. Also thrown if an attempt to write
    /// beyond the block size is made.
    /// </exception><exception cref="T:System.NotSupportedException">The stream does not support writing. 
    /// </exception><exception cref="T:System.ObjectDisposedException">Methods were called after the stream was closed. 
    /// </exception><filterpriority>1</filterpriority>
    public override void Write(byte[] buffer, int offset, int count)
    {
      ValidateReadWriteParams(buffer, offset, count);

      long remaining = ChunkSize - Position;
      if(count > remaining)
      {
        string msg = "Blocked attempt to write block of [{0}] bytes - write goes beyond the current chunk. Chunk size is [{1}], stream position is [{2}].";
        msg = String.Format(msg, count, ChunkSize, Position);
        throw new IOException(msg);
      }

      //do not write further than the chunk
      int bytesToWrite = (int)Math.Min(count, ChunkSize - Position);

      //if there is nothing to write, really, don't invoke the stream
      if (bytesToWrite == 0) return;

      //write data and advance position
      DecoratedStream.Write(buffer, offset, bytesToWrite);
      position += bytesToWrite;
    }


    private static void ValidateReadWriteParams(byte[] buffer, int offset, int count)
    {
      if (buffer == null) throw new ArgumentNullException("buffer");

      if (offset < 0)
      {
        throw new ArgumentOutOfRangeException("offset", "Offset cannot be negative.");
      }
      if (count < 0)
      {
        throw new ArgumentOutOfRangeException("count", "Number of bytes to read cannot be negative.");
      }
      if ((buffer.Length - offset) < count)
      {
        throw new ArgumentException("The sum of offset and count is larger than the buffer length.");
      }
    }


    public override void Flush()
    {
      DecoratedStream.Flush();
    }

    /// <summary>
    /// When overridden in a derived class, sets the position within the current stream.
    /// </summary>
    /// <returns>
    /// The new position within the current stream.
    /// </returns>
    /// <param name="offset">A byte offset relative to the <paramref name="origin"/> parameter. 
    /// </param><param name="origin">A value of type <see cref="T:System.IO.SeekOrigin"/> indicating the reference point used to obtain the new position. 
    /// </param><exception cref="T:System.IO.IOException">An I/O error occurs. 
    /// </exception><exception cref="T:System.NotSupportedException">The stream does not support seeking, such as if the stream is constructed from a pipe or console output. 
    /// </exception><exception cref="T:System.ObjectDisposedException">Methods were called after the stream was closed. 
    /// </exception><filterpriority>1</filterpriority>
    public override long Seek(long offset, SeekOrigin origin)
    {
      long proposedPosition;

      switch(origin)
      {
        case SeekOrigin.Begin:
          proposedPosition = offset;
          break;
        case SeekOrigin.Current:
          proposedPosition = Position + offset;
          break;
        case SeekOrigin.End:
          proposedPosition = ChunkSize + offset;
          break;
        default:
          throw new ArgumentOutOfRangeException("origin");
      }

      if(proposedPosition < 0 || proposedPosition > ChunkSize)
      {
        throw new IOException("Invalid offset with regard to the specified origin.");
      }

      Position = proposedPosition;
      return proposedPosition;
    }

    public override void SetLength(long value)
    {
      throw new NotSupportedException();
    }


    public override bool CanRead
    {
      get { return DecoratedStream.CanRead; }
    }

    public override bool CanSeek
    {
      get { return DecoratedStream.CanSeek; }
    }

    public override bool CanWrite
    {
      get { return DecoratedStream.CanWrite; }
    }

    public override long Length
    {
      get { return ChunkSize; }
    }
  }
}
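
To make the suggestion concrete, here's a minimal, untested sketch of how the Load() method from the earlier sample could wrap the FileStream in a ChunkStream instead of copying the zip bytes into a MemoryStream. It assumes the file was written the way that sample's Save() writes it (zip data produced in a MemoryStream, so its internal offsets are zero-based), and it reuses the start and fileName constants and the extraction path from that sample:

        static void Load()
        {
            using (FileStream f = File.OpenRead(fileName))
            {
                // present only the zip portion of the file (everything after the
                // padding) as a stream of its own; ChunkStream translates positions
                // so the zip data appears to start at position 0
                ChunkStream chunk = new ChunkStream(f, f.Length - start, start, true);
                ZipFile zip = ZipFile.Read(chunk, Encoding.UTF8);
                zip.ExtractAll("C:\\temp\\DotNetZip\\debug\\Discussion_209615\\temp");
            }
        }
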
Apr 19, 2010 at 6:52 PM

@hardcodet - I'm probably missing something subtle, but wouldn't a regular FileStream do the same thing with chunking in this scenario instead of the MemoryStream in my example? The zip library seems to stream the input and output rather than load everything in one go...

Apr 19, 2010 at 7:15 PM
Pointy, absolutely - if I've read the code correctly, the ChunkStream does pretty much the same thing as your implementation (hiding the real stream and just presenting a chunk of data as a stream that can be addressed with absolute and relative positions). The difference is that the ChunkStream is a bit more generic (you can wrap the original stream with a single line of code), and it may be a bit more resource-friendly in terms of memory consumption if you're dealing with big files (you wouldn't want hundreds of MB in RAM). But nothing wrong with your approach :)
Apr 19, 2010 at 11:38 PM

Cool - pretty much what I thought. I was just checking there wasn't a gotcha hiding somewhere...

Apr 21, 2010 at 8:39 PM

Of course I can work around the issue by making ZipFile believe it is reading/writing at the beginning of the stream.

But I think the documentation stated that I should be able to read/write at any position in a stream. And normal archivers being able to open my file is definitely nice, but not necessary.

ZipFile.Save seems to take the current position of the stream and use it as the starting offset for absolute addresses in the zip file. ZipFile.Read, on the other hand, seems to always use 0.

IMO the Read behaviour is the better default, since Position is not a well-defined concept for all streams. Instead, I should be able to pass that start offset manually to ZipFile.
Then I could use any position I need in the cases where I want to be compatible with other archivers, and get position independence (which is a very desirable feature IMO) by default.

I would have preferred it if zip used only relative addresses, but without a time machine that will be hard to fix.

Coordinator
Apr 21, 2010 at 9:07 PM

MoC,

just catching up.

You need to NOT seek when reading the ZipFile. A zip file can have arbitrary stuff in front of the actual zip data and will still be readable by WinZip and other tools. It is also readable by DotNetZip when structured this way. You don't need to seek to the "start" of the zip data to successfully read it with DotNetZip.

The way to think about it: the zip file has offsets stored within it, in the "central directory". These offsets point to particular places in the file. If you write 64 bytes of data (or any amount of data) into a file and then save a zip file into the same FileStream, the offsets that DotNetZip writes will be correct.

If you then seek forward and attempt to read the zip data starting at byte 64, those stored offsets will no longer match - which is why the read fails with the bad-signature exception.

I don't know if this clears things up, but... just don't seek when you read.
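
In code terms, against the repro from the first post, that just means dropping the Seek - a minimal sketch, just to illustrate the point:

		static void Load()
		{
			using (FileStream f = File.OpenRead("test.zip"))
			{
				// no f.Seek(start, SeekOrigin.Begin) here: the offsets DotNetZip wrote
				// into the central directory already account for the 64 bytes of padding
				ZipFile zip = ZipFile.Read(f, Encoding.UTF8); // no longer throws
			}
		}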

Coordinator
Apr 21, 2010 at 9:20 PM
Edited Apr 21, 2010 at 9:43 PM

Let me offer some additional description.

Suppose you open a FileStream and write 100 bytes of data. At that point you also Save a ZipFile to that same stream. The result is a valid zip file with 100 bytes of data in front of it. It should be readable by any tool, and it will be readable as a zip file with DotNetZip - no special seek required.

Suppose you open a MemoryStream and save a ZipFile into it. Then you open a FileStream, write 100 bytes of data, and copy all the data from the MemoryStream into that FileStream. The result is a byte stream with 100 bytes of something, followed by the contents of a zip file. It will not be readable as a zip file by any tool. If you want to read it with DotNetZip, you need to open the file, seek forward 100 bytes, and then read.

If this behavior seems illogical or inconsistent to you, you can open a work item and suggest changes. So far I don't see the problem, though. Save() works differently than Read() with respect to the offsets, but I don't see that as a huge problem.

The reason it works the way it does is the structure of the zip format itself. Some zips require this layout - self-extracting archives, for example. This approach makes it easy to create self-extracting archives, whereas using a zero-based offset on Save() would not.

Update: ok, I can see a case where you have an open stream, and you'd like to just write the zip file into it, without getting the automagic offset re-calculation.  

The way you could do this is to define an OffsetStream that handles it for you. I can't think of a way to do it directly with DotNetZip currently without caching the content in a MemoryStream. I think it's a reasonable request.
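
For what it's worth, a hypothetical OffsetStream along those lines might look something like the sketch below (not part of DotNetZip, untested here). It simply treats the stream position at construction time as position zero, so a Save() performed through it records zero-based offsets regardless of how much padding was written first:

using System;
using System.IO;

// Hypothetical helper, not part of DotNetZip: reports positions relative to
// wherever the inner stream was positioned when the wrapper was created, so
// anything saved through it sees a stream that "starts" at position 0.
public class OffsetStream : Stream
{
    private readonly Stream inner;
    private readonly long origin;

    public OffsetStream(Stream inner)
    {
        this.inner = inner;
        origin = inner.Position; // treat the current position as position 0
    }

    public override long Position
    {
        get { return inner.Position - origin; }
        set { inner.Position = origin + value; }
    }

    public override long Seek(long offset, SeekOrigin seekOrigin)
    {
        if (seekOrigin == SeekOrigin.Begin)
            return inner.Seek(origin + offset, SeekOrigin.Begin) - origin;
        // Current and End are unaffected by the shift; just report the result
        // relative to the origin
        return inner.Seek(offset, seekOrigin) - origin;
    }

    public override long Length { get { return inner.Length - origin; } }
    public override void SetLength(long value) { inner.SetLength(origin + value); }

    public override int Read(byte[] buffer, int offset, int count)
    { return inner.Read(buffer, offset, count); }

    public override void Write(byte[] buffer, int offset, int count)
    { inner.Write(buffer, offset, count); }

    public override void Flush() { inner.Flush(); }

    public override bool CanRead { get { return inner.CanRead; } }
    public override bool CanWrite { get { return inner.CanWrite; } }
    public override bool CanSeek { get { return inner.CanSeek; } }
}

With something like that, the original Save() could write its padding and then call zip.Save(new OffsetStream(f)); the resulting file would then be readable by seeking forward past the padding before calling ZipFile.Read (or by wrapping the read side in the ChunkStream posted earlier).
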


Coordinator
Apr 21, 2010 at 9:52 PM
This discussion has been copied to a work item, where the discussion continues.