Out Of Memory with Large set of large files?

Coordinator
May 31, 2008 at 1:22 AM
Peter Rorlach wrote:

Hello,

What you created here is a very useful library indeed. And while it would have been neat if the samples in the CHM also included VB examples, I can usually figure out the code.

However, I still get this exception thrown when I try to use the library on larger files. Example: a folder contains 64 files, totalling 2.2GB. This should become a single archive, but the exception "out of memory" is thrown every time I attempt this (full error text below). It happens with other folders containing other file types as well. It appears to be strictly a matter of the total size, and it happens at the .Save stage.

I don’t know how to reserve a segment of memory just for this library from within the application. Here are my specs, and I believe they should not run into this type of problem:

CPU: Intel Dual Core 2.4GHz
RAM: 2GB
HD: 0.8 TB free disk space
OS: Windows Vista Premium
Paging: 4GB in two paging files
VS 2005 using VB
Library version 1.5 preview (although the error also occurred with version 1.4.3)

I did raise the issue on the site but have not seen any response to it yet.

Thank you,

With best regards, 

Dr. Peter Rorlach,
Lead Technical Author/Project Manager
Brussels, Belgium

 

Exception Message:

30/05/2008 - 05:36           Plugins Backup failed..: Exception of type 'System.OutOfMemoryException' was thrown. (System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.IO.MemoryStream.set_Capacity(Int32 value)
   at System.IO.MemoryStream.EnsureCapacity(Int32 value)
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Compression.DeflateStream.InternalWrite(Byte[] array, Int32 offset, Int32 count, Boolean isAsync)
   at System.IO.Compression.DeflateStream.Write(Byte[] array, Int32 offset, Int32 count)
   at Ionic.Utils.Zip.CRC32.GetCrc32AndCopy(Stream input, Stream output)
   at Ionic.Utils.Zip.ZipEntry.WriteHeader(Stream s, Byte[] bytes)
   at Ionic.Utils.Zip.ZipEntry.Write(Stream outstream)
   at Ionic.Utils.Zip.ZipFile.Save()
   at Ionic.Utils.Zip.ZipFile.Save(String ZipFileName)
   at DEPSeek.frmDepSeek.zipAll(String wFolder, String wExt, String wTitle) in D:\HandsOn\Development\VS2005-UWP2\DepSearch\DEPSeek\DEPSeek\frmMain.vb:line 1009) (Dr Peter Rorlach)

Coordinator
May 31, 2008 at 1:25 AM
Edited May 31, 2008 at 1:27 AM
I did not see any discussion item on this.
I did see an "Issue" reported but no clear instructions on how to reproduce it.
 
I've been testing this all afternoon; I have not been able to reproduce the problem you reported.
Maybe you can do some additional investigation?
At what point does the problem occur?  2.2GB exactly?  What if you zipped up an archive of 2.0GB? Does that work?

Does the problem happen if you use the command line utilities I include in the v1.5 release to zip up the files?

Does it always fail on the same file? 
What is the format of those files?  Are they already compressed?  If so, have you tried the ForceNoCompression flag?  (A quick sketch of setting it is at the end of this post.)

etc
etc
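
By the way, here is roughly what I mean by trying that flag - a minimal sketch, not tested against your setup. I'm assuming ForceNoCompression is the boolean property on ZipFile, and the folder and zip paths are made up; substitute your own:

using System.IO;
using Ionic.Utils.Zip;
// ...
ZipFile zip = new ZipFile(@"D:\backup\Plugins.zip");             // made-up output path
zip.ForceNoCompression = true;                                   // store entries without deflating them
foreach (string file in Directory.GetFiles(@"D:\SC4\Plugins"))   // made-up source folder
    zip.AddFile(file);
zip.Save();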
Jun 5, 2008 at 11:59 AM
Edited Jun 5, 2008 at 2:25 PM
Sorry to have taken so long to get back to you.
Here's where I am with this: I've tried various file types with varying sizes, and once the total original size gets near or over 2 GB, the same error occurs. I forgot to add that my current .NET version is 3.5. And yes, I used both the stable release 1.4.3 and the preview release.
I have not tried the ForceNoCompression flag, since that seems to defeat the purpose of a ZIP archive - then I might as well copy the files as they are. And no, the files are not already compressed.
I did notice, however, that some files end up larger inside the archive, so I am wondering if that affects the error - and, of course, why it happens at all. I do realize that some file types simply cannot be compressed, but a text or HTML file should end up smaller, since both consist largely of white space.

BTW: if an extracted file already exists (version 1.5), even with the overwrite flag set to true, an exception (Access Denied) is thrown - and I do have full admin access to the machine this is running on. I can work around this, so it is no big deal. (Ignore this - it only happens in Vista if Explorer is still pointed at that file and the file can be previewed, such as an HTML, TXT, or graphics file.)

Thanks for your efforts.

Peter R.
Coordinator
Jun 5, 2008 at 3:18 PM
Hey Doc,

On the "Access Denied"  problem, by default the Extract() methods do not overwrite existing files.  There are extraction methods and properties on the ZipEntry dealing with overwriting existing files.   The doc is not clear on this - the doc for the Extract() methods that do not accept an Overwrite flag should explicitly state "existing entries will not be overwritten".  I am changing that now.


On the compression anomalies - would you be willing to share your data?  I don't know how to reproduce it. I've tried but no joy.
  • when I compress previously compressed data, I get the same-or-smaller sizes.  There is logic in the library to ensure this. I don't know why yours would ever expand.
  • I went to the 2GB threshold and beyond.  I never saw the problem you are reporting.

Maybe if I had your files I could see the problem.
Can you upload them to SkyDrive or someplace like that?

Of course, it could be private data, in which case we'll have to think of something else.

Jun 5, 2008 at 4:12 PM

Thanks, Cheeso.

 

As I edited in above (perhaps too late), the "Access Denied" is really a problem with the way Vista retains inconsistencies already present in XP's Explorer - no matter what you select in Folder Options, Vista goes its own way. One of the many reasons it will soon disappear from my drives.

As for the data sharing, I have no problem with that; these are simply game files (SC4 - I am writing an organisational tool for it, for my own use). They are DAT-type files whose content is largely text. I don't know "SkyDrive" but will look into it to see whether this can be done. I am sure you have more pressing priorities, and meanwhile I can experiment some more. There are other problems I am having, but I am certain they are due to my misunderstanding something or other.

One suggestion, though: .EntryFileNames can be enumerated via an integer; .Item cannot - it always requires the actual filename, which in the case of folder entries or empty names seems to run into problems. Would it be possible - in the next release version - to permit .Item(integer) as well? A sketch of what I mean is below.
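
What I would like to be able to write is roughly this (a rough C# sketch; the commented-out line is the part that does not exist today, the zip name is made up, and I'm assuming EntryFileNames exposes Count along with the integer indexing I've been using):

ZipFile zip = ZipFile.Read("Plugins.zip");
for (int i = 0; i < zip.EntryFileNames.Count; i++)
{
    string name = zip.EntryFileNames[i];   // works today: enumerate by integer
    ZipEntry byName = zip[name];           // works today, but needs the exact name (.Item in VB)
    // ZipEntry byIndex = zip[i];          // proposed: index entries directly by integer
}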

Thanks for your time,

Peter


Cheeso wrote:
Hey Doc,

On the "Access Denied"  problem, by default the Extract() methods do not overwrite existing files.  There are extraction methods and properties on the ZipEntry dealing with overwriting existing files.   The doc is not clear on this - the doc for the Extract() methods that do not accept an Overwrite flag should explicitly state "existing entries will not be overwritten".  I am changing that now.


On the compression anomalies - would you be willing to share your data?  I don't know how to reproduce it. I've tried but no joy.
  • when I compress previously compressed data, I get the same-or-smaller sizes.  There is logic in the library to ensure this. I don't know why yours would ever expand.
  • I went to the 2GB threshold and beyond.  I never saw the problem you are reporting.

Maybe if I had your files I could see the problem.
can you load them up to skydrive or someplace like that?

of course it could be private data in which case, we'll have to think of something else.




Coordinator
Jun 5, 2008 at 10:08 PM
OK.
On SkyDrive - let me know if/when you can post your files somewhere.  SkyDrive is just a free file-sharing spot.  You sign up and can post up to 50 GB, I think. There are lots of other options.

On the int-based enumeration - sounds like a reasonable request.  Originally I thought that enumerating by integer index would not be useful, but maybe it is.
Before you open a Work Item for this, could you post some code that shows what you want to do?  I don't really get the problem you might be having in the case of folders or "empty names" - I'm not even sure what it means to have an "empty name" for an entry in a zip file.  So I'd like to better understand what you're doing, first.

Jun 6, 2008 at 7:26 PM
Sorry, did not see this earlier. Wilco on Sunday - weekends here tend to leave me little time for coding.
Jun 8, 2008 at 10:34 AM
OK, mystery solved: the problem is not with your library but with the ZIP format itself. I've used other utilities and received the same error, just explained differently - the issue is the individual size of the files to be compressed:

"..\Plugins.zip: Files size is too large for ZIP archive. Use RAR instead"

This will not happen when you use Windows Explorer's built-in compression, because the resulting "ZIP" file isn't really a ZIP. Here's a list of the files:
06/06/2008  14:12        74,852,774 CAM.dat
06/06/2008  14:19        77,758,731 PEGPROD.dat
06/06/2008  14:23       111,460,670 Urban.dat
06/06/2008  14:22       127,318,807 Simgoober.dat
06/06/2008  21:38       136,230,834 Eye Candy.dat
06/06/2008  14:18       227,996,297 NDEX.dat
06/06/2008  14:12       930,574,697 BSC.dat

Certainly the last one, and possibly the penultimate one, will trigger this problem. Now I wonder if there's a way around it...
Jun 10, 2008 at 11:18 PM
Sorry but the "mystery solved" was permature. I've reduced the file sizes below 500MB, and even added a check that every file above 250MB uses the FoceNoCompression flag, but I still get the same error during save: out of memory, since the total ZIP file would exceed 2GB. The error always happens during the save - never during the acual archiving.

I am at a loss..
Oct 6, 2008 at 3:58 PM
Edited Oct 6, 2008 at 4:00 PM
I am having this same issue when trying to archive a file of ~800MB.

This is my code:


// select only the file names that match the current suffix
foreach (String file in (from f in fileNames
                         where f.Contains(sfx)
                         select f))
{
    // add to zip; the FileStream is not read here - the library reads it later, at Save()
    zf.AddFileStream(Path.GetFileName(file), String.Empty, new FileStream(file, FileMode.Open));
}

// save the zip file
zf.Save(Path.Combine(outputDirectory, CalculateZipName(sfx)));

 




The exception always happens on the last line, and the trace is the same as the one posted above. Sadly I cannot share my data, but I would like to assist in fixing this issue.

On small files, this same code works great.

Oct 6, 2008 at 6:24 PM
I have been able to track this down to this method: CRC32.GetCrc32AndCopy(), when it is called from within the ZipEntry.WriteHeader() method.

It is inside the while loop in GetCrc32AndCopy that the process starts to eat memory (~800MB in my case, roughly equivalent to my file size).
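
For reference, a CRC-and-copy loop normally looks something like the sketch below (just the general shape, not the library's actual code). The loop itself only ever holds one small buffer, so the growth I'm seeing has to come from the stream it copies into - a MemoryStream that ends up holding the whole file.

using System.IO;

static class Crc32Sketch
{
    // standard reflected CRC-32 table (polynomial 0xEDB88320)
    static readonly uint[] Table = BuildTable();

    static uint[] BuildTable()
    {
        uint[] table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = ((c & 1) != 0) ? 0xEDB88320 ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // copy input to output in 8 KB blocks, folding each block into the CRC;
    // memory stays flat unless 'output' is a MemoryStream accumulating the whole file
    public static uint CrcAndCopy(Stream input, Stream output)
    {
        byte[] buffer = new byte[8192];
        uint crc = 0xFFFFFFFF;
        int n;
        while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < n; i++)
                crc = (crc >> 8) ^ Table[(crc ^ buffer[i]) & 0xFF];
            output.Write(buffer, 0, n);
        }
        return crc ^ 0xFFFFFFFF;
    }
}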

I hope this can help track down this issue!
Coordinator
Oct 7, 2008 at 12:15 AM
Edited Oct 7, 2008 at 12:41 AM
OK, let me look.
I have never been able to get the problem to occur.

-----
update: Hmmm, I can see this is a design "feature". As a file is compressed, the data is written to a MemoryStream.  Everything is kept in memory. Actually, this is true whether or not the data is compressed.  (E.g., even if ForceNoCompression is True, you still get all the file data in memory at one time, for each entry added to the zip file.)  Bottom line:  all of the file data for an entry is kept in memory at one time, and for very large files this can lead to out-of-memory errors.

What is required is that the data be written to the file or output stream as it is compressed.

Currently the approach is a bit naive.
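
To make that concrete, the change in shape looks roughly like this (an illustration, not the library's actual code): deflate straight into the zip's output stream through a small fixed buffer, instead of deflating into a MemoryStream that holds the whole entry.

using System.IO;
using System.IO.Compression;
// ...
static void CompressEntryTo(Stream input, Stream zipOutput)
{
    // leave zipOutput open; the zip writer still needs it after this entry
    using (DeflateStream deflate = new DeflateStream(zipOutput, CompressionMode.Compress, true))
    {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
            deflate.Write(buffer, 0, n);   // compressed bytes land in zipOutput as we go
    }
    // memory cost is ~8 KB per entry instead of ~(size of the file)
}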
Coordinator
Oct 7, 2008 at 1:02 AM
fuzzerd, thanks for bringing this up. 
I have re-opened workitem 5028 to fix this problem.
http://www.codeplex.com/DotNetZip/WorkItem/View.aspx?WorkItemId=5028
Oct 7, 2008 at 2:45 PM
Do you see this going into a bugfix release of v1.5, or only into v1.6?
Coordinator
Oct 7, 2008 at 3:08 PM
Hmm, it's an architectural change in how the zip engine works.
It would definitely qualify as a high-impact change. 
Given that, it would make sense to put it in the next major release.

Why?
Why do you ask?
Oct 7, 2008 at 3:37 PM
I was just trying to get a time frame for this fix. Because of this issue I've been forced to switch to SharpZipLib (a curse word around here, I know). That library does not eat memory on large files, but it does not offer the same ease of use, and its output files are 20% larger than yours.
Coordinator
Oct 8, 2008 at 1:14 AM
The timing, I think, is independent of the version number.
I wouldn't want to put it in v1.5, only because it's a fundamental change, definitely not a 2-line bugfix.

I am testing it now with v1.6.
But I never had a test case that makes it break, so I will need you to verify that it works for you.
Oct 8, 2008 at 1:45 AM
Post a message here when the 1.6 release with these changes is available and I'll test it out as soon as possible.
Coordinator
Oct 8, 2008 at 8:55 AM
OK, try the latest v1.6 preliminary release:
http://www.codeplex.com/DotNetZip/Release/ProjectReleases.aspx?ReleaseId=14569 
Oct 8, 2008 at 3:44 PM
This appears to have fixed my issue; my files are now zipping up great. Memory usage is constant over the entire execution.