Closed

Provide a way to Force UTF-8 (or whatever the alternate encoding is) in filenames

description

Right now DotNetZip chooses a System.Text.Encoding to use, for encoding filenames and comments attached to ZipEntry items, based on a heuristic applied within the library, like this:
 
  • use IBM437 (the default zip encoding) if it works
  • if not, use an alternate encoding, which is specified in ProvisionalAlternateEncoding.
     
    DotNetZip applies this heuristic to each entry independently. The result is that a zipfile can contain a number of entries, some of which are encoded with IBM437, and some of which are encoded with something else.
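As a rough illustration of the per-entry heuristic described above, here is a minimal Python sketch. It is not DotNetZip code: the function name `pick_encoding` and the use of a strict encode attempt as the "does IBM437 work" test are assumptions made for the example, and the library's real check may differ. Python registers IBM437 under the codec name "cp437".

```python
def pick_encoding(name: str, alternate: str = "utf-8") -> str:
    """Return the codec a per-entry heuristic would pick for `name`.

    Hypothetical helper: tries IBM437 first and falls back to the
    alternate encoding only when the name cannot be represented.
    """
    try:
        name.encode("cp437")  # strict mode raises on unmappable characters
        return "cp437"
    except UnicodeEncodeError:
        return alternate


# Each entry is decided independently, so one archive can mix encodings.
names = ["readme.txt", "café.txt", "日本語.txt"]
chosen = {n: pick_encoding(n) for n in names}
```

In this sketch the first two names stay in IBM437 (the code page includes "é"), while the third falls back to the alternate encoding, which is the mixed result the description complains about.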
     
    The request here is to provide a way to force a particular encoding: either UTF8, or a language-specific encoding, or something else.
     
    This will require a change in property names at a minimum, in order to make sense. The "Provisional" in the ProvisionalAlternateEncoding property name, which is present on ZipFile, ZipOutputStream, and ZipEntry, is no longer accurate once the encoding is forced. It is simply an AlternateEncoding, which is then applied in a different way. The current heuristic-driven approach (using the alt encoding "as necessary") can remain, as can the current option of "never" using the alternate encoding. This change will add a new option, which is to "always" use the alt encoding.
     
    So the way I propose to implement this is to remove ProvisionalAlternateEncoding and replace it with AlternateEncoding and AlternateEncodingUsage. The former is a System.Text.Encoding, and the latter is an enum with three values: AsNecessary, Always, Never.
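The proposed three-value option can be modelled in a few lines of Python. This is a sketch of the idea only, not the library's implementation: the enum mirrors the names in the proposal, but `encode_name` is a hypothetical helper, and the lossy "?" substitution used for the Never case is an assumption about what happens to characters IBM437 cannot represent.

```python
from enum import Enum


class AlternateEncodingUsage(Enum):
    # Names follow the proposal; the values are arbitrary labels.
    NEVER = "Never"
    AS_NECESSARY = "AsNecessary"
    ALWAYS = "Always"


def encode_name(name, alternate, usage, default="cp437"):
    """Return (codec, encoded_bytes) for one entry name (hypothetical helper)."""
    if usage is AlternateEncodingUsage.ALWAYS:
        # Forced: skip the heuristic entirely.
        return alternate, name.encode(alternate)
    try:
        return default, name.encode(default)
    except UnicodeEncodeError:
        if usage is AlternateEncodingUsage.AS_NECESSARY:
            # Heuristic: fall back only when the default cannot represent the name.
            return alternate, name.encode(alternate)
        # NEVER (assumed behavior): stay with the default, replacing
        # unmappable characters with "?".
        return default, name.encode(default, errors="replace")
```

With Always, every entry in the archive ends up in the same encoding, which is what the request asks for. With AsNecessary, the choice still varies per entry.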
Closed Jul 11, 2011 at 12:30 PM by Cheeso
Implemented in changeset 80525. The first binary that will have this feature is v1.9.1.6.

comments

mdcclxv wrote Jun 3, 2011 at 10:33 AM

I would rather say "Force Provisional encoding in filenames", for maximum flexibility.

Thanks a lot,
Mircea.

Cheeso wrote Jun 23, 2011 at 7:50 PM

OK, what I'm considering is introducing a new property that lets you indicate whether you'd like the encoding to apply "as necessary" or "always". The former, "as necessary", is the behavior the library uses currently. The "always" behavior is what you'd get if you always use a particular encoding in a particular zipfile. It would employ the "provisional alternate encoding" as specified in the ProvisionalAlternateEncoding property. The existence of the new property (always/as necessary) will render the meaning of "Provisional Alternate Encoding" sort of moot, so I expect to change the name to "AlternateEncoding" (dropping the "provisional" modifier) and, of course, to change the documentation.