Mule bugs: misidentification (Latin-1 vs. Chinese), revert issues

Aidan Kehoe kehoea
Mon Oct 23 15:10:20 EDT 2006


 On the twenty-third of October, Michael Sperber wrote:

 > I open a file with UTF-8 coding-system.  I touch that file outside of
 > XEmacs and then do M-x revert-buffer RET.  The non-ASCII characters
 > get mangled.  From the looks of it, upon re-reading the file is
 > treated as Latin-1 (i.e. multibyte UTF-8 encodings get turned into the
 > characters represented by their component bytes), and the result is
 > then translated to UTF-8.  (The modeline still says "UTF-8", though.)
 > For example, "Anfänger" gets turned into "AnfÃ¤nger" upon revert.  (I
 > hope Gnus hasn't screwed this up on send.)
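
The symptom described above matches what happens when UTF-8 bytes are decoded as Latin-1 and the result is written out again as UTF-8. Here is a minimal sketch of that round trip in Python; it is not XEmacs code, the variable names are purely illustrative, and it only reproduces the reported effect under the assumption that this is what happens on revert:

```python
# Illustration only: mimics the suspected misdecoding outside XEmacs.
original = "Anf\u00e4nger"            # "Anfänger"

on_disk = original.encode("utf-8")    # a-umlaut becomes the two bytes C3 A4
misread = on_disk.decode("latin-1")   # each byte is taken as one Latin-1 character
resaved = misread.encode("utf-8")     # the two bogus characters are encoded again

print(misread)
print(len(on_disk), len(resaved))
```

Each multibyte UTF-8 sequence turns into one Latin-1 character per byte, so the text grows by one character per two-byte sequence and grows again in bytes when it is saved.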

There's a comment from Ben in the sources about this problem: 

  /* The replace-mode code is currently implemented by comparing the
     file on disk with the contents in the buffer, character by character.
     That works only if the characters on disk are exactly what will go into
     the buffer -- i.e. `binary' conversion.

     FSF tries to implement this in all situations, even the non-binary
     conversion, by (in that case) loading the whole converted file into a
     separate memory area, then doing the comparison.  I really don't see
     the point of this, and it will fail spectacularly if the file is many
     megabytes in size.  To try to get around this, we could certainly read
     from the beginning and decode as necessary before comparing, but doing
     the same at the end gets very difficult because of the possibility of
     modal coding systems -- trying to decode data from any point forward
     without decoding previous data might always give you different results
     from starting at the beginning.  We could try further tricks like
     keeping track of which coding systems are non-modal and providing some
     extra method for such coding systems to be given a chunk of data that
     came from a specified location in a specified file and ask the coding
     systems to return a "sync point" from which the data can be read
     forward and have results guaranteed to be the same as reading from the
     beginning to that point, but I really don't think it's worth it.  If
     we implemented the FSF "brute-force" method, we would have to put a
     reasonable maximum file size on the files.  Is any of this worth it?
     --ben

     */
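
For reference, the "brute-force" approach the comment attributes to FSF can be sketched roughly as follows: decode the whole file into memory, then compare decoded characters against the buffer from both ends and replace only the part in the middle that differs. This is a Python illustration under that reading of the comment, not code from either Emacs; `replace_region` and its parameters are hypothetical names, and a real implementation would need the maximum file size Ben mentions.

```python
def replace_region(buffer_text, file_bytes, coding="utf-8"):
    """Return (start, end, replacement): buffer_text[start:end] should be
    replaced by `replacement` to make the buffer match the decoded file.
    Hypothetical helper, for illustration only."""
    # Decode the entire file first, so the comparison is character against
    # character rather than buffer character against raw byte.
    new_text = file_bytes.decode(coding)

    limit = min(len(buffer_text), len(new_text))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and buffer_text[prefix] == new_text[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and buffer_text[-1 - suffix] == new_text[-1 - suffix]):
        suffix += 1

    return prefix, len(buffer_text) - suffix, new_text[prefix:len(new_text) - suffix]


buf = "Anf\u00e4nger lernt"
start, end, repl = replace_region(buf, "Anf\u00e4nger lernen".encode("utf-8"))
print(start, end, repr(repl))
```

Because the comparison runs on decoded text, a non-binary coding system no longer breaks it; the cost is holding the whole decoded file in memory, which is the objection raised in the comment.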

Now, I should have said OF COURSE IT'S WORTH IT a couple of years ago and
done something about it. But as it is, it's a clear bug.

-- 
Santa Maradona, priez pour moi!


