[Bug: 21.4.19] utf-8 encoding

Aidan Kehoe kehoea at parhasard.net
Tue Aug 22 04:38:55 EDT 2006


 On the twenty-first of August, Matthias M. Weber wrote: 

 > As a historian of science and medicine I'm definitely not a software
 > or computer expert, nevertheless, I mainly use (and like) xemacs to
 > edit large latex-files for my books (texts in German and other
 > languages with a lot of diacritic symbols).  A few days ago I changed
 > my gentoo linux system from latin-1 to utf-8 encoding.  Everything
 > went fine, however, I noticed the following bug:
 > 
 > If I edit an ordinary plain text *.tex-file with xemacs 21.4.19 one or
 > two - not more! - out of about 15,000 German Umlauts (ä, ö, ü etc.) in
 > the file are not saved with their correct utf-8 two-byte hex codes
 > (e. g. "ü" = "c3 b3") but with a 3 (!) byte garbled hex code (e.g. "ü"
 > = "ef bf bd"), which, of course, causes a stop of the latex
 > interpreter ("no \u8 symbol").  I checked the hex codes with a
 > hexadecimal editor and could reproduce the bug just by directly
 > replacing the correct code and saving the file once more with xemacs.
 > I couldn't reproduce the bug with other editors (vim).
 > 
 > I don't have any idea about the reason but I noticed that the first
 > byte (c3) was saved at the offset address 0007:fff0 and the second
 > (b3) at 0008:0000.  After I had moved the character two bytes forward
 > (just by adding two blanks) the bug didn't happen any longer.

If you’re using 21.4 and UTF-8, you’re using the Mule-UCS package and have
(require 'un-define) in your init file. While Mule-UCS is the only way to
make UTF-8 available in 21.4, it’s mostly unmaintained and unmaintainable:
its writers find communicating in English difficult, they haven’t been
active in XEmacs work for years anyway, and their coding style does not
make debugging their work easy (that is, they use macros heavily in
contexts where the small improvement in abstraction the macros give is
outweighed by the code being much harder to trace through).

That said, while I believe your bug report--I’ve had Mule-UCS trash data for
me many times--I can’t reproduce it. I suggest you switch to a recent beta
version of XEmacs, where you don’t need un-define, and where the Unicode
support, while still not perfect, is much more robust and much better
maintained.
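
In the meantime, if you want to check a saved file for damage: the bytes
ef bf bd are the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER, so the
damaged spots can be found by searching for that byte sequence. Below is a
minimal sketch in Python; it has nothing to do with XEmacs or Mule-UCS
themselves, and the name find_replacement_chars is made up for
illustration.

```python
from typing import List

# UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
REPLACEMENT = b"\xef\xbf\xbd"


def find_replacement_chars(data: bytes) -> List[int]:
    """Return the byte offset of every U+FFFD sequence in data.

    Hypothetical helper for illustration; not part of any editor.
    """
    offsets = []
    start = data.find(REPLACEMENT)
    while start != -1:
        offsets.append(start)
        # Continue searching just past the match we found.
        start = data.find(REPLACEMENT, start + len(REPLACEMENT))
    return offsets


# Simulate the damage on a small sample: swap the encoded umlaut
# for the replacement character, then report where it sits.
sample = "f\u00fcr".encode("utf-8")
damaged = sample.replace("\u00fc".encode("utf-8"), REPLACEMENT)
print(find_replacement_chars(damaged))
```

For a real file, read it in binary mode (open(path, "rb")) and pass the
contents to the function; the offsets can then be compared with what the
hex editor shows.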

-- 
Santa Maradona, priez pour moi!



