XEmacs cannot display U+FFFD (REPLACEMENT CHARACTER) correctly

Aidan Kehoe kehoea at parhasard.net
Sun Jul 22 15:38:04 EDT 2007


 On the twentieth of July, Mike FABIAN wrote:

 > XEmacs 21.5.x cannot display U+FFFD correctly. In "normal" text files, a
 > wrong glyph is shown; in web pages viewed with w3m.el, only garbage
 > Chinese characters are shown after the first occurrence of U+FFFD.
 >
 > The problem seems to be that XEmacs maps U+FFFD to Big5:
 > 
 > (split-char (string-to-char (decode-coding-string "\357\277\275"
 > 'utf-8))) => (chinese-big5-1 35 110)
 > 
 > and the reason for this seems to be that BIG5.TXT in the XEmacs sources
 > (which comes originally from Unicode.org) maps several Big5 characters to
 > U+FFFD.
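
A minimal Python sketch of the problem described above, not XEmacs code. It
checks that the quoted octal bytes are the UTF-8 encoding of U+FFFD, and it
shows why a many-to-one table cannot round-trip. The table is a hypothetical
excerpt, assuming the seven Big5 codes listed in the reply below are the ones
that map to U+FFFD.

```python
# The octal escapes "\357\277\275" from the quoted expression, as raw bytes.
raw = bytes([0o357, 0o277, 0o275])
decoded = raw.decode("utf-8")
print(f"U+{ord(decoded):04X}")  # the code point those three bytes encode

# Hypothetical excerpt of a Big5-to-Unicode table (assumption: these seven
# codes are the ones mapped to U+FFFD).
big5_to_unicode = {
    code: 0xFFFD
    for code in (0xA15A, 0xA1C3, 0xA1C5, 0xA1FE, 0xA240, 0xA2CC, 0xA2CE)
}

# Inverting a many-to-one table keeps a single Big5 code per Unicode value,
# so U+FFFD ends up tied to one Big5 character and the rest are lost.
unicode_to_big5 = {u: b for b, u in big5_to_unicode.items()}
print(len(big5_to_unicode), "forward entries,", len(unicode_to_big5), "reverse entry")
```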

Okay, for the sake of round-trip compatibility (especially with a future
Unicode-oriented XEmacs), we should map those 7 characters either to a
private-use area (one outside the BMP) or to code points outside of Unicode
altogether. Do you have any objections to the following mapping?

0xA15A => U+FA15A
0xA1C3 => U+FA1C3
0xA1C5 => U+FA1C5
0xA1FE => U+FA1FE
0xA240 => U+FA240
0xA2CC => U+FA2CC
0xA2CE => U+FA2CE
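
The list above appears to follow a simple rule: add 0xF0000 to the Big5 code.
That rule is inferred from the seven entries and is not stated in the message.
The Python sketch below reproduces the list under that assumption and checks
that every target lies in Supplementary Private Use Area-A (U+F0000 to
U+FFFFD, plane 15), outside the BMP, with no two codes sharing a target. The
name proposed_code_point is hypothetical.

```python
# Supplementary Private Use Area-A occupies plane 15.
PUA_A_FIRST, PUA_A_LAST = 0xF0000, 0xFFFFD

# The seven Big5 codes from the proposed mapping.
BIG5_CODES = [0xA15A, 0xA1C3, 0xA1C5, 0xA1FE, 0xA240, 0xA2CC, 0xA2CE]


def proposed_code_point(big5_code: int) -> int:
    """Assumed rule: offset the Big5 code into plane 15."""
    return 0xF0000 + big5_code


mapping = {code: proposed_code_point(code) for code in BIG5_CODES}

for big5_code, code_point in mapping.items():
    # Every target must be private-use and beyond the BMP (above U+FFFF).
    assert PUA_A_FIRST <= code_point <= PUA_A_LAST
    print(f"0x{big5_code:04X} => U+{code_point:05X}")

# Distinct targets make the mapping invertible, which is what round-tripping needs.
assert len(set(mapping.values())) == len(BIG5_CODES)
```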

I’m not sure that this will solve the issue with w3m.el; I don’t have a URL
to test it with.

-- 
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)


