copying certain characters from web browser to xemacs results in backslash followed by three numerals

Aidan Kehoe kehoea at parhasard.net
Tue Jul 17 08:07:43 EDT 2007


 On the sixteenth day of July, Adrian Aichner wrote: 

 > Do you get a "~" displayed instead?

I *suspect* he’s running a non-Mule binary. I haven’t ever seen the
three-digit symptom, but then I mostly run with Mule. 
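If that suspicion is right, the symptom is easy to reproduce on paper. Below is a minimal Python sketch that rests on two assumptions, neither confirmed in this thread. The first is that the browser hands over raw octets. The second is that a non-Mule build shows each octet it can’t display as a backslash followed by three octal digits. The function name octal_escapes is hypothetical.

```python
# Assumption (not confirmed in the thread): a non-Mule XEmacs receives the
# pasted text as raw octets and shows each undisplayable octet as \NNN.
EM_DASH = "\u2014"  # the character from the pasted page


def octal_escapes(data: bytes) -> str:
    """Render every octet outside printable ASCII as a backslash plus
    three octal digits, leaving printable ASCII unchanged."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\%03o" % b
        for b in data
    )


# Which octets arrive depends on the clipboard encoding; try two candidates.
for codec in ("utf-8", "cp1252"):
    print(codec, octal_escapes(EM_DASH.encode(codec)))
# utf-8 \342\200\224
# cp1252 \227
```

Which sequence the original reporter actually saw depends on the encoding the clipboard used, and the thread doesn’t say.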

 > That's what I get, with the pasted character being described as:
 > 
 > Char: — (U+2014 chinese-cns11643-1 33 55) point=1 of 1(0%) column 0 
 > 
 > That's what I get in a fundamental-mode buffer.

That’s correct behaviour, but Unicode redisplay support on Win32 is weaker
than on X11. For example, it requires that every character in the Mule
charset be supported by the Win32 font, which is actually impossible with
some of the CNS 11643 character sets. So the character doesn’t appear as an
em dash. 

On my machine, though, I do have a Big5 font available (I’m not certain, but
I think I downloaded it from Adobe’s East Asian font pack). And once I
rearrange the order in which character sets get priority when translating
from Unicode: 

(set-language-unicode-precedence-list
 '(ascii latin-iso8859-1 latin-iso8859-2 latin-iso8859-3 latin-iso8859-4
   thai-tis620 greek-iso8859-7 arabic-iso8859-6 hebrew-iso8859-8
   katakana-jisx0201 latin-jisx0201 cyrillic-iso8859-5 latin-iso8859-9
   latin-iso8859-15 composite control-1 japanese-jisx0208-1978
   chinese-gb2312 japanese-jisx0208 korean-ksc5601 japanese-jisx0212
   chinese-big5-1 chinese-big5-2 chinese-cns11643-1 chinese-cns11643-2
   arabic-digit arabic-1-column arabic-2-column chinese-sisheng
   ascii-right-to-left indian-is13194 lao latin-iso8859-14
   latin-iso8859-16 ipa vietnamese-viscii-upper vietnamese-viscii-lower
   chinese-cns11643-3 chinese-cns11643-4 chinese-cns11643-5
   chinese-cns11643-6 chinese-cns11643-7 chinese-isoir165 ethiopic
   indian-2-column indian-1-column japanese-jisx0213-1 japanese-jisx0213-2
   thai-xtis tibetan tibetan-1-column))

and paste the text again, it displays correctly. 
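As I understand it, the precedence list works as an ordered search. When a Unicode code point has to become a Mule character, the first charset in the list that contains it wins. Below is a toy Python model of that idea only. XEmacs uses its own translation tables, not this code. In the membership table, only the chinese-cns11643-1 entry for U+2014 comes from the Char description quoted above. The chinese-big5-1 entry is an assumption, and pick_charset is a hypothetical name.

```python
from typing import Dict, List, Optional, Set

# Hypothetical membership table. Only U+2014 in chinese-cns11643-1 is taken
# from the Char description quoted above; the big5 entry is assumed.
CHARSET_MEMBERS: Dict[str, Set[int]] = {
    "ascii": set(range(0x80)),
    "chinese-big5-1": {0x2014},
    "chinese-cns11643-1": {0x2014},
}


def pick_charset(code_point: int, precedence: List[str]) -> Optional[str]:
    """Return the first charset in precedence that contains code_point."""
    for charset in precedence:
        if code_point in CHARSET_MEMBERS.get(charset, set()):
            return charset
    return None  # no charset in the list can represent it


cns_first = ["ascii", "chinese-cns11643-1", "chinese-big5-1"]
big5_first = ["ascii", "chinese-big5-1", "chinese-cns11643-1"]
print(pick_charset(0x2014, cns_first))   # chinese-cns11643-1
print(pick_charset(0x2014, big5_first))  # chinese-big5-1
```

Moving chinese-big5-1 ahead of chinese-cns11643-1 changes which charset U+2014 lands in. That in turn decides which font redisplay needs.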

 > Page
 > http://www.psychologytoday.com/articles/pto-20070622-000002.xml
 > asserts it's charset=iso-8859-1:
 > <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 > but then it contains
 > evolved nature&#8212;human nature
 > which doesn't make sense to me.

Numeric character references such as that one can legally refer to any
Unicode code point. The charset=[...] directive only specifies how to
interpret the document’s octets. 
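The two steps can be seen separately with Python’s standard html module. It is used here purely as an illustration, not as anything XEmacs or the page itself does. Decoding the octets as ISO-8859-1 leaves the reference as literal text. Resolving it afterwards yields U+2014, a character that ISO-8859-1 doesn’t contain.

```python
import html

# The relevant octets of the page, which declares charset=iso-8859-1.
raw = b"evolved nature&#8212;human nature"

# Step 1: the charset only says how to turn octets into characters.
# Every octet here is ASCII, so the reference survives as literal text.
text = raw.decode("iso-8859-1")

# Step 2: numeric character references are resolved afterwards and may
# name any Unicode code point, whatever the declared charset.
resolved = html.unescape(text)

print(ascii(resolved))         # 'evolved nature\u2014human nature'
print(hex(ord(resolved[14])))  # 0x2014

# U+2014 has no ISO-8859-1 encoding, so a reference is the only way to
# put it into a document served with that charset.
try:
    resolved.encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+2014 is not representable in ISO-8859-1")
```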

-- 
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)


