German umlauts shown as double width when (set-language-environment "Japanese")

Aidan Kehoe kehoea at parhasard.net
Wed Jan 19 14:22:27 EST 2005


 On the nineteenth of January, Mike FABIAN wrote: 

 > It is probably not possible to handle U+00B1 as double width in XEmacs
 > and handle German Umlauts as single width at the same time.

“Handle” is a bit ambiguous there. All the following characters are distinct
within XEmacs, but map to the same Unicode code point, U+00FC (put point
after the closing parenthesis and press C-x C-e to see how your XEmacs
displays them):

(format "%c %c %c %c %c %c %c %c"
       (make-char 'latin-iso8859-1 #xfc)
       (make-char 'latin-iso8859-14 #xfc)
       (make-char 'latin-iso8859-15 #xfc)
       (make-char 'latin-iso8859-2 #xfc)
       (make-char 'latin-iso8859-3 #xfc)
       (make-char 'latin-iso8859-9 #xfc)
       (make-char 'chinese-gb2312 #x28 #x39)
       (make-char 'japanese-jisx0212 #x2b #x64))

So you can certainly have a double-width ü at the same time as a
single-width ü, within a single XEmacs :-) .
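
One way to see which of these XEmacs treats as double width is to ask each
charset for its `columns` property. This is a minimal sketch, assuming a
Mule-enabled XEmacs where `charset-property` and the `columns` property
behave as the Lisp Reference Manual describes; the three charsets are picked
only for illustration:

```elisp
;; For each charset, pair its name with the number of display
;; columns XEmacs assigns to characters in that charset.
(mapcar (lambda (charset)
          (cons charset (charset-property charset 'columns)))
        '(latin-iso8859-1 chinese-gb2312 japanese-jisx0212))
```

The Latin set should report one column and the two East Asian sets two, which
is why the same ü can appear in either width depending on the charset it came
from.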

Now, it’s evident that the first six should be unified. It is less evident
that a Japanese X11 user will prefer the system’s ISO 8859-1 font(s) for the
character if the available JIS X 0212 fonts are more comprehensive and
aesthetically pleasing than the available ISO 8859-1 fonts, or if the
character is in the middle of a stretch of Japanese text.

I would prefer that we move to a UCS-with-language-tagging character model
internally, and that charset objects--which describe iso-8859-1, koi8-r,
etc.--have an optional language tag associated with them. When coding
systems use those charset objects to decode text, the tag would be observed
and passed up to XEmacs as extent information. (The iso-8859-1 character set
would be used both when an iso-2022-8 coding-system hits non-ASCII text, and
by some imaginary Windows-1252 decoder, for example.)

For JIS kana and kanji sets, that language tag would always be “ja,” which
would tell redisplay to use fonts appropriate for Japan for Han characters
read using those character sets. For Cyrillic, the Serbian language
environment would associate an “sr” language tag with the available Cyrillic
character sets, so redisplay can choose the Serbian variant of those italic
(I believe) characters that are treated differently there compared to usage
in the rest of the Cyrillic-using world.

The various language environments would also set up language tags for
various stretches of the UCS code space, to be used by default for UTF
encodings. 
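
To make the proposal above a little more concrete, here is a rough sketch of
the lookup it implies. Nothing here exists in XEmacs: every `sketch-` name is
hypothetical, the charset names are used only as alist keys, and a text
property stands in for whatever extent information a real decoder would
attach.

```elisp
;; Hypothetical sketch only: none of the sketch- names exist in XEmacs.

(defvar sketch-charset-language-tags
  '((japanese-jisx0208 . "ja")
    (japanese-jisx0212 . "ja"))
  "Alist of charsets whose language tag is fixed, whatever the environment.")

(defvar sketch-environment-charset-tags nil
  "Alist of charset tags supplied by the current language environment.
A Serbian environment might set this to ((cyrillic-iso8859-5 . \"sr\")).")

(defun sketch-language-tag-for-charset (charset)
  "Return the language tag for text decoded through CHARSET, or nil.
A tag fixed on the charset wins over one set by the language environment."
  (or (cdr (assq charset sketch-charset-language-tags))
      (cdr (assq charset sketch-environment-charset-tags))))

(defun sketch-tag-decoded-region (start end charset)
  "Record CHARSET's language tag, if any, on the text from START to END."
  (let ((tag (sketch-language-tag-for-charset charset)))
    (if tag
        (put-text-property start end 'sketch-language tag))
    tag))
```

A decoder would call something like `sketch-tag-decoded-region` for each run
of text it produces from a given charset, and redisplay would consult the
recorded tag when choosing a font. Untagged charsets such as iso-8859-1
simply return nil and leave font choice to the existing defaults.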

Anyway, that’s talk. Implementing it is something else. 

-- 
“Ah come on now Ted, a Volkswagen with a mind of its own, driving all over
the place and going mad, if that’s not scary I don’t know what is.”



