[Q] Handle bytes in the range 0x80-0xC0 better when dealing with ISO-IR 196.

Stephen J. Turnbull
Thu Nov 23 11:44:48 EST 2006


Aidan Kehoe writes:

 > I have a tentative plan to add a charset to XEmacs, 256 characters of which
 > reflect corrupt Unicode data. These 256 characters will be generated by
 > Unicode-oriented coding systems when they encounter invalid data:

I wish you wouldn't.  Let's just get Unicode inside, and figure out
how to signal errors in a useful way from inside a coding stream.

 > (decode-coding-string "\x80\x80" 'utf-8) 
 > => "\200\200" ;; With funky redisplay properties once display tables and
 > 	      ;; char tables are integrated. Which, whee, is more work. 
 > 
 > And will be ignored by them when writing: 
 > 
 > (encode-coding-string (decode-coding-string "\x80\x80" 'utf-8) 'utf-8)
 > => ""

Yuck.  You realize that you can't do that with the autosave code,
right?  And you don't want to do that if the buffer is unmodified, right?
Sounds like a hell of a lot of work to get right, and it will still be
fragile.

 > This will allow applications like David Kastrup's reconstruct-utf-8-
 > sequences-from-fragmentary-TeX-error-messages to be possible, while not
 > contradicting the relevant Unicode standards. With Unicode as the internal
 > encoding, there's no need to have a separate Mule character set; we can
 > stick their codes somewhere above the astral planes. But we should maintain
 > the same syntax code for them. Note also that, as far as I can work out,
 > these 256 codes will be sufficient for representing error data for all the
 > other Unicode-oriented representations as well as UTF-8.

Sounds dangerous and messy to me.
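For comparison, the round-trip problem being debated here is the one Python's
"surrogateescape" error handler (PEP 383) addresses: each undecodable byte
0xXX is mapped to a reserved code point U+DCXX on decode, and mapped back to
the original byte on encode, so invalid data survives a decode/encode cycle
instead of being silently dropped as in the quoted proposal. A minimal sketch:

```python
# Invalid as UTF-8: two bare continuation bytes, like the "\x80\x80"
# example quoted above.
raw = b"\x80\x80"

# Each undecodable byte becomes a lone surrogate U+DC80-U+DCFF.
decoded = raw.decode("utf-8", errors="surrogateescape")
assert decoded == "\udc80\udc80"

# Unlike the quoted scheme, which ignores the error characters when
# writing (yielding ""), surrogateescape restores the bytes exactly.
reencoded = decoded.encode("utf-8", errors="surrogateescape")
assert reencoded == raw
```

The cost of this approach is that the intermediate string contains lone
surrogates, which are themselves ill-formed Unicode and cannot be written
out by a strict UTF-8 encoder.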
