[Bug: 21.5-b25] Problems with latin-unity and VM

Aidan Kehoe kehoea at parhasard.net
Wed Feb 7 07:45:47 EST 2007


 On the seventh of February, Joachim Schrod wrote: 

 > >>>>> "AK" == Aidan Kehoe <kehoea at parhasard.net> writes:
 > AK>  On the seventh of February, Joachim Schrod wrote: 
 > 
 > >> I would have expected that latin-unity does NOT attempt to change
 > >> the encoding at all for such files -- after all, they are declared
 > >> as binary and the notion of Latin characters in binary files makes
 > >> no sense.
 > 
 > AK> We (XEmacs) don’t distinguish iso-8859-1 and binary in your sense;
 > 
 > Ah -- that I didn't know. Reading the Coding System section of the
 > XEmacs manual, it didn't seem so; the differences between binary and
 > iso-8859-1 are explicitly named.

They are not--‘no character code conversion [...] for non-Latin-1 byte
values’ is what it says. It is badly and unclearly put, though. 

 >     In contrast, the coding system `binary' specifies no character
 >     code conversion at all--none for non-Latin-1 byte values and none
 >     for end of line. This is useful for reading or writing binary
 >     files, tar files, and other files that must be examined verbatim.
 > 
 > But with that information your explanation gets clearer. Though I have
 > to say that I would have naively answered your question
 > 
 > AK> Consider; how can you interpret a sequence of octets on disk as
 > AK> U+5357, the Han character for ‘southwards,’ without abandoning the
 > AK> treatment as ‘binary’--a sequence of octets--and checking instead
 > AK> for ISO-2022-1 or UTF-8 sequences?
 > 
 > as follows: In buffers with coding system 'binary there must not be
 > the character U+5357, by definition, because no such octet exists. 
 > When the buffer-file-coding-system-for-read is set to 'binary, such a
 > character would not be constructed at all. Yanking that character in
 > such a buffer would signal an error. I also would have expected any
 > attempt to set buffer-file-coding-system to 'binary in a buffer with
 > such a character to signal an error.

I’m not aware of any environment that implements that behaviour--though it
would seem more correct. Are you? Non-Unicode Windows apps, for example,
trash data when people type in or paste characters that don’t occur in the
app’s code page.
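
To make the two behaviours concrete, here is a rough sketch in Python rather
than in XEmacs Lisp. It is only an illustration, not how XEmacs does it: the
helper names encode_strict and encode_lossy are made up for this example, and
cp1252 is just an assumed stand-in for "the app's code page". It shows that
every octet round-trips through ISO-8859-1, that a strict encoder has nothing
to emit for U+5357, and that a lossy encoder silently substitutes a
placeholder.

```python
# All 256 octet values map one-to-one onto U+0000..U+00FF under ISO-8859-1,
# which is why treating a file as Latin-1 loses nothing at the byte level.
all_octets = bytes(range(256))
as_text = all_octets.decode("latin-1")
round_trips = as_text.encode("latin-1") == all_octets

# U+5357, the Han character mentioned above; it has no single-octet form.
han_south = "\u5357"


def encode_strict(text):
    """Hypothetical helper: return the octets, or None if a character has none.

    This is the 'signal an error' behaviour, reduced to a None result.
    """
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


def encode_lossy(text, codec="cp1252"):
    """Hypothetical helper: replace unencodable characters with '?'.

    The codec default is an assumption standing in for an app's code page.
    """
    return text.encode(codec, errors="replace")


print(round_trips)                          # True
print(encode_strict(han_south))             # None
print(encode_lossy("a" + han_south + "b"))  # b'a?b'
```

The strict variant is the behaviour Joachim describes; the lossy one is
roughly what the data-trashing apps do.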

-- 
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)


