[Bug: 21.5-b25] Problems with latin-unity and VM
Aidan Kehoe
kehoea at parhasard.net
Wed Feb 7 07:45:47 EST 2007
On the seventh of February, Joachim Schrod wrote:
> >>>>> "AK" == Aidan Kehoe <kehoea at parhasard.net> writes:
> AK> On the seventh of February, Joachim Schrod wrote:
>
> >> I would have expected that latin-unity does NOT attempt to change
> >> the encoding at all for such files -- after all, they are declared
> >> as binary and the notion of Latin characters in binary files makes
> >> no sense.
>
> AK> We (XEmacs) don’t distinguish iso-8859-1 and binary in your sense;
>
> Ah -- that I didn't know. Reading the Coding System section of the
> XEmacs manual, it didn't seem so; there, the differences between binary
> and iso-8859-1 are explicitly named.
They are not--‘no character code conversion [...] for non-Latin-1 byte
values’ is what it says. It is badly and unclearly put, though.
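
The equivalence between `binary' and iso-8859-1 can be illustrated outside
XEmacs. The following is a minimal sketch in Python, not XEmacs code; it
assumes only that Python's "latin-1" codec is a fair stand-in for
iso-8859-1. Every octet decodes to the code point of the same number, so
reading a file as Latin-1 loses nothing and is indistinguishable from
reading it verbatim.

```python
# Illustration in Python, not XEmacs: Latin-1 maps each octet 0x00-0xFF
# to the code point with the same number, so decoding is lossless.
all_octets = bytes(range(256))

# Decode every possible octet value as Latin-1 text.
as_text = all_octets.decode("latin-1")

# Encoding the text again gives back the original octets.
round_trip = as_text.encode("latin-1")

# Each character's code point equals the octet it came from.
same_numbers = [ord(c) for c in as_text] == list(range(256))
```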
> In contrast, the coding system `binary' specifies no character
> code conversion at all--none for non-Latin-1 byte values and none
> for end of line. This is useful for reading or writing binary
> files, tar files, and other files that must be examined verbatim.
>
> But with that information your explanation gets clearer. Though I have
> to say that I would have naively answered your question
>
> AK> Consider; how can you interpret a sequence of octets on disk as
> AK> U+5357, the Han character for ‘southwards,’ without abandoning the
> AK> treatment as ‘binary’--a sequence of octets--and checking instead
> AK> for ISO-2022-1 or UTF-8 sequences?
>
> as follows: In buffers with coding system 'binary there must not be
> the character U+5357, by definition, because no such octet exists.
> When the buffer-file-coding-system-for-read is set to 'binary, such a
> character would not be constructed at all. Yanking that character in
> such a buffer would signal an error. I also would have expected any
> attempt to set buffer-file-coding-system to 'binary in a buffer with
> such a character to signal an error.
I’m not aware of any environment that implements that behaviour--though it
would seem more correct. Are you? Non-Unicode Windows apps, for example,
trash data when people type in or paste characters that don’t occur in the
app’s code page.
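
The two behaviours discussed above can be sketched in Python (again an
illustration, not XEmacs code; the "latin-1" codec stands in for a
`binary' or single-byte coding system). Strict encoding signals an error
for U+5357, which is the behaviour Joachim expected, while the "replace"
error handler silently substitutes a question mark, roughly what a
non-Unicode application does with a character outside its code page.

```python
# Illustration in Python, not XEmacs.
han = "\u5357"  # the Han character for 'southwards'

# On disk as UTF-8 the character is three octets.
utf8_octets = han.encode("utf-8")

# Read verbatim (as Latin-1), those octets become three separate
# characters; U+5357 is never constructed.
read_as_binary = utf8_octets.decode("latin-1")

# Strict behaviour: refuse to write a character that has no octet.
try:
    han.encode("latin-1")
    strict_result = "encoded"
except UnicodeEncodeError:
    strict_result = "error"

# Lossy behaviour: substitute a placeholder and carry on.
lossy = han.encode("latin-1", errors="replace")
```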
--
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)