[Bug: 21.5-b25] Problems with latin-unity and VM

Stephen J. Turnbull stephen at xemacs.org
Wed Feb 7 08:39:03 EST 2007


Joachim Schrod writes:

 > Ah -- that I didn't know. Reading the Coding System section of the
 > XEmacs manual, it didn't seem so; there, differences between binary
 > and iso-8859-1 are explicitly named.
 > 
 >     In contrast, the coding system `binary' specifies no character
 >     code conversion at all--none for non-Latin-1 byte values and none
 >     for end of line. This is useful for reading or writing binary
 >     files, tar files, and other files that must be examined verbatim.

Much of this stuff is very poorly explained, and of course "no
conversion" is a misnomer.

The difference is that iso-8859-1, like all XEmacs coding systems, is
split into three coding systems, iso-8859-1-unix, iso-8859-1-dos, and
iso-8859-1-mac, which convert the platform representation of newline
(LF, CRLF, and CR respectively) to the internal representation of LF.
These are normally automatically detected and very rarely explicitly
set, so the normal "spelling" is to omit the EOL convention indicator.
(There are better APIs, but they're not GNU compatible.)
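The EOL half of the conversion is easy to sketch.  The following is a
Python analogy (not the XEmacs implementation, which is done in C):
each variant simply maps its platform newline to the internal LF.

```python
# Sketch of the EOL-conversion step, by analogy only.
# Each coding-system variant maps its platform newline to internal LF.
def decode_eol(data: bytes, eol: str) -> bytes:
    """Convert platform line endings to the internal LF representation."""
    platform_newline = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}
    return data.replace(platform_newline[eol], b"\n")

print(decode_eol(b"one\r\ntwo\r\n", "dos"))  # b'one\ntwo\n'
print(decode_eol(b"one\rtwo\r", "mac"))      # b'one\ntwo\n'
```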

This means that the conversion of a binary file to internal coding via
iso-8859-1-unix is one-to-one and onto (a bijection), and thus
invertible.  That's not true for the others.  The binary coding system
is then defined, not as "no conversion", but as an alias of
iso-8859-1-unix.

Thus a DOS-to-Unix converter can be written as

(defun dos2unix (file)
  "Convert FILE from DOS (CRLF) to Unix (LF) line endings."
  ;; Read the file, forcing DOS EOL conversion but autodetecting the charset.
  (find-file file 'automatic-conversion-dos)
  ;; Write it back with the same charset but Unix EOL conversion.
  (write-file file nil (coding-system-change-eol-conversion
                        buffer-file-coding-system 'unix)))

 > as follows: In buffers with coding system 'binary there must not be
 > the character U+5357, by definition, because no such octet exists. 
 > When the buffer-file-coding-system-for-read is set to 'binary, such a
 > character would not be constructed at all. Yanking that character in
 > such a buffer would signal an error.

This is a priori a reasonable design.  However, in practice people
like the feature of automatic detection of coding system, but they
also like to edit buffers freely.  So if you detect ISO 8859-2 and
want to add a EURO SIGN to the buffer, people would be rather annoyed
to have to change coding systems, especially if they were lucky and
had already deleted all the Latin-2 so that they could use
ISO-8859-15, and then later wanted to put in some Russian and had to
change to Unicode.  And what about buffers (e.g., shell buffers) which
in typical use are never saved, but might be saved?  Why bother the
user about their coding systems at all?

So it makes sense to defer coding compatibility checks (and associated
errors) until just before an I/O operation.
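By way of illustration (in Python rather than Emacs Lisp, and with
hypothetical variable names), this is the same policy a codec-based
editor would follow: any character may enter the buffer freely, and
the incompatibility only surfaces when the buffer is encoded for
output.

```python
# A buffer whose coding system was detected as ISO 8859-2 (Latin-2) can
# hold any character in memory; the mismatch is only caught at save time.
buffer_coding = "iso8859_2"            # detected when the file was read
buffer_text = "hello"                  # editing is unrestricted in memory
buffer_text += " \u20ac \u0434\u0430"  # add EURO SIGN and Cyrillic freely

# The compatibility check happens only when we try to write the buffer out.
try:
    buffer_text.encode(buffer_coding)
    print("saved")
except UnicodeEncodeError:
    print("cannot encode buffer as", buffer_coding)  # this branch runs
```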
