[Bug: 21.5-b25] Problems with latin-unity and VM

Joachim Schrod jschrod at acm.org
Wed Feb 7 08:47:55 EST 2007


>>>>> "SJT" == Stephen J Turnbull <stephen at xemacs.org> writes:

Thanks for your response!

SJT> Otherwise, *all* buffers in both Emacsen are composed of
SJT> *characters*, which you may think of as being implemented as
SJT> arbitrary nonnegative integers.

SJT> It is true (just as in UTF-8) that internal characters that are ASCII
SJT> characters are represented in a single octet, by their ASCII codes.
SJT> However, C1 and Latin-1 characters are represented in *two* octets.
SJT> [...]
SJT> There's a second misconception implicit in your statement, which is
SJT> that the coding of the file is somehow reflected in the buffer.

Yes, I had the second misconception (and your other email explains
very well your design decision not to do so). But I did understand
that XEmacs uses its internal character encoding in buffers; while I
thought that some more meta-information about the origin of that
encoding was available, that didn't actually seem to matter here.

I thought that the coding system used to read a file influences the
creation of the XEmacs buffer's "character array". Specifically, I
thought that 'buffer-file-coding-system-for-read and possibly
'file-coding-system-alist are used for that, i.e., that these
variables control how the external octet representation is
transformed into internal XEmacs characters.
  And, AFAICS, this matches your explanation, so I still think that. ;-)
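
For concreteness, here is a small sketch of what I mean (with
/tmp/sample.txt as a stand-in file name): binding
'coding-system-for-read overrides both of the variables above for a
single read, so the decode step is chosen explicitly:

    (let ((coding-system-for-read 'iso-8859-1))
      (with-temp-buffer
        ;; Decode the file's octets as Latin-1 while reading.
        (insert-file-contents "/tmp/sample.txt")
        (buffer-string)))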

My error was at a different place: I thought that one octet (let's
say, 0xe4, i.e., "ä" in Latin-1) in a file could end up as two
different internal XEmacs characters (let's use the symbolic names
'octet-0xe4 and 'latin1-aumlaut for the sake of this example),
depending on the coding system used for reading. There, a coding
system of 'binary would trigger the creation of the first XEmacs
character, and a coding system of 'iso-8859-1 (or any of its variants)
would trigger the creation of the second character.
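
In code, my hypothesis would have looked roughly like this
(decode-coding-string exists in both Emacsen; "\xe4" is a one-octet
string):

    ;; Hypothesized: two different internal characters from one octet.
    (decode-coding-string "\xe4" 'binary)      ; -> 'octet-0xe4 ?
    (decode-coding-string "\xe4" 'iso-8859-1)  ; -> 'latin1-aumlaut ?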

And that seems to have been my primary misconception: Since you tell
me that "C1 and Latin-1 characters are represented in *two* octets",
it seems that these two situations (reading with coding system 'binary
vs. 'iso-8859-1) are not distinguished, and that the internal
character 'latin1-aumlaut is always created, because the internal
XEmacs character 'octet-0xe4 does not exist. (The latter would
probably correspond to the FSF Emacs unibyte representation that you
mentioned XEmacs doesn't have.)
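
If I got that right, this one-liner should confirm it: on XEmacs it
should yield t, while an FSF Emacs with its separate unibyte/raw-byte
representation would presumably yield nil:

    ;; Same internal character regardless of the read coding system?
    (eq (aref (decode-coding-string "\xe4" 'binary) 0)
        (aref (decode-coding-string "\xe4" 'iso-8859-1) 0))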

So I think I finally got it; thanks for the time you invested in the
explanation,

	Joachim

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joachim Schrod				Email: jschrod at acm.org
Roedermark, Germany


