21.5 mule: Latin-2 (Polish) - wrong coding system identification

Wed May 14 06:15:43 EDT 2008

Hi, Krzysztof, and sorry about the delay --

This is my bug, I think. If you have a second, can you apply the following
patch and check whether the problem still happens?

diff -r 49f8ed034500 lisp/ChangeLog
--- a/lisp/ChangeLog	Mon May 12 11:53:04 2008 +0200
+++ b/lisp/ChangeLog	Wed May 14 12:12:09 2008 +0200
@@ -1,3 +1,9 @@ 2008-05-11  Aidan Kehoe  <kehoea at parhasa
+2008-05-14  Aidan Kehoe  <kehoea at parhasard.net>
+
+	* mule/mule-coding.el (make-8-bit-choose-category): 
+	Control-1 characters extend from #x80 to #x9F (inclusive),
+	not from #x80 to #xBF.
+
 2008-05-11  Aidan Kehoe  <kehoea at parhasard.net>
 
 	* disp-table.el (make-display-table): 
diff -r 49f8ed034500 lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el	Mon May 12 11:53:04 2008 +0200
+++ b/lisp/mule/mule-coding.el	Wed May 14 12:12:09 2008 +0200
@@ -533,7 +533,7 @@ disk to XEmacs characters for some fixed
   (check-argument-range (length decode-table) #x100 #x100)
   (block category
     (loop
-      for i from #x80 to #xBF
+      for i from #x80 to #x9F
       do (unless (= i (aref decode-table i))
            (return-from category 'no-conversion)))
     'iso-8-1))
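
To see what the hunk changes, here is a minimal Python sketch of the loop in
make-8-bit-choose-category. It is a hypothetical model, not XEmacs code: it
assumes a decode table is a 256-entry list of code points (built here with
Python's iso8859_2 codec), and the helper names decode_table and
choose_category are made up for illustration. In this model, ISO-8859-2
fails the identity test at #xA1 under the old #x80-#xBF range and comes out
as 'no-conversion; with the patched #x80-#x9F range, which covers only the
C1 controls, it comes out as 'iso-8-1.

```python
# Hypothetical Python model (not XEmacs code) of the test performed by
# make-8-bit-choose-category in lisp/mule/mule-coding.el.

def decode_table(encoding):
    """Return a 256-entry list mapping each byte to the code point it decodes to."""
    return [ord(bytes([i]).decode(encoding)) for i in range(0x100)]

def choose_category(table, last):
    """Mimic the elisp loop: 'iso-8-1' if bytes 0x80..last decode to themselves."""
    for i in range(0x80, last + 1):
        if table[i] != i:
            return "no-conversion"
    return "iso-8-1"

latin2 = decode_table("iso8859_2")
print("old range #x80-#xBF:", choose_category(latin2, 0xBF))
print("new range #x80-#x9F:", choose_category(latin2, 0x9F))

# The nine lowercase Polish letters and their ISO-8859-2 bytes. Those in
# 0xA0-0xBF are printable letters, not C1 controls, yet the old range
# included them in the identity check.
for ch in "ąćęłńóśźż":
    byte = ch.encode("iso8859_2")[0]
    marker = "  (inside #xA0-#xBF)" if 0xA0 <= byte <= 0xBF else ""
    print(f"{ch} = {byte:#04x}{marker}")
```

Five of the nine letters (ą, ł, ś, ź, ż) fall inside #xA0-#xBF, the part of
the range the patch drops.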

Bye, 

Aidan

 On the nineteenth day of February, Krzysztof Rudnik wrote:

 > I've already mailed xemacs-beta but got no response at all.
 > 
 > I use Mule XEmacs 21.5-b28 "fuki" (+CVS-20071205) configured for
 > `i686-pc-linux' to edit a large number of Polish texts encoded in
 > iso-8859-2.
 > 
 > init.el (I found this somewhere on the list):
 > (set-language-environment "Latin-2")
 > (setq latin-unity-preapproved-coding-system-list '(iso-8859-2))
 > (latin-unity-install)
 > 
 > 
 > locale: LANG=pl_PL.UTF-8
 > 
 > In most cases XEmacs recognizes the coding system correctly, but sometimes
 > the coding system for saving the buffer is set to iso-8859-1:
 > Coding system for saving this buffer:
 >   Latin 1 -- iso-8859-1-unix
 > Default coding system (for new files):
 >   Latin 2 -- iso-8859-2
 > Coding system for keyboard input:
 >   Latin 2 -- iso-8859-2
 > Coding system for terminal output:
 >   Latin 2 -- iso-8859-2
 > 
 > I can even get:
 > Coding system for saving this buffer:
 >   UTF8 -- utf-8-unix
 > Default coding system (for new files):
 >   Latin 2 -- iso-8859-2
 > Coding system for keyboard input:
 >   Latin 2 -- iso-8859-2
 > Coding system for terminal output:
 >   Latin 2 -- iso-8859-2
 > 
 > 
 > I think the files are properly encoded (`iconv -f iso-8859-2 -t utf8` does
 > not complain). In fact, some of them were prepared in XEmacs in the Latin-2
 > environment (typically: edit in the Latin-2 environment -> save -> close ->
 > open again -> Latin-1).
 > 
 > I reduced the problem to very small documents (a couple of letters) and got
 > strange results:
 > 
 > 
 > 1. If a document contains exactly one lowercase Polish letter (there are 9
 > of them), the coding system is always Latin-1.
 > 
 > 2. If there are just 2 Polish letters, the coding system is Latin-2 unless
 > the letters are separated by anything; for example, it is OK for "wziąć"
 > but not for "wzią ć".
 > 
 > 3. I could not get the Latin-2 coding system automatically for documents
 > with exactly 3 Polish letters (I didn't check all possibilities).
 > 
 > 4. I couldn't see any rule in more complicated cases.
 > 
 > Is this a bug, or is my XEmacs not configured properly?
 > Could you please help me, or at least suggest where I can get help?
 > 
 > 
 > Thanks in advance,
 > Krzysztof
 
-- 
Where could my nephew Yoghurtu Nghé be now, the one who had to flee the
village in a hurry because of the shortage of rhinoceroses?