bug in file-name-coding-system detection [was: font-lock-fontify-* ...]

Aidan Kehoe kehoea at parhasard.net
Wed Jan 10 04:31:08 EST 2007


 On the tenth day of January, Stephen J. Turnbull wrote: 

 > Aidan Kehoe writes:
 > 
 >  > Unless your ~/.xemacs/init.el already handles what coding system file
 >  > names are in, you probably don’t want to remove the package. The
 >  > change I made that provoked the new behaviour on your machine added
 >  > support for sniffing what encoding file names were in, which should be
 >  > beneficial for you.
 > 
 > Excuse me?  "Sniffing the encoding of a file *name*"?  Surely you mean
 > your patch for "determining the system's file-name-coding-system"?

Tomato, tomato :-) . 

 > And this is in the *locale* package?  Aidan, that's wrong; the locale
 > package was intended to be data-only, with only the code needed to
 > load the data.  This kind of basic functionality should be in
 > mule-base, or even core.
 > 
 > And now I know who to blame for the fact that suddenly I can no longer
 > reliably read Japanese file names in UTF-8.  (Part of the blame goes
 > to Mac OS X, which doesn't set the locale, but has a whole separate
 > set of internationalization functions---this confuses all Unix
 > software, of course, even ls in an Apple Terminal.)

echo "(define-coding-system-alias 'file-name 'utf-8)" >> ~/.xemacs/init.el

As I said, I don’t have access to an OS X machine. Having a system-specific
hard-coding of the file-name coding-system alias is the right thing to do
there, but if I implement it without being able to test it, I’ll get it
wrong. 

 > The point is (as I've said before) that the POSIX locale is *not* a
 > sufficiently reliable way to determine file-name-coding-system. 

And as I said in lisp/mule-cmds.el and in email, 

      ;;     On Unix--with the exception of Mac OS X--there is no way to
      ;;     know for certain what coding system to use for file names, and
      ;;     the environment is the best guess. If a particular user's
      ;;     preferences differ from this, then that particular user needs
      ;;     to edit ~/.xemacs/init.el. Aidan Kehoe, Sun Nov 26 18:11:31 CET
      ;;     2006. OS X uses an almost-normal-form version of UTF-8. 
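To make the "environment is the best guess" policy concrete, here is a rough shell sketch of it. It is not the code in lisp/mule-cmds.el; the function names (effective_ctype, codeset_of, guess_file_name_codeset) are made up for illustration, and the only assumptions are the usual POSIX precedence LC_ALL, then LC_CTYPE, then LANG, plus the special case that OS X (uname reports Darwin) always stores file names as UTF-8.

```shell
# Hypothetical helpers, for illustration only.

# Effective LC_CTYPE per POSIX: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Extract the codeset from a locale name such as de_DE.UTF-8 or
# de_DE.ISO-8859-15@euro. Prints an empty line if no codeset is given.
codeset_of() {
    case "$1" in
        *.*) cs=${1#*.}                 # drop "language_TERRITORY."
             printf '%s\n' "${cs%%@*}"  # drop any "@modifier"
             ;;
        *)   printf '\n' ;;
    esac
}

# Guess the file name codeset. Both arguments are optional and default to
# the running system and the current environment.
guess_file_name_codeset() {
    sys=${1:-$(uname -s)}
    loc=${2:-$(effective_ctype)}
    if [ "$sys" = "Darwin" ]; then
        # OS X: the file system dictates UTF-8, whatever the locale says.
        echo "UTF-8"
        return
    fi
    cs=$(codeset_of "$loc")
    echo "${cs:-unknown}"
}
```

Called with no arguments, guess_file_name_codeset reports the guess for the current machine; anyone whose file names are in some other encoding still has to override it by hand.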

 > The user can set the locale but at least on Mac OS X HFS+ that doesn't
 > affect the file system's encoding, it stays canonically decomposed UTF-8
 > (and will barf on, eg, ISO 8859/2). On the other hand, on most Unix file
 > systems, a file name is simply a binary blob, that happens to be human
 > readable most of the time.
 > 
 > Also, something that sniffs file-name-coding-system should definitely
 > *not* affect user interface.
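On the decomposed-UTF-8 point: the same visible file name can be two different byte strings, which is why plain UTF-8 decoding is only almost right on HFS+. A small sketch, assuming nothing beyond POSIX printf, od and tr; the helper name hex is made up here.

```shell
# Hypothetical helper: print the bytes of a string as lowercase hex.
hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# "e with acute accent" in two canonically equivalent forms.
precomposed=$(printf '\303\251')   # U+00E9, one code point
decomposed=$(printf 'e\314\201')   # U+0065 followed by U+0301 (combining acute)

hex "$precomposed"; echo           # c3a9
hex "$decomposed"; echo            # 65cc81
```

A byte-for-byte comparison treats these as different names, so a decoder that wants them to match has to normalize as well as decode.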

As I followed up to Wulf, I was wrong in that. What seems to have happened
is that the improved POSIX locale handling picked up that de_DE.UTF-8 was a
German locale where it didn’t before, and mule-packages/locale just paid
attention to that. If his LC_CTYPE had been de_DE all along, he would have
had his menus in incompletely-translated German all along.

Our language environment model is not as fine-grained as that of POSIX. For
working out which language to use on Unix, we pay attention to LC_CTYPE and
nothing else. If you can suggest a better approach to this, that is also
compatible with language environment treatment on Windows, where the
granularity is different, have at it. 
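The de_DE versus de_DE.UTF-8 case comes down to which part of the locale name is consulted for the language. A minimal shell sketch, with a made-up function name (locale_language) and no claim to match the actual Lisp:

```shell
# Hypothetical helper: the language part of a POSIX locale name is
# everything before the first "_", "." or "@".
locale_language() {
    printf '%s\n' "${1%%[_.@]*}"
}

locale_language "de_DE"         # de
locale_language "de_DE.UTF-8"   # de
locale_language "C"             # C
```

Both German locale names give the same answer once the codeset suffix is handled properly, which is what the improved locale handling changed.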

-- 
When I was in the scouts, the leader told me to pitch a tent. I couldn't
find any pitch, so I used creosote.



More information about the XEmacs-Beta mailing list