bug in file-name-coding-system detection [was: font-lock-fontify-* ...]

Wed Jan 10 05:57:56 EST 2007

Aidan Kehoe writes:

 >  > Excuse me?  "Sniffing the encoding of a file *name*"?  Surely you mean
 >  > your patch for "determining the system's file-name-coding-system"?
 > 
 > Tomato, tomato :-) . 

Hey, I don't care; if you don't write clearly the first time, I'll
just turn up the flame until I do understand.  ;-)

 >  > And this is in the *locale* package?  Aidan, that's wrong;

Uh, what exactly is in the locale package?  If anything that has to do
with coding detection is in there, it still needs to get moved out
(preferably to the Attic, but I gather you're going to resist that :-).

 > echo "(define-coding-system-alias 'file-name 'utf-8)" >> ~/.xemacs/init.el

Excuse me?  Then what do I do with my FAT USB key?  NFS mounts on
systems running who knows what?

 > As I said, I don’t have access to an OS X machine. Having a system-specific
 > hard-coding of the file-name coding-system alias is the right thing to do
 > there, but if I implement it without being able to test it, I’ll get it
 > wrong. 

You'll still get it wrong, because not all OS X machines rely
exclusively on HFS+ file systems.

The whole problem with your patch is that you think in terms of *you*
getting it right, but you can't (except on your own machine).

 >  > The point is (as I've said before) that the POSIX locale is *not* a
 >  > sufficiently reliable way to determine file-name-coding-system. 
 > 
 > And as I said in lisp/mule-cmds.el and in email, 
 > 
 >       ;;     On Unix--with the exception of Mac OS X--there is no way to
 >       ;;     know for certain what coding system to use for file names, and
 >       ;;     the environment is the best guess. If a particular user's
 >       ;;     preferences differ from this, then that particular user needs
 >       ;;     to edit ~/.xemacs/init.el. Aidan Kehoe, Sun Nov 26 18:11:31 CET
 >       ;;     2006. OS X uses an almost-normal-form version of UTF-8. 
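For readers who haven't been bitten by it: "almost-normal-form" means HFS+ stores names decomposed (a variant of Unicode NFD), so the same visible name can arrive as different UTF-8 byte strings depending on who wrote it. A minimal Python illustration, assuming plain `unicodedata` NFD as a stand-in for Apple's variant (which follows an older Unicode table and exempts some ranges, so this is only an approximation):

```python
import unicodedata

name = "caf\u00e9"                                # "cafe" with precomposed e-acute (NFC form)
decomposed = unicodedata.normalize("NFD", name)   # "e" + U+0301 COMBINING ACUTE ACCENT

# Same glyphs on screen, different code points and different UTF-8 bytes.
print(name == decomposed)                         # False
print(name.encode("utf-8"))                       # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))                 # b'cafe\xcc\x81'

# Recomposing recovers the original string.
print(unicodedata.normalize("NFC", decomposed) == name)  # True
```

So "just use UTF-8" on OS X still leaves the question of which normalization to apply when comparing or creating names.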

We now have *three* users on *three* different systems who were hosed
by this patch.  Your best guess is probably the best guess---and guess
what?  *It is not sufficiently reliable.*  For now, let's not guess;
*first* let's give the user a convenient intuitive way to do such
configuration himself.  Once we've got a way for the user to get
himself out of a hole, *then* it's time for you to start digging them.
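To make "let the user configure it himself" concrete, here is a rough sketch of the kind of lookup I mean, written in Python for brevity rather than Lisp. Everything in it is hypothetical: `file_name_coding_for`, the mount-point table, and the coding names are illustrative only, not an existing XEmacs API. The idea is that the user maps mount points to coding systems, the deepest matching mount point wins, and the locale-based guess is only the fallback.

```python
from pathlib import PurePosixPath
from typing import Mapping


def file_name_coding_for(path: str, overrides: Mapping[str, str], default: str) -> str:
    """Pick a file-name coding system for PATH (hypothetical helper).

    OVERRIDES maps mount points to coding-system names (user configuration);
    the deepest mount point that contains PATH wins.  DEFAULT is whatever the
    locale-based guess produced, used only when no override applies.
    """
    p = PurePosixPath(path)
    best, best_depth = default, -1
    for mount, coding in overrides.items():
        m = PurePosixPath(mount)
        # Compare whole path components, so "/mnt/usbkeyX" does not match "/mnt/usbkey".
        if (m == p or m in p.parents) and len(m.parts) > best_depth:
            best, best_depth = coding, len(m.parts)
    return best


# Illustrative configuration: an old NFS server and a USB key mounted with a legacy charset.
overrides = {"/net/oldserver": "euc-jp", "/mnt/usbkey": "iso-8859-1"}
print(file_name_coding_for("/mnt/usbkey/DOCS/README.TXT", overrides, "utf-8"))
print(file_name_coding_for("/home/me/notes", overrides, "utf-8"))
```

Matching on path components rather than raw string prefixes is deliberate: a naive `startswith` test would wrongly apply the USB key's setting to a sibling directory such as `/mnt/usbkeyX`.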

 > Our language environment model is not as fine-grained as that of
 > POSIX. For working out which language to use on Unix, we pay
 > attention to LC_CTYPE and nothing else.

Yet another bug.  That may or may not be correct for file systems (it
would be interesting to know what POSIX says about this, and yes,
since you're the advocate of using POSIX locales in a multilingual
application, I think you should investigate it).

It's *definitely* wrong for UI language stuff (where LC_MESSAGES
should rule, of course), and it should be considered at most a strong
hint for coding autodetection.
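For the record, the per-category resolution POSIX specifies is simple: a non-empty LC_ALL overrides everything, then the category's own variable (LC_CTYPE, LC_MESSAGES, ...), then LANG, then the implementation default. A sketch in Python, assuming the common XPG-style name format `language_territory.codeset@modifier` for extracting the codeset (POSIX itself leaves locale names implementation-defined):

```python
from typing import Mapping, Optional


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Resolve the locale name for one POSIX category, e.g. "LC_CTYPE"."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset and empty are both treated as "not set"
            return value
    return "C"  # the default "C"/"POSIX" locale


def codeset(locale_name: str) -> Optional[str]:
    """Extract the codeset from an XPG-style name like "ja_JP.eucJP@mod"."""
    if "." not in locale_name:
        return None
    return locale_name.split(".", 1)[1].split("@", 1)[0]


env = {"LANG": "en_US.UTF-8", "LC_CTYPE": "ja_JP.eucJP", "LC_MESSAGES": "C"}
ctype = effective_locale("LC_CTYPE", env)
messages = effective_locale("LC_MESSAGES", env)
print(ctype, codeset(ctype))  # ja_JP.eucJP eucJP
print(messages)               # C
```

That environment is perfectly legal: Japanese text handling with untranslated messages. Looking only at LC_CTYPE gets the UI language wrong for this user, and it still says nothing about how the names on a given mounted file system are encoded.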

 > If you can suggest a better approach to this, that is also
 > compatible with language environment treatment on Windows, where the
 > granularity is different, have at it. 

The setup that we had before is a better approach, simply because it's
the one we had before.  It may be stupid and insane, but it's
backwardly compatible stupid insanity that few non-standards-geeks
have actually complained about.

Then the next step is to give users a reasonable way to set things up
for themselves.

After that, autoconfiguration for systems that we know how to guess
well.