bug in file-name-coding-system detection [was: font-lock-fontify-* ...]

Aidan Kehoe kehoea at parhasard.net
Wed Jan 10 06:48:29 EST 2007


 On the tenth of January, Stephen J. Turnbull wrote: 

 >  >  > Excuse me? "Sniffing the encoding of a file *name*"? Surely you
 >  >  > mean your patch for "determining the system's
 >  >  > file-name-coding-system"?
 >  > 
 >  > Tomato, tomato :-) . 
 > 
 > Hey, I don't care; if you don't write clearly the first time, I'll
 > just turn up the flame until I do understand.  ;-)

Carefully using the XEmacs vocabulary and having non-XEmacs developers
understand what I mean can be conflicting goals. You understood what I
meant. 

 >  >  > And this is in the *locale* package?  Aidan, that's wrong;
 > 
 > Uh, what exactly is in the locale package?  If anything that has to do
 > with coding detection is in there, 

It’s not.

(The reason things changed is that we take our language environment from
LC_CTYPE, and interpretation of that variable got better, so
mule-packages/locale used the newly non-English locale.)

 > it still needs to get moved out (preferably to the Attic, but I gather
 > you're going to resist that :-).
 > 
 >  > echo "(define-coding-system-alias 'file-name 'utf-8)" >> ~/.xemacs/init.el
 > 
 > Excuse me?  Then what do I do with my FAT USB key?  NFS mounts on
 > systems running who knows what?

I don’t know. It’s your system; I’ve no idea whether it does the sensible
thing and presents a normalised UTF-8 API for those cases. I can’t implement
this for you and get it right without lots of information from you, along
the lines of:

1. Is system-type always 'darwin on Mac OS X? 
2. What does locale -a give you? 
3. Are other file systems’ names re-interpreted as above? 
4. For Win32 file systems accessed over Samba, are the file names normalised
or not? 

and whatever else comes up. And you’ve sufficient experience with XEmacs
that it would be a more efficient use of everyone’s time were you to do it
yourself. 

 >  > As I said, I don’t have access to an OS X machine. Having a
 >  > system-specific hard-coding of the file-name coding-system alias is
 >  > the right thing to do there, but if I implement it without being able
 >  > to test it, I’ll get it wrong.
 > 
 > You'll still get it wrong, because not all OS X machines rely
 > exclusively on HFS+ file systems.

As I ask above, does the API do the sensible thing and normalise? 
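
To make the question concrete, here is a minimal Python sketch (not XEmacs
code) of how one file name can have two different UTF-8 byte sequences. It
uses the standard NFC and NFD forms from `unicodedata`; since OS X is
described as using only an almost-normal-form UTF-8, plain NFD is an
approximation here, and `same_file_name` is a name made up for illustration.

```python
import unicodedata


def same_file_name(a: str, b: str) -> bool:
    """Compare two names after normalising both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible name, once precomposed and once decomposed.
composed = unicodedata.normalize("NFC", "Kr\u00fcger.txt")
decomposed = unicodedata.normalize("NFD", "Kr\u00fcger.txt")

# Identical to a reader, different bytes on disk or on the wire.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
```

If the API normalises, an application only ever sees one of the two byte
sequences; if it does not, a comparison along the lines of `same_file_name`
has to happen somewhere above it.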

 > The whole problem with your patch is that you think in terms of *you*
 > getting it right, but you can't (except on your own machine).

I can get it right for standard average Unixes, and Win32 at some point in
the future, which are the machines I have access to.

 >  >  > The point is (as I've said before) that the POSIX locale is *not* a
 >  >  > sufficiently reliable way to determine file-name-coding-system. 
 >  > 
 >  > And as I said in lisp/mule-cmds.el and in email, 
 >  > 
 >  >       ;;     On Unix--with the exception of Mac OS X--there is no way
 >  >       ;;     to know for certain what coding system to use for file
 >  >       ;;     names, and the environment is the best guess. If a
 >  >       ;;     particular user's preferences differ from this, then that
 >  >       ;;     particular user needs to edit ~/.xemacs/init.el. Aidan
 >  >       ;;     Kehoe, Sun Nov 26 18:11:31 CET 2006. OS X uses an
 >  >       ;;     almost-normal-form version of UTF-8.
 > 
 > We now have *three* users on *three* different systems who were hosed
 > by this patch. 

Who? You, okay. But Volker Zell and Wulf Krüger? How is having a German
menubar in a German locale ‘hosed’? My preliminary judgement on Mats’ Gnus
issue is that it’s a Gnus issue, but it’s something I need to look into
further.

 > Your best guess is probably the best guess---and guess what? *It is not
 > sufficiently reliable.* For now, let's not guess; *first* let's give the
 > user a convenient intuitive way to do such configuration himself. Once
 > we've got a way for the user to get himself out of a hole, *then* it's
 > time for you to start digging them.

Bullshit. The intuitive thing is for XEmacs to pay attention to the locale
and use that information. If the user has a particularly riced-out
environment that doesn’t conform to expectations, then that user can modify
the file-name and native coding system aliases by hand. Treating file names
as Latin-1 when the environment indicates ‘probably otherwise’ is wrong and
annoying. It also means you can’t start your XEmacs in "/tmp/за родину!" and
then open that directory after startup.
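
As a rough illustration of "pay attention to the locale", here is a Python
sketch (not the XEmacs implementation) that reads the codeset out of the
environment. It assumes the usual POSIX precedence for the LC_CTYPE category
(LC_ALL, then LC_CTYPE, then LANG) and the usual
language[_territory][.codeset][@modifier] naming; `locale_codeset` is a name
invented for this example.

```python
import os
from typing import Mapping, Optional


def locale_codeset(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the codeset named by the locale environment, or None."""
    # First non-empty variable wins: LC_ALL, then LC_CTYPE, then LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        return None
    # Drop any "@modifier", then take what follows the first ".".
    value = value.split("@", 1)[0]
    if "." not in value:
        return None  # e.g. "C" or "POSIX": no explicit codeset to go on
    return value.split(".", 1)[1]


# File name bytes as a UTF-8 system would hand them over.
raw = "/tmp/за родину!".encode("utf-8")
codeset = locale_codeset({"LANG": "ru_RU.UTF-8"})
decoded = raw.decode(codeset) if codeset else None
```

Returning None when no codeset is given leaves the fallback to the caller;
a user whose environment does not match the file system would still override
the guess by hand.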

 >  > Our language environment model is not as fine-grained as that of
 >  > POSIX. For working out which language to use on Unix, we pay
 >  > attention to LC_CTYPE and nothing else.
 > 
 > Yet another bug.  That may or may not be correct for file systems (it
 > would be interesting to know what POSIX says about this, and yes,
 > since you're the advocate of using POSIX locales in a multilingual
 > application, I think you should investigate it).

POSIX says that’s wrong. Also, I didn’t implement the use-POSIX-locales
code; I’m fixing bugs in it. 

 > It's *definitely* wrong for UI language stuff (where LC_MESSAGES
 > should rule, of course),  and it should be considered at most a strong
 > hint for coding autodetection.

On Standard Average Unix, it _is_ the file name encoding. I’m mystified as
to why you don’t like this.  

 >  > If you can suggest a better approach to this, that is also
 >  > compatible with language environment treatment on Windows, where the
 >  > granularity is different, have at it. 
 > 
 > The setup that we had before is a better approach, simply because it's
 > the one we had before.  It may be stupid and insane, but it's
 > backwardly compatible stupid insanity that few non-standards-geeks
 > have actually complained about.

It wasn’t cross-platform, and didn’t make sense on native Windows. And note
I didn’t introduce the choice of LC_CTYPE--that was Ben, in 2002.

 > Then the next step is to give users a reasonable way to set things up
 > for themselves.

Users don’t care that much, in Europe. That we don’t respect the environment
just makes us look bad. People still get work done when a name like Krüger
or René is displayed garbled, and the investment of time to get it working
right, when getting it working right is often not possible, is not economic.
But all other things being equal, they’ll prefer the app that gets it right.
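
For anyone who has not seen what this looks like in practice, a small Python
sketch of UTF-8 names read as Latin-1 (`misread_as_latin1` is a name made up
for the example):

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


garbled_kruger = misread_as_latin1("Krüger")  # 'KrÃ¼ger'
garbled_rene = misread_as_latin1("René")      # 'RenÃ©'
```

The result is ugly but still recognisable, which is why people put up with it.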

 > After that, autoconfiguration for systems that we know how to guess
 > well.

-- 
When I was in the scouts, the leader told me to pitch a tent. I couldn't
find any pitch, so I used creosote.


