Typos in GCC output

Sun Oct 22 14:30:22 EDT 2006

On Mon, 23 Oct 2006, stephen at xemacs.org moaned:
> Nix writes:
> 
>  > It is conditionalized off the locale. GCC makes the (reasonable)
>  > assumption
> 
> You can call it "reasonable assumption" if you like, I call it
> "tyranny of the majority."

Part of the reason for the change is that the `' TeX-style balanced
quotes don't look as good as they used to: in virtually every font used
on modern terminals, they *don't* look balanced at all, they look like
typos. The Unicode quote glyphs do look balanced.

> With all due respect to Joseph Myers, this was an ill-advised change.
> I mean, if this is such a great idea, in UTF-8 locales shouldn't you
> change the parser to allow the C notation for strings to use balanced
> quotes, and in fr_FR locales, guillemets?  That's not just
> typographically correct, it would make things a lot easier for Emacs
> font-lock, you know!

Alas, the C standard says no :) Of course, C code is primarily produced
for machines to read, and they prefer consistency. GCC's standard error
stream is parsed in sufficient detail for quotes to matter by perhaps
one or two programs, and what they do isn't terribly complex...
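
To make "isn't terribly complex" concrete, here is a rough sketch of how
a tool reading the error stream might accept either quoting style. The
function name and the sample diagnostic text are made up for
illustration; this is not taken from any real consumer of GCC output.

```python
import re

# Matches a name quoted either TeX-style (`name') or with the Unicode
# quote glyphs U+2018 / U+2019. Exactly one of the two groups
# participates in any given match.
QUOTED = re.compile("`([^'\n]*)'|\u2018([^\u2019\n]*)\u2019")


def quoted_names(line):
    """Return every quoted name in one line of diagnostic output.

    Hypothetical helper: it only assumes that quoted names never
    contain the closing quote character themselves.
    """
    names = []
    for match in QUOTED.finditer(line):
        ascii_name, unicode_name = match.group(1), match.group(2)
        names.append(ascii_name if ascii_name is not None else unicode_name)
    return names


if __name__ == "__main__":
    # Illustrative diagnostic lines, not captured from a real compile.
    print(quoted_names("foo.c:3: error: `bar' undeclared"))
    print(quoted_names("foo.c:3: error: \u2018bar\u2019 undeclared"))
```

A consumer written this way does not need to know which locale the
compiler ran in, provided it decodes the stream as UTF-8 first.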

>  > that if you're in a UTF-8 locale, your tools, display devices, and
>  > so on can handle UTF-8.
> 
> It's NOT an issue of "being able to handle" UTF-8---XEmacs handles
> UTF-8 just fine, and if it knows it's coming it will display it
> prettily, too.  The problem is that GCC is making the assumption that
> there's nothing in the pipeline able to handle *more* than just UTF-8,
> that might be expecting something else for any of a number of reasons.

I don't get it. If a program is looking at GCC's standard error stream,
why would it expect anything other than text (7-bit ASCII before this
change, UTF-8 afterwards)?

There's no way we could avoid producing UTF-8 output on stderr in some
circumstances, even if the quotes were kept at `': printed Java
identifiers would have to be Unicode, for starters.

> It may very well be that imposing this pain on XEmacs and other
> multilingual apps is the right thing to do; backward compatibility is
> the only reason not to do it, and backward compatibility cannot be a
> justification for permanent stasis.  But defending it with "reasonable
> assumption" ... I expect better of y'all.

I'll admit that I can't figure out why you would set LANG=en_BLAH.UTF-8
in your environment if your tools were *not* capable of handling UTF-8.
It *still* seems like a reasonable assumption to me. If your tools
can't handle UTF-8, don't set your locale to a UTF-8 locale.
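
For anyone who wants to see what "conditionalized off the locale" can
look like in practice, the following is a minimal sketch of the idea,
not GCC's actual code. The function names are invented for this
example, and it assumes the codeset name can be normalized through
Python's codec registry.

```python
import codecs
import locale


def quotes_for_codeset(codeset):
    """Pick (open, close) quote strings for a codeset name.

    Hypothetical helper: returns the Unicode quote glyphs when the
    codeset is some spelling of UTF-8, and the `' pair otherwise.
    """
    try:
        is_utf8 = codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        # Unknown codeset: fall back to the ASCII-only quotes.
        is_utf8 = False
    if is_utf8:
        return ("\u2018", "\u2019")
    return ("`", "'")


def quote(text, codeset=None):
    """Quote text according to the given codeset.

    When no codeset is passed, ask the locale module which encoding
    the current environment prefers.
    """
    if codeset is None:
        codeset = locale.getpreferredencoding(False)
    open_q, close_q = quotes_for_codeset(codeset)
    return open_q + text + close_q


if __name__ == "__main__":
    # Output depends on the environment this is run in.
    print(quote("foo"))
```

Running it with a non-UTF-8 locale in the environment should give the
old-style quotes back, which is the whole point of the argument above.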

-- 
`When we are born we have plenty of Hydrogen but as we age our
 Hydrogen pool becomes depleted.'