Typos in GCC output

stephen at xemacs.org stephen
Sun Oct 22 16:29:04 EDT 2006


Nix writes:

 > Alas, the C standard says no :) of course, C code is primarily produced
 > for machines to read, and they prefer consistency. GCC's standard error
 > stream is parsed in sufficient detail for quotes to matter by perhaps
 > one or two programs, and what they do isn't terribly complex...

What the Emacs Lisp does is not terribly complex, true---but totally
irrelevant.  By that point, *it's too late*, the codecs (which live
just on the XEmacs side of the pipe, and pretty much have to if you
want any kind of efficiency) have already converted those bytes.

But it's a poor atom-blaster that won't point both ways.  You realize
that what *GCC* is doing internally is not very complex, and could
easily be delegated to a wrapper, so we wouldn't have to recommend
that the user change a *system global parameter* (i.e., LANG) to suit an
application that rarely produces output directly for the user, but
rather normally is filtered through one or more wrappers anyway?
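The wrapper idea really is that small. As a sketch, and assuming the only locale-sensitive characters in GCC's diagnostics are the Unicode single quotation marks U+2018 and U+2019 (the function name and pipeline are illustrative, not part of any real tool):

```python
# Sketch of a stderr filter that undoes GCC's locale-dependent quoting.
# Assumption: the only characters that vary with the locale are the
# Unicode quotation marks U+2018/U+2019; everything else passes through.
def asciify_quotes(line: str) -> str:
    return line.replace("\u2018", "`").replace("\u2019", "'")

# Hypothetical usage, filtering a compile's stderr line by line:
#   gcc foo.c 2>&1 | python3 asciify_quotes.py
```

The point is that this transliteration can live in a wrapper between GCC and the consumer, so nobody has to touch the global LANG to get the traditional `' style back.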

 > I don't get it.

That's my point.  If you *did* get it, I would call it a difference of
values, not "ill-advised". ;-)

 > If a program is looking at GCC's standard error stream, why would
 > it expect anything other than text (7-bit ASCII before this change,
 > UTF-8 afterwards)?

If "a program" is XEmacs, it could be a shell buffer where I'd been
looking at EUC-JP content in TeX error messages in my last make, or
Shift JIS I'd grepped out of an email.
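The trouble with such a buffer is easy to reproduce outside XEmacs: once EUC-JP or Shift JIS bytes share a stream with UTF-8, no single codec can decode the whole thing. A minimal illustration (the Japanese text is arbitrary):

```python
# The same text encoded two different ways, concatenated into one
# stream -- the way a shell buffer accumulates output from
# successive commands run under different assumptions.
euc_jp = "日本語".encode("euc-jp")
utf8   = "日本語".encode("utf-8")
stream = euc_jp + b"\n" + utf8

# Decoding the whole stream as UTF-8 chokes on the EUC-JP bytes;
# decoding it as EUC-JP likewise mangles or rejects the UTF-8 bytes.
try:
    stream.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```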

 > There's no way we could avoid producing UTF-8 output on stderr in some
 > circumstances, even if the quotes were kept at `': printed Java
 > identifiers would have to be Unicode, for starters.

But compile.el is not going to try to parse the Java identifiers, just
spit them back.  I.e., that's not Unicode *produced* by gcc, that's
binary crap *rendered* from the data, just like the jTeX content.  If
the data in the code is something other than UTF-8 (e.g., ISO-8859-1), I
don't see why GCC should give a fig about the value of LANG in the
environment.

 > I'll admit that I can't figure out why you would set LANG=en_BLAH.UTF-8
 > in your environment if your tools were *not* capable of handling UTF-8.
 > It *still* seems like a reasonable assumption to me.

As stated, the assumption is not violated.  My tools *are* capable of
handling UTF-8.  It is the inference that using UTF-8 is therefore
reliable that is wrong.

 > If your tools can't handle UTF-8, don't set your locale to a UTF-8
 > locale.

Uh, we are talking about the LANG variable.  It is *global*.  For
*most* of what I do, it makes sense to set that variable to *.UTF-8,
because *most* of my data (including a fair number of file names) *is*
UTF-8, and (with the exception of XEmacs) *none* of my tools are smart
enough to DTRT without LANG set appropriately.
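Note too that the global setting and the per-application setting need not conflict: POSIX locale variables can be overridden per invocation, so a wrapper could run GCC under the C locale without disturbing the user's environment at all. A sketch of the idea (the gcc invocation is illustrative and left commented out):

```python
import os

# Copy the environment and override the locale for one child process
# only; the user's global LANG/LC_* settings are left untouched.
child_env = dict(os.environ, LC_ALL="C")

# Hypothetical per-invocation use, e.g. via subprocess:
#   subprocess.run(["gcc", "foo.c"], env=child_env)
```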

So what you're saying is that a user whose data is *mostly* UTF-8, and
whose "dumb" tools all handle UTF-8 and only UTF-8 should change LANG
to something else because he also uses a smart tool that not only can
DTRT with UTF-8, but UTF-16, EUC-JP, GB2312, KSC5601, and KOI-8 as
well as ASCII?



More information about the XEmacs-Beta mailing list