assert, ABORT, and friends

Mon Jun 19 19:09:23 EDT 2006

I'm trying to make some static checkers produce usable results on the
XEmacs code base.  By "usable", I mean, "not producing hundreds and
hundreds of false positives."  One of the problems I am facing is that
the checkers I am using are very, very good at finding code paths that
lead through assert_failed(), usually coming in via assert() or ABORT().
The function assert_failed() can return.  In normal circumstances, it
won't, but it can.  It starts like this:

  /* If we're already crashing, let's not crash again.  This might be
     critical to getting auto-saving working properly. */
  if (fatal_error_in_progress)
    return;

Our code base is riddled with code that calls assert() and
assumes that execution will not continue if the assertion fails, or
calls ABORT() and assumes that execution will not continue under any
circumstances.  Again, normally that is true, but it is not guaranteed.

The code in assert_failed() is trying to be paranoid.  It wants to make
sure that (1) the user's files are auto-saved, and (2) some kind of
"I'm dying!" message gets printed on the way out, with helpful
information such as a Lisp backtrace.  It also wants to make sure that,
if (1) or (2) triggers another assertion failure, ABORT(), or fatal
signal, we don't get trapped in an infinite loop.

My opinion is that this is misdirected paranoia.  I think we would be
better served by having an assert_failed() that we know for certain will
not return, without unduly jeopardizing the user's data.  I can see a
couple of ways of accomplishing that.

We could have assert_failed() do a setjmp() and, if assert_failed() is
entered a second time, longjmp() back to the first invocation.  This
option makes me nervous.
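
To make the setjmp() option concrete, here is a rough sketch of how it
might look.  Everything in it is hypothetical: the names
assert_failed_sketch(), run_crash_steps(), auto_save_all(), and
print_crash_report() are stand-ins, not the real XEmacs functions, and
simulate_autosave_failure exists only so the nested path can be
exercised.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf crash_env;              /* filled in by the first invocation */
static int crash_in_progress;          /* nonzero once we have started dying */
static int steps_attempted;            /* bookkeeping for the demonstration */
static int simulate_autosave_failure;  /* hypothetical knob: force a nested failure */

_Noreturn void assert_failed_sketch (void);

/* Hypothetical stand-in for the real auto-save code. */
static void
auto_save_all (void)
{
  steps_attempted++;
  if (simulate_autosave_failure)
    assert_failed_sketch ();           /* an assertion fails while auto-saving */
}

/* Hypothetical stand-in for the "I'm dying!" message and backtrace. */
static void
print_crash_report (void)
{
  steps_attempted++;
}

static void
run_crash_steps (void)
{
  /* volatile, so the value is well defined after longjmp() returns here. */
  volatile int step = 0;

  /* A nested failure lands back here; each step is marked as attempted
     before it runs, so a step that failed is never retried. */
  (void) setjmp (crash_env);

  if (step == 0)
    {
      step = 1;
      auto_save_all ();
    }
  if (step == 1)
    {
      step = 2;
      print_crash_report ();
    }
}

_Noreturn void
assert_failed_sketch (void)
{
  if (crash_in_progress)
    longjmp (crash_env, 1);            /* second entry: back to the first one */

  crash_in_progress = 1;
  run_crash_steps ();
  abort ();                            /* the first invocation never returns */
}
```

Note that the longjmp() abandons whatever the failing step was doing in
mid-flight, so any state that step was updating is left half-finished.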

We could have a second invocation of assert_failed() print some kind of
apologetic message ("Dave, my mind is going.  I can feel it.  I can feel
it.") and exit().

Since assert_failed() tries to auto-save the user's files first and
print second, a second invocation of assert_failed() means either that:
(1) the files were already auto-saved and we're recrashing while trying
to tell the user about it; or (2) the attempt to auto-save triggered
another failure.  In case (1), either approach above is fine.  In case
(2), I'm afraid that the user is just going to lose data no matter what
we do, so either approach above is fine.

For a little contrast, I just went through the Emacs code base to see
how they handle this situation.  Their assertion macro (xassert) just
calls the system abort() if the expression is false.  Their SIGABRT
handler resets SIGABRT to its default disposition before doing anything
else.  That way, if a second abort() happens, the system default action
applies: dump core and die.  I think this approach makes a lot of sense
and is much better
than the error-prone way we are doing it.
-- 
Jerry James, Assistant Professor        james at xemacs.org
Computer Science Department             http://www.cs.usu.edu/~jerry/
Utah State University