mc-alloc bug?

Mon Sep 10 23:42:55 EDT 2007

I've built a 21.5 from a CVS checkout taken just before the
cvs.xemacs.org crash, on a 32-bit Pentium 4.  I ran it under valgrind
and got some errors before the frame was mapped.  Here's the first
one:

==11577== Invalid write of size 1
==11577==    at 0x4006614: memset (mc_replace_strmem.c:490)
==11577==    by 0x832B7BD: mc_alloc_1 (string3.h:96)
==11577==    by 0x816BBC4: alloc_lrecord (alloc.c:582)
==11577==    by 0x8362844: make_opaque_ptr (opaque.c:167)
==11577==    by 0x81F876F: record_unwind_protect_restoring_int (eval.c:6047)
==11577==    by 0x81F87E8: internal_bind_int (eval.c:6076)
==11577==    by 0x82A4F15: begin_gc_forbidden (gc.c:1274)
==11577==    by 0x82A5D47: gc (gc.c:1939)
==11577==    by 0x82A5D96: gc_incremental (gc.c:1974)
==11577==    by 0x81FCA90: Feval (eval.c:3590)
==11577==    by 0x81FCBE5: Feval (eval.c:3663)
==11577==    by 0x832043D: readevalloop (lread.c:1467)
==11577==  Address 0x4646028 is 0 bytes after a block of size 2,494,464 alloc'd
==11577==    at 0x4004824: calloc (vg_replace_malloc.c:279)
==11577==    by 0x816D20C: xmalloc_and_zero (alloc.c:402)
==11577==    by 0x832B536: mc_alloc_1 (mc-alloc.c:1150)
==11577==    by 0x816BBC4: alloc_lrecord (alloc.c:582)
==11577==    by 0x816C7EF: Fcons (alloc.c:1273)
==11577==    by 0x8322059: read_list_conser (lread.c:2977)
==11577==    by 0x831CB0D: sequence_reader (lread.c:2899)
==11577==    by 0x831CBC9: read_list (lread.c:3024)
==11577==    by 0x831E1A0: read1 (lread.c:2471)
==11577==    by 0x831F7E5: read0 (lread.c:1661)
==11577==    by 0x83208FC: readevalloop (lread.c:1464)
==11577==    by 0x8324180: Fload_internal (lread.c:768)

This is followed by a complaint about an access 1 byte past the end
of the same block, so the memset writes 2 bytes beyond the end of the
allocated block.  I don't understand the mc-alloc.c code well enough
to judge yet, but here's a guess at what might be happening.

The error happens during startup, so we are allocating like mad but
not freeing anything.  Therefore there is only one page on the free
list, and it is mostly allocated already; in fact, it has fewer than
sizeof(struct Lisp_Opaque) bytes left (and sizeof(struct Lisp_Opaque)
is more than USED_LIST_MIN_OBJECT_SIZE but less than
USED_LIST_UPPER_THRESHOLD).  The call to make_opaque_ptr ->
alloc_lrecord -> mc_alloc_1 thus has only the one page to consider.
If I have counted correctly, the call to get_used_list_index at the
top of mc_alloc_1 returns 4, so

  plh = USED_HEAP_PAGES (get_used_list_index (size));

sets plh to &mc_allocator_globals.used_heap_pages[4].  Since the size
is small enough, allocate_cell is called next.  Here's the part I'm
not so sure about: none of the remaining code checks that enough bytes
remain on that page.  Or do we always allocate an entire page?  Either
way, the memset a few lines down runs 2 bytes past the end of the
allocated block, so something like this scenario must be happening.
I'll try to track it down in more detail when I have a little more
free time.
-- 
Jerry James
http://loganjerry.googlepages.com/