21.5.28: replace-regexp-in-string with SUBEXP is broken

Sat Sep 22 16:02:10 EDT 2007

Ville Skyttä writes:

 > replace-regexp-in-string with SUBEXP in 21.5.28 is broken; for example the 
 > example given in the docstring fails.

The whole Emacs matching API really sucks. :-(

 > What should happen according to the example:
 > 
 >    (replace-regexp-in-string "\\(foo\\).*\\'" "bar" " foo foo" nil nil 1)
 >    " bar foo"
 > 
 > In my 21.5.28:
 > 
 >    (replace-regexp-in-string "\\(foo\\).*\\'" "bar" " foo foo" nil nil 1)
 >    " bar"
 > 
 > Perhaps when Ben brought this in from GNU Emacs, he may have failed to notice 
 > that our `replace-match' API differs from GNU's regarding strings vs buffers 
 > and subexp and this is why things go south.  Thoughts?

It's not something quite that gross.  The `replace-match' (RM) code
appears to correctly replace only the substring of the source string
it is passed, while the `replace-regexp-in-string' (RRIS) code should
"analyze" the source string correctly when providing input to RM.
Something more subtle is happening.  I would guess that the RM code is
incorrectly returning only the matched subexpression.

More precisely, since strings are immutable, RM returns a string
concatenated from the head of the source, the replacement string, and
the tail of the source.  So presumably this code in RM (search.c,
l.2560) is the culprit:

      before = Fsubstring (string, Qzero, make_int (search_regs.start[0]));
      after = Fsubstring (string, make_int (search_regs.end[0]), Qnil);

but I don't know yet whether changing [0] to [sub] is OK.  You can try
it; I don't have time to test today.
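
To make the head/replacement/tail description concrete, here is a
minimal standalone C sketch.  It is not XEmacs code: `struct regs' and
`splice' are hypothetical names standing in for search_regs and the
concatenation done in RM, and the register values suggested for the
example (whole match at [1,8), subexpression 1 at [1,4)) are assumed,
not measured.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the match registers: start[i]/end[i]
   bound group i, with group 0 being the whole match. */
struct regs {
    int start[2];
    int end[2];
};

/* Return a newly allocated string built from
     source[0, start[sub]) + replacement + source[end[sub], len).
   Returns NULL if allocation fails.  The caller frees the result. */
char *splice(const char *source, const char *replacement,
             const struct regs *r, int sub)
{
    size_t len = strlen(source);
    size_t rlen = strlen(replacement);

    /* Precondition checks: valid group index and sane bounds. */
    assert(sub >= 0 && sub < 2);
    assert(r->start[sub] >= 0 && r->start[sub] <= r->end[sub]);
    assert((size_t) r->end[sub] <= len);

    size_t head = (size_t) r->start[sub];      /* length of the head */
    size_t tail = len - (size_t) r->end[sub];  /* length of the tail */

    char *out = malloc(head + rlen + tail + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, source, head);
    memcpy(out + head, replacement, rlen);
    memcpy(out + head + rlen, source + r->end[sub], tail);
    out[head + rlen + tail] = '\0';
    return out;
}
```

With registers { {1, 1}, {8, 4} } for " foo foo", splicing around
group 0 yields " bar" (the reported result), while splicing around
group 1 yields " bar foo" (what the docstring promises).  That is
consistent with the guess above, but it says nothing about whether
other callers of RM rely on the [0] behavior.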

There's a similar oddity in RRIS at l. 836:

				     (funcall rep (match-string 0 str)))

but this is only used in the rare case that rep is a function.  I
actually don't think this makes sense.  Consider this example:

(flet ((rep (s)
         (if (string= s "foo") "bar" s)))
  (replace-regexp-in-string "\\(foo\\).*\\'" 'rep " foo foo" nil nil 1))

I would expect that to result in " bar foo", but as written here (and
as specified in the docstring) that results in " foo foo" because
(match-string 0 str) is " foo foo" here.  We should look at GNU's
implementation to see what they do.

For another thing, I discovered that either `match-end' or `\''
appears to be broken on strings:

(let ((s " foo foo"))
  ;; my-rris is `replace-regexp-in-string' without saving match data
  (list (my-rris "\\(foo\\).*\\'" "bar" s nil nil 1)
	(match-string 0 s)
	(match-string 1 s)
	(length s)
	(match-beginning 0)
	(match-end 0)
	(match-beginning 1)
	(match-end 1)))
=> (" bar" " foo fo" " fo" 8 0 7 0 3)

The first element is the bug that Ville reported but the rest is,
well, disturbing.  Anybody know what's going on here?