[Patch] 21.5.28: replace-regexp-in-string with SUBEXP is broken

Stephen J. Turnbull stephen at xemacs.org
Sun Sep 23 02:45:20 EDT 2007


[Vin -- I'm not sure what status of 21.4 is w.r.t these APIs, but
since it's an incompatibility with GNU, probably we ought to fix in
21.4 too.  What do you want to do?]

Please review and test the attached patch, as a proof of concept.
The patch does compile and build, and the test Ville suggested passes.

It's not ready for prime time, because the case where REP is a
function in replace-regexp-in-string needs to be figured out and
perhaps fixed, and a bunch of tests for these functions should be
written.  The test Ville proposed isn't in the patch, sorry, just
forgot but it's time for bed.

Also, I've updated the Lispref with documentation of case-munging
escapes in the replacement string.  This should be committed as a
separate patch in any case.  Remind me....

Stephen J. Turnbull writes:

 > I would guess that the RM code is incorrectly returning only the
 > matched subexpression.

As Ville pointed out in a reply to Mike Kupfer, it looks like the
`replace-match' API is not designed to replace a subexpression in a
string.  However, I don't see why not, and the attached patch
implements the more flexible API.  It's not quite orthogonal since
when the replacement source is a string you might want to specify the
syntax/case context and a subexpression, and you can't since the same
argument is used for both.  However, in almost all cases the idiom

  (with-current-buffer buf
    (replace-match "replace" nil nil "source string" 1))

should do the trick.

 > There's a similar oddity in RRIS at l. 836:
 > 
 > 				     (funcall rep (match-string 0 str)))
 > 
 > but this is only used in the rare case that rep is a function.  I
 > actually don't think this makes sense.  Consider this example:
 > 
 > (flet ((rep (s)
 >          (if (string= s "foo") "bar" s)))
 >   (replace-regexp-in-string "\\(foo\\).*\\'" 'rep " foo foo" nil nil 1)
 > 
 > I would expect that to result in " bar foo", but as written here (and
 > as specified in the docstring) that results in " foo foo" because
 > (match-string 0 str) is " foo foo" here.  We should look at GNU's
 > implementatation to see what they do.

Work on this is pending.

 > For another thing, I discovered either `match-end' or `\'' appears
 > broken on strings:

I've done nothing on this yet; any takers?

At long last, here's the proposed patch:

Index: src/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/ChangeLog,v
retrieving revision 1.1094
diff -u -U0 -r1.1094 ChangeLog
--- src/ChangeLog	4 Sep 2007 21:20:18 -0000	1.1094
+++ src/ChangeLog	23 Sep 2007 04:36:07 -0000
@@ -0,0 +1,5 @@
+2007-09-22  Stephen J. Turnbull  <stephen at xemacs.org>
+
+	* search.c (Freplace_match): Allow STRBUFFER to specify a
+	subexpression when the source is a string.
+
Index: man/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/man/ChangeLog,v
retrieving revision 1.358
diff -u -U0 -r1.358 ChangeLog
--- man/ChangeLog	24 Aug 2007 22:26:42 -0000	1.358
+++ man/ChangeLog	23 Sep 2007 04:36:07 -0000
@@ -0,0 +1,6 @@
+2007-09-22  Stephen J. Turnbull  <stephen at xemacs.org>
+
+	* lispref/searching.texi (Replacing Match): Document the escapes
+	for changing case in `replace-match'.  Document the change to
+	STRBUFFER to permit subexpressions in string replacement.
+

Index: src/search.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/search.c,v
retrieving revision 1.46
diff -u -r1.46 search.c
--- src/search.c	24 Jan 2005 23:34:09 -0000	1.46
+++ src/search.c	23 Sep 2007 04:36:12 -0000
@@ -2364,15 +2364,22 @@
 
 DEFUN ("replace-match", Freplace_match, 1, 5, 0, /*
 Replace text matched by last search with REPLACEMENT.
-If second arg FIXEDCASE is non-nil, do not alter case of replacement text.
+Leaves point at end of replacement text.
+Optional boolean FIXEDCASE inhibits matching case of REPLACEMENT to source.
+Optional boolean LITERAL inhibits interpretation of escape sequences.
+Optional STRING provides the source text to replace.
+Optional STRBUFFER may be a buffer, providing match context, or an integer
+ specifying the subexpression to replace.
+
+If FIXEDCASE is non-nil, do not alter case of replacement text.
 Otherwise maybe capitalize the whole text, or maybe just word initials,
 based on the replaced text.
-If the replaced text has only capital letters
-and has at least one multiletter word, convert REPLACEMENT to all caps.
+If the replaced text has only capital letters and has at least one
+multiletter word, convert REPLACEMENT to all caps.
 If the replaced text has at least one word starting with a capital letter,
 then capitalize each word in REPLACEMENT.
 
-If third arg LITERAL is non-nil, insert REPLACEMENT literally.
+If LITERAL is non-nil, insert REPLACEMENT literally.
 Otherwise treat `\\' as special:
   `\\&' in REPLACEMENT means substitute original matched text.
   `\\N' means substitute what matched the Nth `\\(...\\)'.
@@ -2385,24 +2392,31 @@
   `\\E' means terminate the effect of any `\\U' or `\\L'.
   Case changes made with `\\u', `\\l', `\\U', and `\\L' override
   all other case changes that may be made in the replaced text.
-FIXEDCASE and LITERAL are optional arguments.
-Leaves point at end of replacement text.
 
-The optional fourth argument STRING can be a string to modify.
-In that case, this function creates and returns a new string
-which is made by replacing the part of STRING that was matched.
-When fourth argument is a string, fifth argument STRBUFFER specifies
-the buffer to be used for syntax-table and case-table lookup and
-defaults to the current buffer.  When fourth argument is not a string,
-the buffer that the match occurred in has automatically been remembered
-and you do not need to specify it.
-
-When fourth argument is nil, STRBUFFER specifies a subexpression of
-the match.  It says to replace just that subexpression instead of the
-whole match.  This is useful only after a regular expression search or
-match since only regular expressions have distinguished subexpressions.
+If non-nil, STRING is the source string, and a new string with the specified
+replacements is created and returned.  Otherwise the current buffer is the
+source text.
+
+If non-nil, STRBUFFER may be an integer, interpreted as the index of the
+subexpression to replace in the source text, or a buffer to provide the
+syntax table and case table.  If nil, then the \"subexpression\" is 0, i.e.,
+the whole match, and the current buffer provides the syntax and case tables.
+If STRING is nil, STRBUFFER must be nil or an integer.
+
+Specifying a subexpression is only useful after a regular expression match,
+since a fixed string search has no non-trivial subexpressions.
+
+It is not possible to specify both a buffer and a subexpression.  If that is
+desired, the idiom `(with-current-buffer BUFFER (replace-match ... INTEGER))'
+may be appropriate.
+
+If STRING is nil but the last thing matched (or searched) was a string, or
+STRING is a string but the last thing matched was a buffer, an
+`invalid-argument' error will be signaled.  (XEmacs does not check that the
+last thing searched is the source string, but it is not useful to use a
+different string as source.)
 
-If no match (including searches) has been conducted or the requested
+If no match (including searches) has been successful or the requested
 subexpression was not matched, an `args-out-of-range' error will be
 signaled.  (If no match has ever been conducted in this instance of
 XEmacs, an `invalid-operation' error will be signaled.  This is very
@@ -2430,31 +2444,59 @@
 
   CHECK_STRING (replacement);
 
+  /* Because GNU decided to be incompatible here, we support the following
+     baroque and bogus API for the STRING and STRBUFFER arguments:
+          types            interpretations
+     STRING   STRBUFFER   STRING   STRBUFFER
+     nil      nil         none     0 = index of subexpression to replace
+     nil      integer     none     index of subexpression to replace
+     nil      other       ***** error *****
+     string   nil         source   current buffer provides syntax table
+                                   subexpression = 0 (whole match)
+     string   buffer      source   buffer providing syntax table
+                                   subexpression = 0 (whole match)
+     string   integer     source   current buffer provides syntax table
+                                   subexpression = STRBUFFER
+     string   other       ***** error *****
+  */
+
+  /* Do STRBUFFER first; if STRING is nil, we'll overwrite BUF and BUFFER. */
+
+  /* If the match data were abstracted into a special "match data" type
+     instead of the typical half-assed "let the implementation be visible"
+     form it's in, we could extend it to include the last string matched
+     and the buffer used for that matching.  But of course we can't change
+     it as it is.
+  */
+  if (NILP (strbuffer) || BUFFERP (strbuffer))
+    {
+      buf = decode_buffer (strbuffer, 0);
+    }
+  else if (!NILP (strbuffer))
+    {
+      CHECK_INT (strbuffer);
+      sub = XINT (strbuffer);
+      if (sub < 0 || sub >= (int) search_regs.num_regs)
+	args_out_of_range (strbuffer, make_int (search_regs.num_regs));
+      buf = current_buffer;
+    }
+  else
+    invalid_argument ("STRBUFFER must be nil, a buffer, or an integer",
+		      strbuffer);
+  buffer = wrap_buffer (buf);
+
   if (! NILP (string))
     {
       CHECK_STRING (string);
       if (!EQ (last_thing_searched, Qt))
- invalid_argument ("last thing matched was not a string", Qunbound);
-      /* If the match data
-	 were abstracted into a special "match data" type instead
-	 of the typical half-assed "let the implementation be
-	 visible" form it's in, we could extend it to include
-	 the last string matched and the buffer used for that
-	 matching.  But of course we can't change it as it is. */
-      buf = decode_buffer (strbuffer, 0);
-      buffer = wrap_buffer (buf);
+	invalid_argument ("last thing matched was not a string", Qunbound);
     }
   else
     {
-      if (!NILP (strbuffer))
-	{
-	  CHECK_INT (strbuffer);
-	  sub = XINT (strbuffer);
-	  if (sub < 0 || sub >= (int) search_regs.num_regs)
-	    args_out_of_range (strbuffer, make_int (search_regs.num_regs));
-	}
       if (!BUFFERP (last_thing_searched))
- invalid_argument ("last thing matched was not a buffer", Qunbound);
+	invalid_argument ("last thing matched was not a buffer", Qunbound);
+      if (!NILP (strbuffer))
+	CHECK_INT (strbuffer);
       buffer = last_thing_searched;
       buf = XBUFFER (buffer);
     }
@@ -2557,8 +2599,8 @@
       Lisp_Object before, after;
 
       speccount = specpdl_depth ();
-      before = Fsubstring (string, Qzero, make_int (search_regs.start[0]));
-      after = Fsubstring (string, make_int (search_regs.end[0]), Qnil);
+      before = Fsubstring (string, Qzero, make_int (search_regs.start[sub]));
+      after = Fsubstring (string, make_int (search_regs.end[sub]), Qnil);
 
       /* Do case substitution into REPLACEMENT if desired.  */
       if (NILP (literal))
Index: man/lispref/searching.texi
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/man/lispref/searching.texi,v
retrieving revision 1.13
diff -u -r1.13 searching.texi
--- man/lispref/searching.texi	23 Feb 2005 15:33:38 -0000	1.13
+++ man/lispref/searching.texi	23 Sep 2007 04:36:12 -0000
@@ -1282,19 +1282,14 @@
 @var{replacement}.
 
 If you did the last search in a buffer, you should specify @code{nil}
-for @var{string}.  Then @code{replace-match} does the replacement by
-editing the buffer; it leaves point at the end of the replacement text,
-and returns @code{t}.
+for @var{string}.  (An error will be signaled if you don't.)  Then
+ at code{replace-match} does the replacement by editing the buffer; it
+leaves point at the end of the replacement text, and returns @code{t}.
 
 If you did the search in a string, pass the same string as @var{string}.
-Then @code{replace-match} does the replacement by constructing and
-returning a new string.
-
-If the fourth argument @var{string} is a string, fifth argument
- at var{strbuffer} specifies the buffer to be used for syntax-table and
-case-table lookup and defaults to the current buffer.  When @var{string}
-is not a string, the buffer that the match occurred in has automatically
-been remembered and you do not need to specify it.
+(An error will be signaled if you specify nil.)  Then
+ at code{replace-match} does the replacement by constructing and returning
+a new string.
 
 If @var{fixedcase} is non- at code{nil}, then the case of the replacement
 text is not changed; otherwise, the replacement text is converted to a
@@ -1317,20 +1312,64 @@
 
 @table @asis
 @item @samp{\&}
- at cindex @samp{&} in replacement
+ at cindex @samp{\&} in replacement
 @samp{\&} stands for the entire text being replaced.
 
 @item @samp{\@var{n}}
 @cindex @samp{\@var{n}} in replacement
+ at cindex @samp{\@var{digit}} in replacement
 @samp{\@var{n}}, where @var{n} is a digit, stands for the text that
 matched the @var{n}th subexpression in the original regexp.
 Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
 
 @item @samp{\\}
- at cindex @samp{\} in replacement
+ at cindex @samp{\\} in replacement
 @samp{\\} stands for a single @samp{\} in the replacement text.
+
+ at item @samp{\u}
+ at cindex @samp{\u} in replacement
+ at samp{\u} means upcase the next character.
+
+ at item @samp{\l}
+ at cindex @samp{\l} in replacement
+ at samp{\l} means downcase the next character.
+
+ at item @samp{\U}
+ at cindex @samp{\U} in replacement
+ at samp{\U} means begin upcasing all following characters.
+
+ at item @samp{\L}
+ at cindex @samp{\L} in replacement
+ at samp{\L} means begin downcasing all following characters.
+
+ at item @samp{\E}
+ at cindex @samp{\E} in replacement
+ at samp{\E} means terminate the effect of any @samp{\U} or @samp{\L}.
 @end table
+
+Case changes made with @samp{\u}, @samp{\l}, @samp{\U}, and @samp{\L}
+override all other case changes that may be made in the replaced text.
+
+The fifth argument @var{strbuffer} may be a buffer to be used for
+syntax-table and case-table lookup.  If @var{strbuffer} is not a buffer,
+the current buffer is used.  When @var{string} is not a string, the
+buffer that the match occurred in has automatically been remembered and
+you do not need to specify it.  @var{string} may also be an integer,
+specifying the index of the subexpression to match.  When @var{string}
+is not an integer, the ``subexpression'' is 0, @emph{i.e.}, the whole
+match.  An @code{invalid-argument} error will be signaled if you specify
+a buffer when @var{string} is nil, or specify a subexpression which was
+not matched.
+
+It is not possible to specify both a buffer and a subexpression, but the
+idiom
+ at example
+(with-current-buffer @var{buffer} (replace-match ... @var{integer}))
+ at end example
+may be used.
+
 @end defun
+
 
 @node Entire Match Data
 @subsection Accessing the Entire Match Data



More information about the XEmacs-Beta mailing list