[21.5R21.4] Implement replace-regexp-in-string with SUBEXP

Stephen J. Turnbull stephen at xemacs.org
Mon Sep 24 01:23:35 EDT 2007


21.5 RECOMMEND 21.4

This is reasonably complete, so I'm submitting it.  Followups to
xemacs-patches, please.  Reply-To set.

Stephen J. Turnbull writes:

 > [Vin -- I'm not sure what status of 21.4 is w.r.t these APIs, but
 > since it's an incompatibility with GNU, probably we ought to fix in
 > 21.4 too.  What do you want to do?]
 > 
 > Please review and test the attached patch, as a proof of concept.
 > The patch does compile and build, and the test Ville suggested passes.

Here's the patch again, with minimal tests.

Please review the tests at least to make sure that the behavior
implemented makes sense.

Index: tests/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/tests/ChangeLog,v
retrieving revision 1.95
diff -u -U0 -r1.95 ChangeLog
--- tests/ChangeLog	21 Aug 2007 12:39:06 -0000	1.95
+++ tests/ChangeLog	23 Sep 2007 20:58:56 -0000
@@ -0,0 +1,4 @@
+2007-09-23  Stephen J. Turnbull  <stephen at xemacs.org>
+
+	* automated/regexp-tests.el (replace-regexp-in-string): New tests.
+
Index: lisp/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/ChangeLog,v
retrieving revision 1.839
diff -u -U0 -r1.839 ChangeLog
--- lisp/ChangeLog	20 Sep 2007 21:18:33 -0000	1.839
+++ lisp/ChangeLog	24 Sep 2007 05:18:49 -0000
@@ -0,0 +1,4 @@
+2007-09-23  Stephen J. Turnbull  <stephen at xemacs.org>
+
+	* subr.el (replace-regexp-in-string): Handle SUBEXP arg properly.
+
Index: src/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/ChangeLog,v
retrieving revision 1.1094
diff -u -U0 -r1.1094 ChangeLog
--- src/ChangeLog	4 Sep 2007 21:20:18 -0000	1.1094
+++ src/ChangeLog	23 Sep 2007 04:36:07 -0000
@@ -0,0 +1,5 @@
+2007-09-22  Stephen J. Turnbull  <stephen at xemacs.org>
+
+	* search.c (Freplace_match): Allow STRBUFFER to specify a
+	subexpression when the source is a string.
+
Index: man/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/man/ChangeLog,v
retrieving revision 1.358
diff -u -U0 -r1.358 ChangeLog
--- man/ChangeLog	24 Aug 2007 22:26:42 -0000	1.358
+++ man/ChangeLog	23 Sep 2007 04:36:07 -0000
@@ -0,0 +1,6 @@
+2007-09-22  Stephen J. Turnbull  <stephen at xemacs.org>
+
+	* lispref/searching.texi (Replacing Match): Document the escapes
+	for changing case in `replace-match'.  Document the change to
+	STRBUFFER to permit subexpressions in string replacement.
+

Index: tests/automated/regexp-tests.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/tests/automated/regexp-tests.el,v
retrieving revision 1.15
diff -u -r1.15 regexp-tests.el
--- tests/automated/regexp-tests.el	2 Feb 2005 11:37:21 -0000	1.15
+++ tests/automated/regexp-tests.el	23 Sep 2007 20:59:00 -0000
@@ -459,3 +459,39 @@
   (Assert (null (match-string 2 text2)))
 )
 
+;; replace-regexp-in-string (regexp rep source
+;;                           fixedcase literal buf-or-subexp start)
+
+;; Currently we test the following cases:
+;; where `cbuf' and `bar-or-empty' are bound below.
+
+;; #### Tests for the various functional features (fixedcase, literal,start)
+;; should be added.
+
+(with-temp-buffer
+  (flet ((bar-or-empty (subexp) (if (string= subexp "foo") "bar" "")))
+    (let ((cbuf (current-buffer)))
+      (dolist (test-case
+               ;; REP           BUF-OR-SUBEXP   EXPECTED RESULT
+               `(("bar"         nil             " bar")
+                 ("bar"         ,cbuf           " bar")
+                 ("bar"         0               " bar")
+                 ("bar"         1               " bar foo")
+                 (bar-or-empty  nil             " ")
+                 (bar-or-empty  ,cbuf           " ")
+                 (bar-or-empty  0               " ")
+                 (bar-or-empty  1               " bar foo")))
+        (Assert
+         (string=
+          (nth 2 test-case)
+          (replace-regexp-in-string "\\(foo\\).*\\'" (nth 0 test-case)
+                                    " foo foo" nil nil (nth 1 test-case))))))))
+
+;; #### Tests for invalid arguments (esp. buf-or-subexp = 2) should be added.
+
+;; replace-match (REPLACEMENT &optional FIXEDCASE LITERAL STRING STRBUFFER)
+
+;; #### Write some tests!  Much functionality is implicitly tested above
+;; via `replace-regexp-in-string', but we should specifically test bogus
+;; combinations of STRING and STRBUFFER.
+
Index: lisp/subr.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/subr.el,v
retrieving revision 1.41
diff -u -r1.41 subr.el
--- lisp/subr.el	7 Aug 2007 23:08:52 -0000	1.41
+++ lisp/subr.el	24 Sep 2007 05:16:20 -0000
@@ -791,14 +791,23 @@
 
 Return a new string containing the replacements.
 
-Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
-arguments with the same names of function `replace-match'.  If START
-is non-nil, start replacements at that index in STRING.
+Optional arguments FIXEDCASE and LITERAL are like the arguments with
+the same names of function `replace-match'.  If START is non-nil,
+start replacements at that index in STRING.
+
+For compatibility with old XEmacs code and with recent GNU Emacs, the
+interpretation of SUBEXP is somewhat complicated.  If SUBEXP is a
+buffer, it is interpreted as the buffer which provides syntax tables
+and case tables for the match and replacement.  If it is not a buffer,
+the current buffer is used.  If SUBEXP is an integer, it is the index
+of the subexpression of REGEXP which is to be replaced.
 
 REP is either a string used as the NEWTEXT arg of `replace-match' or a
 function.  If it is a function it is applied to each match to generate
 the replacement passed to `replace-match'; the match-data at this
-point are such that match 0 is the function's argument.
+point are such that `(match-string SUBEXP STRING)' is the function's
+argument if SUBEXP is an integer \(otherwise the whole match is passed
+and replaced).
 
 To replace only the first match (if any), make REGEXP match up to \\'
 and replace a sub-expression, e.g.
@@ -817,6 +826,7 @@
   ;; might be reasonable to do so for long enough STRING.]
   (let ((l (length string))
 	(start (or start 0))
+	(expndx (if (integerp subexp) subexp 0))
 	matches str mb me)
     (save-match-data
       (while (and (< start l) (string-match regexp string start))
@@ -833,7 +843,7 @@
 	(setq matches
 	      (cons (replace-match (if (stringp rep)
 				       rep
-				     (funcall rep (match-string 0 str)))
+				     (funcall rep (match-string expndx str)))
 				   fixedcase literal str subexp)
 		    (cons (substring string start mb) ; unmatched prefix
 			  matches)))
Index: src/search.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/search.c,v
retrieving revision 1.46
diff -u -r1.46 search.c
--- src/search.c	24 Jan 2005 23:34:09 -0000	1.46
+++ src/search.c	23 Sep 2007 04:36:12 -0000
@@ -2364,15 +2364,22 @@
 
 DEFUN ("replace-match", Freplace_match, 1, 5, 0, /*
 Replace text matched by last search with REPLACEMENT.
-If second arg FIXEDCASE is non-nil, do not alter case of replacement text.
+Leaves point at end of replacement text.
+Optional boolean FIXEDCASE inhibits matching case of REPLACEMENT to source.
+Optional boolean LITERAL inhibits interpretation of escape sequences.
+Optional STRING provides the source text to replace.
+Optional STRBUFFER may be a buffer, providing match context, or an integer
+ specifying the subexpression to replace.
+
+If FIXEDCASE is non-nil, do not alter case of replacement text.
 Otherwise maybe capitalize the whole text, or maybe just word initials,
 based on the replaced text.
-If the replaced text has only capital letters
-and has at least one multiletter word, convert REPLACEMENT to all caps.
+If the replaced text has only capital letters and has at least one
+multiletter word, convert REPLACEMENT to all caps.
 If the replaced text has at least one word starting with a capital letter,
 then capitalize each word in REPLACEMENT.
 
-If third arg LITERAL is non-nil, insert REPLACEMENT literally.
+If LITERAL is non-nil, insert REPLACEMENT literally.
 Otherwise treat `\\' as special:
   `\\&' in REPLACEMENT means substitute original matched text.
   `\\N' means substitute what matched the Nth `\\(...\\)'.
@@ -2385,24 +2392,31 @@
   `\\E' means terminate the effect of any `\\U' or `\\L'.
   Case changes made with `\\u', `\\l', `\\U', and `\\L' override
   all other case changes that may be made in the replaced text.
-FIXEDCASE and LITERAL are optional arguments.
-Leaves point at end of replacement text.
 
-The optional fourth argument STRING can be a string to modify.
-In that case, this function creates and returns a new string
-which is made by replacing the part of STRING that was matched.
-When fourth argument is a string, fifth argument STRBUFFER specifies
-the buffer to be used for syntax-table and case-table lookup and
-defaults to the current buffer.  When fourth argument is not a string,
-the buffer that the match occurred in has automatically been remembered
-and you do not need to specify it.
-
-When fourth argument is nil, STRBUFFER specifies a subexpression of
-the match.  It says to replace just that subexpression instead of the
-whole match.  This is useful only after a regular expression search or
-match since only regular expressions have distinguished subexpressions.
+If non-nil, STRING is the source string, and a new string with the specified
+replacements is created and returned.  Otherwise the current buffer is the
+source text.
+
+If non-nil, STRBUFFER may be an integer, interpreted as the index of the
+subexpression to replace in the source text, or a buffer to provide the
+syntax table and case table.  If nil, then the \"subexpression\" is 0, i.e.,
+the whole match, and the current buffer provides the syntax and case tables.
+If STRING is nil, STRBUFFER must be nil or an integer.
+
+Specifying a subexpression is only useful after a regular expression match,
+since a fixed string search has no non-trivial subexpressions.
+
+It is not possible to specify both a buffer and a subexpression.  If that is
+desired, the idiom `(with-current-buffer BUFFER (replace-match ... INTEGER))'
+may be appropriate.
+
+If STRING is nil but the last thing matched (or searched) was a string, or
+STRING is a string but the last thing matched was a buffer, an
+`invalid-argument' error will be signaled.  (XEmacs does not check that the
+last thing searched is the source string, but it is not useful to use a
+different string as source.)
 
-If no match (including searches) has been conducted or the requested
+If no match (including searches) has been successful or the requested
 subexpression was not matched, an `args-out-of-range' error will be
 signaled.  (If no match has ever been conducted in this instance of
 XEmacs, an `invalid-operation' error will be signaled.  This is very
@@ -2430,31 +2444,59 @@
 
   CHECK_STRING (replacement);
 
+  /* Because GNU decided to be incompatible here, we support the following
+     baroque and bogus API for the STRING and STRBUFFER arguments:
+          types            interpretations
+     STRING   STRBUFFER   STRING   STRBUFFER
+     nil      nil         none     0 = index of subexpression to replace
+     nil      integer     none     index of subexpression to replace
+     nil      other       ***** error *****
+     string   nil         source   current buffer provides syntax table
+                                   subexpression = 0 (whole match)
+     string   buffer      source   buffer providing syntax table
+                                   subexpression = 0 (whole match)
+     string   integer     source   current buffer provides syntax table
+                                   subexpression = STRBUFFER
+     string   other       ***** error *****
+  */
+
+  /* Do STRBUFFER first; if STRING is nil, we'll overwrite BUF and BUFFER. */
+
+  /* If the match data were abstracted into a special "match data" type
+     instead of the typical half-assed "let the implementation be visible"
+     form it's in, we could extend it to include the last string matched
+     and the buffer used for that matching.  But of course we can't change
+     it as it is.
+  */
+  if (NILP (strbuffer) || BUFFERP (strbuffer))
+    {
+      buf = decode_buffer (strbuffer, 0);
+    }
+  else if (!NILP (strbuffer))
+    {
+      CHECK_INT (strbuffer);
+      sub = XINT (strbuffer);
+      if (sub < 0 || sub >= (int) search_regs.num_regs)
+	args_out_of_range (strbuffer, make_int (search_regs.num_regs));
+      buf = current_buffer;
+    }
+  else
+    invalid_argument ("STRBUFFER must be nil, a buffer, or an integer",
+		      strbuffer);
+  buffer = wrap_buffer (buf);
+
   if (! NILP (string))
     {
       CHECK_STRING (string);
       if (!EQ (last_thing_searched, Qt))
- invalid_argument ("last thing matched was not a string", Qunbound);
-      /* If the match data
-	 were abstracted into a special "match data" type instead
-	 of the typical half-assed "let the implementation be
-	 visible" form it's in, we could extend it to include
-	 the last string matched and the buffer used for that
-	 matching.  But of course we can't change it as it is. */
-      buf = decode_buffer (strbuffer, 0);
-      buffer = wrap_buffer (buf);
+	invalid_argument ("last thing matched was not a string", Qunbound);
     }
   else
     {
-      if (!NILP (strbuffer))
-	{
-	  CHECK_INT (strbuffer);
-	  sub = XINT (strbuffer);
-	  if (sub < 0 || sub >= (int) search_regs.num_regs)
-	    args_out_of_range (strbuffer, make_int (search_regs.num_regs));
-	}
       if (!BUFFERP (last_thing_searched))
- invalid_argument ("last thing matched was not a buffer", Qunbound);
+	invalid_argument ("last thing matched was not a buffer", Qunbound);
+      if (!NILP (strbuffer))
+	CHECK_INT (strbuffer);
       buffer = last_thing_searched;
       buf = XBUFFER (buffer);
     }
@@ -2557,8 +2599,8 @@
       Lisp_Object before, after;
 
       speccount = specpdl_depth ();
-      before = Fsubstring (string, Qzero, make_int (search_regs.start[0]));
-      after = Fsubstring (string, make_int (search_regs.end[0]), Qnil);
+      before = Fsubstring (string, Qzero, make_int (search_regs.start[sub]));
+      after = Fsubstring (string, make_int (search_regs.end[sub]), Qnil);
 
       /* Do case substitution into REPLACEMENT if desired.  */
       if (NILP (literal))
Index: man/lispref/searching.texi
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/man/lispref/searching.texi,v
retrieving revision 1.13
diff -u -r1.13 searching.texi
--- man/lispref/searching.texi	23 Feb 2005 15:33:38 -0000	1.13
+++ man/lispref/searching.texi	23 Sep 2007 04:36:12 -0000
@@ -1282,19 +1282,14 @@
 @var{replacement}.
 
 If you did the last search in a buffer, you should specify @code{nil}
-for @var{string}.  Then @code{replace-match} does the replacement by
-editing the buffer; it leaves point at the end of the replacement text,
-and returns @code{t}.
+for @var{string}.  (An error will be signaled if you don't.)  Then
+ at code{replace-match} does the replacement by editing the buffer; it
+leaves point at the end of the replacement text, and returns @code{t}.
 
 If you did the search in a string, pass the same string as @var{string}.
-Then @code{replace-match} does the replacement by constructing and
-returning a new string.
-
-If the fourth argument @var{string} is a string, fifth argument
- at var{strbuffer} specifies the buffer to be used for syntax-table and
-case-table lookup and defaults to the current buffer.  When @var{string}
-is not a string, the buffer that the match occurred in has automatically
-been remembered and you do not need to specify it.
+(An error will be signaled if you specify nil.)  Then
+ at code{replace-match} does the replacement by constructing and returning
+a new string.
 
 If @var{fixedcase} is non- at code{nil}, then the case of the replacement
 text is not changed; otherwise, the replacement text is converted to a
@@ -1317,20 +1312,64 @@
 
 @table @asis
 @item @samp{\&}
- at cindex @samp{&} in replacement
+ at cindex @samp{\&} in replacement
 @samp{\&} stands for the entire text being replaced.
 
 @item @samp{\@var{n}}
 @cindex @samp{\@var{n}} in replacement
+ at cindex @samp{\@var{digit}} in replacement
 @samp{\@var{n}}, where @var{n} is a digit, stands for the text that
 matched the @var{n}th subexpression in the original regexp.
 Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
 
 @item @samp{\\}
- at cindex @samp{\} in replacement
+ at cindex @samp{\\} in replacement
 @samp{\\} stands for a single @samp{\} in the replacement text.
+
+ at item @samp{\u}
+ at cindex @samp{\u} in replacement
+ at samp{\u} means upcase the next character.
+
+ at item @samp{\l}
+ at cindex @samp{\l} in replacement
+ at samp{\l} means downcase the next character.
+
+ at item @samp{\U}
+ at cindex @samp{\U} in replacement
+ at samp{\U} means begin upcasing all following characters.
+
+ at item @samp{\L}
+ at cindex @samp{\L} in replacement
+ at samp{\L} means begin downcasing all following characters.
+
+ at item @samp{\E}
+ at cindex @samp{\E} in replacement
+ at samp{\E} means terminate the effect of any @samp{\U} or @samp{\L}.
 @end table
+
+Case changes made with @samp{\u}, @samp{\l}, @samp{\U}, and @samp{\L}
+override all other case changes that may be made in the replaced text.
+
+The fifth argument @var{strbuffer} may be a buffer to be used for
+syntax-table and case-table lookup.  If @var{strbuffer} is not a buffer,
+the current buffer is used.  When @var{string} is not a string, the
+buffer that the match occurred in has automatically been remembered and
+you do not need to specify it.  @var{string} may also be an integer,
+specifying the index of the subexpression to match.  When @var{string}
+is not an integer, the ``subexpression'' is 0, @emph{i.e.}, the whole
+match.  An @code{invalid-argument} error will be signaled if you specify
+a buffer when @var{string} is nil, or specify a subexpression which was
+not matched.
+
+It is not possible to specify both a buffer and a subexpression, but the
+idiom
+ at example
+(with-current-buffer @var{buffer} (replace-match ... @var{integer}))
+ at end example
+may be used.
+
 @end defun
+
 
 @node Entire Match Data
 @subsection Accessing the Entire Match Data

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta at xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta



More information about the XEmacs-Beta mailing list