quail: TeX input method: change UTF8 to tex and back: solution & problems

Uwe Brauer oub at mat.ucm.es
Tue Jan 8 09:32:23 EST 2008


>>>>> "Stephen" == Stephen J Turnbull <stephen at xemacs.org> writes:

   > Aidan Kehoe writes:

   >> What you need to do is rewrite the code to use
   >> #'posix-search-forward, which guarantees that it will return the
   >> longest match (that is, it will always try to match "\\infty" if
   >> possible, and only then look for "\in".)

   > I don't think so: this code is almost surely trying each car in
   > order.  Something like:

   > (progn
   >   (re-search-forward (regexp-opt (mapcar #'car replacement-alist)))
   >   (goto-char (match-beginning 0))
   >   (when (looking-at target-string)
   >     (replace-match replacement-string)))

As I said in my mail to Aidan, this is the main part of the code
(it seems less sophisticated than yours, though):


(defun utf8symbol-translate-conventions (trans-tab)
  "Use the translation table argument to translate the current buffer."
  (save-excursion
    (let ((beg (point-min-marker))    ; see the `(elisp)Narrowing' Info node
	  (end (point-max-marker)))
      (unwind-protect
	  (progn
	    (widen)
	    (goto-char (point-min))
	    (let ((buffer-read-only nil) ; (inhibit-read-only t)?
		  (case-fold-search nil))
	      (while trans-tab
		(save-excursion
		  (let ((trans-this (car trans-tab)))
		    (while (search-forward (car trans-this) nil t)
;;           (regexp-opt trans-this)		;NEW
		      (replace-match (car (cdr trans-this)) t t)))
		  (setq trans-tab (cdr trans-tab))))))
	(narrow-to-region beg end)))))

together with the call:


(defun fix-tex2utf8symbol ()
  "Replace TeX macros with the corresponding UTF-8 symbols."
  (interactive)
;  (if (member major-mode utf8symbol-modes-list)
      (let ((buffer-modified-p (buffer-modified-p)))
	  (unwind-protect
	      (utf8symbol-translate-conventions tex2utf8symbol-trans-tab)
	    (set-buffer-modified-p buffer-modified-p))))
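
Since the loop above tries each table entry in order, one cheap way around the "\in" versus "\infty" clash might be to sort the table so that the longest search strings come first. The following is only a sketch under assumptions: the names `my-sort-trans-tab-longest-first', `my-translate-string' and `my-sample-trans-tab' are hypothetical, the sample table is made up in the (FROM TO) shape the loop expects, and `dolist' is assumed to be available (it lives in `cl' on older XEmacs).

```elisp
;; Hypothetical helper, not part of the original code.
(defun my-sort-trans-tab-longest-first (trans-tab)
  "Return a copy of TRANS-TAB with the longest search strings first."
  (sort (copy-sequence trans-tab)
        (lambda (a b) (> (length (car a)) (length (car b))))))

;; Hypothetical sample table; each entry is (FROM TO).
(defvar my-sample-trans-tab
  '(("\\in" "∈") ("\\infty" "∞") ("\\int" "∫")))

(defun my-translate-string (string trans-tab)
  "Return STRING with each FROM in TRANS-TAB replaced by its TO.
Longer FROM strings are replaced before shorter ones."
  (with-temp-buffer
    (insert string)
    (let ((case-fold-search nil))
      (dolist (entry (my-sort-trans-tab-longest-first trans-tab))
        (goto-char (point-min))
        ;; Same literal search and replace as in the loop above.
        (while (search-forward (car entry) nil t)
          (replace-match (car (cdr entry)) t t))))
    (buffer-string)))
```

This does not help with macros that are not in the table at all: "\in" would still match inside, say, "\input".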



   > Note to Uwe: one problem you're facing here is that SGML entities
   > have a fairly reliable terminator character, the semicolon.  TeX
   > does not, so you might try replacing `target-string' in the logic
   > above with `(concat target-string "\\>")'.


I even thought of appending \; to each TeX string as a terminator and
then killing that character backwards, but that looks highly inefficient.
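
For comparison, the "\\>" suggestion could look roughly like the sketch below. It is an assumption-laden illustration, not tested against the real table: `my-translate-string-word-end' is a hypothetical name, the search switches from `search-forward' to `re-search-forward' with the key passed through `regexp-quote', and the buffer's syntax table is assumed to treat letters as word constituents.

```elisp
;; Hypothetical helper illustrating the end-of-word terminator idea.
(defun my-translate-string-word-end (string trans-tab)
  "Return STRING with each FROM in TRANS-TAB replaced by its TO.
A FROM only matches when it is followed by the end of a word."
  (with-temp-buffer
    (insert string)
    (let ((case-fold-search nil))
      (dolist (entry trans-tab)
        (goto-char (point-min))
        ;; "\\>" matches at the end of a word, so "\in" is not
        ;; found inside "\infty".
        (while (re-search-forward
                (concat (regexp-quote (car entry)) "\\>") nil t)
          (replace-match (car (cdr entry)) t t))))
    (buffer-string)))
```

With this, the order of the table entries should no longer matter for "\in" and "\infty". Two caveats: macros that end in a non-word character would never match, and a macro followed directly by a digit would be skipped if digits count as word constituents.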

   >> You might also want to look into the #'regexp-opt function, which,
   >> given a list of strings,

   > regexp-opt won't work here, it doesn't know anything about
   > internal grouping, so there's no way to trigger the replacement
   > of "\in" rather than "\infty".  In fact, I doubt that Emacs
   > regexp groups can simultaneously express all the relevant string
   > matches for
   > #r"\in\(t\|fty\)".
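
One possible way to get a single pass without relying on grouping is to search for an alternation of all keys and then look up the matched text in the table. The sketch below deliberately swaps `regexp-opt' for a hand-built alternation sorted longest first, because I am not sure every `regexp-opt' version prefers the longest alternative. The name `my-translate-string-one-pass' is hypothetical, and the table is assumed to hold (FROM TO) entries.

```elisp
;; Hypothetical helper: one search over all keys, then a table lookup.
(defun my-translate-string-one-pass (string trans-tab)
  "Return STRING with each FROM in TRANS-TAB replaced by its TO in one pass."
  (let* ((keys (sort (mapcar #'car trans-tab)
                     (lambda (a b) (> (length a) (length b)))))
         ;; Alternatives are tried left to right, so longest first
         ;; makes "\infty" win over "\in".
         (regexp (mapconcat #'regexp-quote keys "\\|")))
    (with-temp-buffer
      (insert string)
      (goto-char (point-min))
      (let ((case-fold-search nil))
        (while (re-search-forward regexp nil t)
          ;; Look up the text that actually matched to pick the replacement.
          (replace-match (car (cdr (assoc (match-string 0) trans-tab)))
                         t t)))
      (buffer-string))))
```

Because the buffer is scanned only once, already inserted replacement text is never searched again.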


Uwe 



More information about the XEmacs-Beta mailing list