My XEmacs wish list

Tue Feb 12 14:29:45 EST 2008

This was touched on in another thread.  I've had a list of extremely
ambitious XEmacs projects in mind for some time.  Here they are.

Pie In The Sky XEmacs Projects I Would Like To See Happen Before I Die
(in no particular order)

1. Large file support
In order to edit files that are too large to fit into memory, we have
to have an infrastructure that lets us hold only pieces of a file in
memory.  The idea is to choose some block size and write a file
caching layer that deals in those blocks.  File blocks are loaded on
demand.  The cache associates a dirty bit with each block, so we know
when we have to save back to disk.  The cache acts just like every
other cache under the sun; when space runs low, it tries to drop clean
blocks first, saving back to disk only as a last resort.  The block
size will need some study.  The larger it is, the less overhead due to
cache management; the smaller it is, the less memory is occupied for
files that are only visited in a handful of places.
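
To make the caching idea concrete, here is a minimal sketch in Python (not XEmacs code).  The class name BlockCache, the tiny BLOCK_SIZE, and the LRU eviction order are all assumptions made for illustration; the sketch handles only overwrites within a block, not insertions, which are the harder problem discussed next.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # hypothetical and tiny, for demonstration only


class BlockCache:
    """LRU cache of fixed-size file blocks, with a dirty bit per block."""

    def __init__(self, path, capacity):
        self.path = path
        self.capacity = capacity      # maximum number of blocks in memory
        self.blocks = OrderedDict()   # block number -> [bytearray, dirty]

    def _load(self, n):
        with open(self.path, "rb") as f:
            f.seek(n * BLOCK_SIZE)
            return bytearray(f.read(BLOCK_SIZE))

    def _write_back(self, n, data):
        with open(self.path, "r+b") as f:
            f.seek(n * BLOCK_SIZE)
            f.write(data)

    def _evict(self):
        # Drop the least recently used clean block if there is one.
        victim = next(
            (n for n, (_, dirty) in self.blocks.items() if not dirty), None
        )
        if victim is not None:
            del self.blocks[victim]
            return
        # Last resort: save the least recently used dirty block to disk.
        n, (data, _) = self.blocks.popitem(last=False)
        self._write_back(n, data)

    def get(self, n):
        """Return block n, loading it on demand."""
        if n not in self.blocks:
            if len(self.blocks) >= self.capacity:
                self._evict()
            self.blocks[n] = [self._load(n), False]
        self.blocks.move_to_end(n)    # mark as most recently used
        return self.blocks[n][0]

    def write(self, n, offset, data):
        """Overwrite bytes inside block n and set its dirty bit."""
        block = self.get(n)
        block[offset:offset + len(data)] = data
        self.blocks[n][1] = True

    def flush(self):
        """Write every dirty block back to disk and clear its dirty bit."""
        for n, entry in self.blocks.items():
            if entry[1]:
                self._write_back(n, entry[0])
                entry[1] = False
```

With a capacity of two blocks, touching a third block evicts a clean block without any disk write; a dirty block is written back only when no clean block is left to drop.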

The association between where file blocks were loaded from and where
they should be saved to has to be managed carefully.  The saving
routines have to be completely rewritten, too; if you save a large
file that had something inserted into it, everything after the
insertion point shifts, so you essentially have to read ahead and save
behind, a suitable distance at a time, from the insertion point to the
end of the file.  The implementations of markers and extents are also
affected, since they currently use Lisp integers, which have a limited
range, to identify file locations.  Also, the gap has to be considered.  How
does it fit into a block-based scheme?  It will be necessary, at the
very least, to support "incomplete" (or not full) blocks.  That
implies that we cannot find a given file offset by computing a block
number and an offset into that block.  We'll have to take incomplete
blocks between the start of the file and the target location into
account.  I don't know whether a sorted list of adjustment points or
some kind of index structure would be better.
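
As an illustration of the "index structure" option, the following Python sketch keeps the starting offset of every block and finds a location by binary search.  The name BlockIndex and its methods are hypothetical, and rebuilding the whole table on every resize is a simplification; a real implementation would more likely use a tree so that updates do not cost time proportional to the number of blocks.

```python
from bisect import bisect_right
from itertools import accumulate


class BlockIndex:
    """Maps an offset to (block number, offset in block) when blocks
    are allowed to be only partly full."""

    def __init__(self, lengths):
        self.lengths = list(lengths)   # bytes currently held by each block
        self._rebuild()

    def _rebuild(self):
        # starts[i] is the overall offset of the first byte of block i.
        self.starts = [0] + list(accumulate(self.lengths))[:-1]

    def locate(self, offset):
        if not 0 <= offset < sum(self.lengths):
            raise IndexError(offset)
        # Last block whose start is <= offset; empty blocks are skipped
        # because a following block shares their start offset.
        i = bisect_right(self.starts, offset) - 1
        return i, offset - self.starts[i]

    def resize(self, block, delta):
        """Record that a block grew or shrank by delta bytes."""
        self.lengths[block] += delta
        self._rebuild()


idx = BlockIndex([4, 2, 4])      # the middle block is incomplete
print(idx.locate(5))             # (1, 1)
print(idx.locate(6))             # (2, 0)
```

With full blocks of size 4, offset 6 would be in block 1; because block 1 holds only two bytes here, the same offset lands at the start of block 2.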

Traditional font lock, which fontifies the entire buffer at load time,
won't work at all.  We'll have to commit to one (or more) of the lazy
fontifiers.

2. Layered views & transformers
At the expense of consuming more memory, we could do some nice things
with how data is presented, and consolidate a lot of disparate Lisp
functionality at the same time.  The idea is that, when data is first
pulled from a file, it is stored as just bytes.  We then try to
determine the significance of those bytes, using our existing
filename-based and possibly also libmagic-based means.  Keeping the
raw bytes, we now make a new buffer and associate a transformer in
each direction between the two buffers.  The idea is that you could
wind up with something like this:

a. A file named "webpages.tar.gz" which is determined to contain
gzipped content.
b. A buffer containing the raw bytes from webpages.tar.gz
   <- GZIP transformer
      UNGZIP transformer ->
c. A buffer containing the ungzipped bytes from webpages.tar.gz, which
is determined to contain tar content.
   <- TAR transformer
      UNTAR transformer ->
d. Multiple buffers, namely:
   i. A buffer containing the raw bytes from the table of contents of
the TAR file, which is determined to contain ISO8859-1 characters
      <- ISO8859-1 encoder
         ISO8859-1 decoder ->
      A buffer containing the characters from the table of contents of
the TAR file, which is parsed with a TAR file table of contents parser
   ii. One buffer apiece, containing raw bytes, for the files in the
TAR file.  Buffer "e" is one such:
e. A buffer containing the raw bytes for index.xhtml, the header of
which specifies a UTF-8 encoding
   <- UTF-8 encoder
      UTF-8 decoder ->
f. A buffer containing the characters for index.xhtml, which is parsed
with an XML parser

Now a user can switch between different views of the same file,
including a hexl-like byte view.  Edits made in one view propagate up
and down the tree of associated buffers (possibly lazily; you only
have to propagate when it impacts another visited view or on file
save).
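
A rough Python sketch of such a stack of views, with a transformer in each direction between neighbors, might look like this.  The names Transformer and ViewStack are made up for the example, only the gzip and UTF-8 layers are shown, and edits propagate eagerly rather than lazily to keep the code short.

```python
import gzip


class Transformer:
    """A pair of functions between a lower view and the view above it."""

    def __init__(self, name, up, down):
        self.name = name
        self.up = up        # lower view -> upper view (e.g. ungzip)
        self.down = down    # upper view -> lower view (e.g. gzip)


GZIP = Transformer("gzip", gzip.decompress, gzip.compress)
UTF8 = Transformer(
    "utf-8",
    lambda raw: raw.decode("utf-8"),
    lambda text: text.encode("utf-8"),
)


class ViewStack:
    """views[0] holds the raw file bytes; each later view is derived
    from the one below it by the corresponding transformer."""

    def __init__(self, raw, transformers):
        self.transformers = transformers
        self.views = [raw]
        for t in transformers:
            self.views.append(t.up(self.views[-1]))

    def edit_top(self, new_value):
        """Replace the top view and push the change down to the raw bytes."""
        self.views[-1] = new_value
        for i in range(len(self.transformers) - 1, -1, -1):
            self.views[i] = self.transformers[i].down(self.views[i + 1])


stack = ViewStack(gzip.compress(b"<html/>"), [GZIP, UTF8])
print(stack.views[2])            # <html/>
stack.edit_top("<html></html>")
print(stack.views[1])            # b'<html></html>'
```

A TAR layer would need one lower view to fan out into several upper views, which this linear sketch does not attempt.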

In conjunction with project #1, each view would be a cache for the
view "below" it.  This is tricky, though, because many transformers
don't make it easy to associate offsets in the original and
transformed views.  Also, some transformers cannot start at an
arbitrary point in the lower-level buffer: (de)compressors and
encryptors/decryptors, in particular.

3. Separation of data and presentation layers
This is the one that started the discussion, when Stephen wished for
it and inadvertently pushed my hot-topic button in the process.  The
data handling part of XEmacs should be a separate piece from the part
that draws pretty stuff on the screen.  Separating the data and
presentation layers has well-known benefits.  Here's another.  We
currently have some convoluted code to allow TTY and X11 front ends to
coexist, but we cannot do something similar for X11 and GNOME front
ends, for example.  Separating out these layers means we can have
entirely distinct front ends for each display device: TTY, X11, GNOME,
KDE, Windows, whatever the Mac users want, etc.  Multiple front ends
could connect to the back end simultaneously, because the back end is
just a data server.  We can make the front and back ends speak a much
more succinct protocol to each other than X11, for example, and handle
all the window manager traffic on the user's machine, in the front
end.  That would speed up the process of editing files on remote
machines.

We now have to worry about the security of the connection point.
However, properly managed, this is a good thing, as it makes XEmacs
into an instant collaborative tool.  You and your collaborators
connect to the same back end.  There's more to it, of course, but
that's another project.  In any case, we can talk about how to manage
incoming connections, but I suspect we will probably end up generating
some kind of key or token when the back end is started.  Front ends
that can present the correct key are allowed to connect.  We may even
want to make it possible to generate cryptographic capabilities, with
individual capabilities granting various degrees of access, or access
to particular sets or kinds of files/buffers, but that's yet another
project.
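
The simple key-or-token scheme could be sketched as follows, in Python.  The BackEnd class and its connect method are hypothetical, no real network code is shown, and how the token reaches the front end (a file readable only by the user, say) is left open.

```python
import hmac
import secrets


class BackEnd:
    """Toy back end that admits only front ends presenting its token."""

    def __init__(self):
        # Generated once, when the back end is started.
        self.token = secrets.token_hex(32)
        self.front_ends = []

    def connect(self, name, presented_token):
        # compare_digest takes the same time whether or not the tokens
        # match, so timing does not reveal how close a guess was.
        ok = hmac.compare_digest(
            presented_token.encode("utf-8"), self.token.encode("utf-8")
        )
        if ok:
            self.front_ends.append(name)
        return ok


back_end = BackEnd()
print(back_end.connect("tty", back_end.token))   # True
print(back_end.connect("x11", "not-the-token"))  # False
```

Capabilities with finer-grained access would replace the single shared token with one token per grant, each carrying its own set of permissions.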

4. Extension language replacement
Yes, this bogeyman has reared his head again.  The first stumbling
block is *which* extension language to replace Elisp with.  There are
lots of candidates; the choice is not easy.  Personally, I favor
staying close to Lisp in order to minimize the porting effort.  Either
Scheme or Common Lisp would be a good way to go.  But maybe it doesn't
matter.  The C code is so full of Elisp dependencies that it would
have to undergo major surgery, no matter what the final extension
language.  And even changing to Scheme or Common Lisp won't save us
from having to rewrite all of the Elisp in both core and the packages.
My big fear here is that we would be doomed to the fate of perlmacs.
Whatever happened to Guilized Emacs?  In any case, this is a dangerous
road to go down, because it means that almost everything has to be
rewritten, and it dooms any further efforts to sync with Emacs.  The
benefits are the potential for better performance (although it would
be easy to fail to realize that potential) and the possibility of
attracting more developers due to choosing a more widely known
language (Javamacs, anyone?).

5. Voice recognition support
This was one of the very first XEmacs projects I worked on.  I started
integrating IBM's ViaVoice with XEmacs way back when.  But then IBM
dropped support for the Linux version, and the project died.  I've had
a number of emails from people over the years wondering if I had any
plans to work on voice recognition support again.  The space of voice
recognition products has widened somewhat since I first worked on this
project, and there are several that run on Linux.  I need to take a
look at their APIs and see if there are any commonalities at all.  If
so, an interface that provides basic support for a variety of
products, together with a pluggable architecture for product-specific
support, would be great.

But first, I need to get a good (i.e., expensive) microphone and
soundproof my home office (4 kids, you know).  I should also look at
http://voicecode.iit.nrc.ca/ for UI ideas.

So there's my list.  Fire at will.
-- 
Jerry James
http://loganjerry.googlepages.com/