Great Circle Associates Majordomo-Workers
(October 1998)
 


Subject: Re: Archives: implementation ideas
From: Jason L Tibbitts III <tibbs @ hpc . uh . edu>
Date: 01 Oct 1998 16:12:20 -0500
To: "Randall S. Winchester" <rsw @ Glue . umd . edu>
Cc: majordomo-workers @ greatcircle . com
In-reply-to: "Randall S. Winchester"'s message of "Thu, 1 Oct 1998 16:28:30 -0400 (EDT)"
References: <Pine.GSO.4.02.9810011432080.10605-100000@atlantis.csc.umd.edu>

>>>>> "RSW" == Randall S Winchester <rsw@Glue.umd.edu> writes:

RSW> I would like to see Mj2 Archives follow more of a directory
RSW> structure like the list information does.

I can't see using something like that, but it would be rather easy to do.
I've been thinking of how to abstract the archive backend sufficiently,
since someone mentioned wanting to store the whole thing in a database
(where the message itself is in the index as a large object).  I will wait
until the interface is somewhat stabilized before I go about this, though;
right now I'm changing it too much to worry about keeping multiple
interfaces in sync.

The API will be something like:

add a message to the archive
extract a named message from the archive
extract the information about a named message from the archive
get the names of the last N messages from the archive
delete a message from the archive
search an indexed field (from:, subject:) and return names/data of hit articles
search the raw articles and return the names/data of hit articles

That way the actual storage of articles is completely immaterial, and you
could write a backend that stores them however you want.
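
As an illustration only, the seven operations listed above could be
expressed as an abstract interface with one trivial backend behind it.
This is a sketch in Python rather than Majordomo's Perl, and every name in
it (ArchiveBackend, MemoryArchive, the method names) is hypothetical and
not taken from Mj2.

```python
import email
import re
from abc import ABC, abstractmethod


class ArchiveBackend(ABC):
    """Hypothetical storage-neutral archive interface (names are illustrative)."""

    @abstractmethod
    def add(self, name, raw): ...            # add a message to the archive

    @abstractmethod
    def get(self, name): ...                 # extract a named message

    @abstractmethod
    def info(self, name): ...                # extract the index data for a message

    @abstractmethod
    def last(self, n): ...                   # names of the last N messages

    @abstractmethod
    def delete(self, name): ...              # delete a message

    @abstractmethod
    def search_field(self, field, pattern): ...  # search an indexed field

    @abstractmethod
    def search_raw(self, pattern): ...       # search the raw articles


class MemoryArchive(ArchiveBackend):
    """Toy backend that keeps everything in dictionaries."""

    def __init__(self):
        self._raw = {}    # name -> raw message text (insertion ordered)
        self._index = {}  # name -> {"from": ..., "subject": ...}

    def add(self, name, raw):
        msg = email.message_from_string(raw)
        self._raw[name] = raw
        self._index[name] = {
            "from": msg.get("From", ""),
            "subject": msg.get("Subject", ""),
        }

    def get(self, name):
        return self._raw[name]

    def info(self, name):
        return dict(self._index[name])

    def last(self, n):
        return list(self._raw)[-n:] if n > 0 else []

    def delete(self, name):
        del self._raw[name]
        del self._index[name]

    def search_field(self, field, pattern):
        rx = re.compile(pattern, re.IGNORECASE)
        return [n for n, data in self._index.items()
                if rx.search(data.get(field, ""))]

    def search_raw(self, pattern):
        rx = re.compile(pattern, re.IGNORECASE)
        return [n for n, raw in self._raw.items() if rx.search(raw)]
```

A directory-per-list backend or a database backend would subclass the same
interface, and callers would not need to know which one they were using.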

RSW> 2) It would be very nice to have hooks to pipe a copy of the mail to a
RSW> program during the archive phase.

This is a massive security hole if it is configurable, so it can't be.  I would
prefer to have a general hook mechanism and make this a regular hook.
There has been only light discussion about this previously; perhaps you can
suggest a general mechanism.  I'm just thinking of defining named
subroutines that sit in a file which is 'require'd at startup and called
with some standardized data if defined.  This is probably sufficient.
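
A rough sketch of that idea, in Python for illustration (the real thing
would be a Perl file pulled in with 'require'): load a site-owned hook
file once at startup, then call a hook by name only if it is defined.  The
names load_hooks, run_hook, archive_add and post_delivery are all
hypothetical.  The hook file here is assumed to be installed by the site
administrator, not set through list configuration, which is what keeps it
from being the hole described above.

```python
import importlib.util
import os
import tempfile


def load_hooks(path):
    """Load the site hook file once at startup and return it as a module."""
    spec = importlib.util.spec_from_file_location("site_hooks", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_hook(hooks, name, data):
    """Call the named hook with standardized data if it is defined."""
    func = getattr(hooks, name, None)
    if callable(func):
        return func(data)
    return None  # hook not defined: silently do nothing


# Demo: a hypothetical hook file that defines only an archive_add hook.
HOOK_SOURCE = '''
seen = []

def archive_add(data):
    seen.append(data["name"])
    return len(seen)
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "site_hooks.py")
    with open(path, "w") as fh:
        fh.write(HOOK_SOURCE)
    hooks = load_hooks(path)

# A defined hook runs; an undefined one is skipped.
result = run_hook(hooks, "archive_add", {"list": "test-list", "name": "1998-10/1"})
missing = run_hook(hooks, "post_delivery", {"list": "test-list"})
```

Piping a copy of each archived message to a program then becomes just one
possible body for the archive hook, written by whoever controls the file.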

RSW> The separate directories would allow a place to maintain .htaccess
RSW> files for private lists, html archive files, search engine files, or
RSW> homepages for each list where "info.html" files could be kept.

Well, you can do that by putting each list's archive in a separate
directory, which you can do now.

RSW> We could provide an easy to understand web interface to the growing
RSW> lists of Mj2 features. (I really really like the features of Mj2,
RSW> however my user community is way too confused. I need to give them
RSW> something with a "help" button by every command, and hide syntax as
RSW> much as possible.)

I think MajorCool is the answer.  I'd prefer to try and use as much of it
as possible, since 1) we already have a volunteer (no pressure, Bill), 2)
it works well for Mj1 and 3) many (though not enough) people are already
used to it.

But on the issue of web archiving, I ultimately would like something that
works like the following:

  Generates a pretty HTML index from archive data, with incremental
    updates.
  Generates, _on demand_, HTML from the raw archive.  This means that the
    HTML files can be expired, and I no longer have to blow inodes and disk
    space for individual HTML messages as well as mbox files.
  Provides a reasonable search interface.
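
The on-demand part might look something like the following sketch (Python
for illustration; render_message and OnDemandRenderer are hypothetical
names, and this is not how MHonArc does it).  A page is built from the raw
message the first time it is requested and kept in a cache that can be
expired at any time, since it can always be regenerated from the archive.

```python
import email
import html


def render_message(raw):
    """Turn one raw message into a very plain HTML page."""
    msg = email.message_from_string(raw)
    subject = html.escape(msg.get("Subject", "(no subject)"))
    sender = html.escape(msg.get("From", "(unknown)"))
    body = msg.get_payload()
    if not isinstance(body, str):
        body = "(multipart message)"  # this sketch handles plain text only
    return (
        "<html><head><title>{s}</title></head><body>\n"
        "<h1>{s}</h1>\n<p>From: {f}</p>\n<pre>{b}</pre>\n"
        "</body></html>\n"
    ).format(s=subject, f=sender, b=html.escape(body))


class OnDemandRenderer:
    """Generate HTML only when asked; cached pages may be expired freely."""

    def __init__(self, fetch_raw):
        self._fetch_raw = fetch_raw  # callable: message name -> raw text
        self._cache = {}

    def page(self, name):
        if name not in self._cache:
            self._cache[name] = render_message(self._fetch_raw(name))
        return self._cache[name]

    def expire(self, name=None):
        """Drop one cached page, or all of them when no name is given."""
        if name is None:
            self._cache.clear()
        else:
            self._cache.pop(name, None)
```

Only the raw archive has to be kept permanently; the HTML costs disk space
only for messages that somebody has recently looked at.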

I am considering borrowing pieces of MHonArc to make this happen, but it's
far enough down the road that I haven't talked to Earl Hood about it.  If
something happens with HyperMail this may all be changed.  Doing it with
pieces of MHonArc would actually not be all that difficult; the MHonArc
config stuff could be embedded in config variables.

 - J<

