Great Circle Associates Majordomo-Workers
(July 1997)
 


Subject: Re: archive/search...
From: Jason L Tibbitts III <tibbs@hpc.uh.edu>
Date: 03 Jul 1997 14:08:29 -0500
To: majordomo-workers@greatcircle.com, Walter Johanson <lexbahn@ibm.net>
In-reply-to: Walter Johanson's message of Thu, 03 Jul 1997 08:54:19 -0700
References: <33BBCB2B.13F50E84@ibm.net>

>>>>> "WJ" == Walter Johanson <lexbahn@ibm.net> writes:

WJ> Have you a time-line on the inclusion of archive and search (archive)
WJ> features to be included with majordomo?

No.  The fact that I don't give timelines at all notwithstanding, right now
there really isn't a plan to incorporate archiving features more complex
than those in current versions.

If you have ideas, feel free to contribute them.  I have some of my own
involving Glimpse, which I will probably implement eventually, but that
will come at a later date.

The software which implements the web archives of this list
(http://www.hpc.uh.edu/majordomo-workers) is all freely available and
really doesn't need to be a part of Majordomo.

For those interested, the idea that I've been mulling over wrt. archiving
is this:

Store archives in standard mbox format, but also store an index of line and
byte offsets of messages within the file along with extra information as in
an NNTP .overview file.  I will probably eventually extend Mail::Folder to
handle this format.  The idea is to be able to quickly extract a single
message with a seek and a sysread.  I need to implement this in order to
make digests work the way I want, so I'll be doing this in the short term.
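
To make the offset index concrete, here is a minimal sketch.  It is written
in Python rather than as a Mail::Folder extension, and the names
build_index and fetch, the tuple layout, and the sample messages are all
assumptions made for illustration, not part of any existing module.  It
records only byte offset, line offset, and length; the overview-style extra
fields are left out.

```python
import io

def build_index(mbox):
    """Scan a binary mbox stream and return one
    (byte_offset, line_offset, length) tuple per message."""
    index = []
    pos = 0            # byte position of the current line
    line_no = 0        # zero-based line number of the current line
    start = None       # byte offset where the current message began
    start_line = 0
    prev_blank = True  # the start of the file counts as "after a blank line"
    for line in mbox:
        # An mbox separator is a "From " line preceded by a blank line.
        if prev_blank and line.startswith(b"From "):
            if start is not None:
                index.append((start, start_line, pos - start))
            start, start_line = pos, line_no
        prev_blank = line in (b"\n", b"\r\n")
        pos += len(line)
        line_no += 1
    if start is not None:
        index.append((start, start_line, pos - start))
    return index

def fetch(mbox, index, n):
    """Extract message n with a single seek and a single read."""
    byte_offset, _line_offset, length = index[n]
    mbox.seek(byte_offset)
    return mbox.read(length)

# Hypothetical two-message archive, held in memory for the demonstration.
SAMPLE = (
    b"From a@example.com Thu Jul  3 14:08:29 1997\n"
    b"Subject: one\n\nbody one\n\n"
    b"From b@example.com Thu Jul  3 15:00:00 1997\n"
    b"Subject: two\n\nbody two\n"
)
archive = io.BytesIO(SAMPLE)
idx = build_index(archive)
second = fetch(archive, idx, 1)
```

The same functions should work on a file opened in binary mode.  Storing
the length alongside the offset is a design choice that avoids having to
look at the next index entry when extracting a message.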

Searches will be done using Glimpse or some other free text indexer, taking
the line offsets from the search, using the index to turn them into message
numbers and building a short index (subject, date, author, matching text
lines) or a complete digest of those messages as the search response.
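
The step of turning line offsets into message numbers can be sketched as a
binary search over the starting line of each message.  This assumes the
search tool reports zero-based line numbers within the mbox file; the
actual Glimpse output format is not modelled here, and the function names
are hypothetical.

```python
from bisect import bisect_right

def line_to_message(start_lines, hit_line):
    """Return the number of the message containing hit_line.

    start_lines is the sorted list of line offsets at which each
    message begins, as stored in the index."""
    n = bisect_right(start_lines, hit_line) - 1
    if n < 0:
        raise ValueError("line precedes the first message")
    return n

def hits_to_messages(start_lines, hit_lines):
    """Collapse many matching lines into a sorted list of message numbers."""
    return sorted({line_to_message(start_lines, h) for h in hit_lines})

# Hypothetical index: three messages starting at lines 0, 5 and 12.
starts = [0, 5, 12]
matches = hits_to_messages(starts, [3, 6, 7, 13])
```

Several matching lines inside one message collapse to a single message
number, which is what a short index or a digest of the hits would need.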

For web-based archiving, the various indexes would be maintained
continuously but the messages themselves would be generated dynamically
from the mbox archives, possibly using some caching.  The idea is to be
able to keep around a single gzipped archive per week/month/whatever and
have everything else except the thread/chronological/author indexes
generated from it.
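
As a rough illustration of generating messages on demand from a gzipped
archive with some caching, the sketch below uses Python's gzip module and
an in-memory LRU cache.  The function name load_message, the cache size,
and the idea of keying the cache on (path, offset, length) are assumptions
made for the example; the offsets are taken to refer to the uncompressed
mbox data.

```python
import gzip
from functools import lru_cache

@lru_cache(maxsize=256)
def load_message(archive_path, byte_offset, length):
    """Return the raw bytes of one message from a gzipped mbox archive.

    Seeking in a gzip stream is emulated by decompressing up to the
    requested offset, so the cache is what keeps repeated requests cheap."""
    with gzip.open(archive_path, "rb") as fh:
        fh.seek(byte_offset)
        return fh.read(length)
```

A presenter would look up the offset and length in the index, call
load_message, and render the result.  Because a cold read still costs a
partial decompression, shorter archive periods would make uncached
requests cheaper.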

I know that someone here has been working on a web-based archive presenter
and I hope that we can work together on this to make things happen
relatively soon.

 - J<


