>>>>> "RR" == Roman Richardson <Roman.Richardson@state.mn.us> writes:
RR> I'm hoping to put up a searchable archive site for my majordomo, but
RR> don't even know where to start. I know J< has one up for greatcircle,
RR> but is there any source I could grab to fool around with, or hints as
RR> to where I should start?
The software that I use to drive my archives is at
ftp.hpc.uh.edu:/pub/majordomo/tmli.tgz.
Major warning: this code is nasty. Gross. Anyone who knows any Perl will
probably run screaming at the incredibly stupid things I do in there. Yes,
this was my first real Perl program.
You need MHonArc to generate the archive pages, Glimpse to run the search
engine, Majordomo (archive2.pl, actually) to generate the mbox files, and
my code to glue it together. (Web searches will reveal sites for MHonArc
and Glimpse.)
The glue code is undocumented, but I'll explain a little. Put w3glimpse,
w3index, w3reindex, w3highlighter, and w3striphtml in your cgi-bin. Also
put a w3glimpse.conf there, looking like this:
#listname list title URLpath archivepath indexpath mboxpath
majordomo-workers Majordomo Workers /majordomo-workers/ /home/www/majordomo-workers/ /home/www/majordomo-workers/index /home/ftp/pub/majordomo/workers-archive
majordomo-users Majordomo Users /majordomo-users/ /home/www/majordomo-users/ /home/www/majordomo-users/index /home/ftp/pub/majordomo/users-archive
The fields should be tab separated.
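Since the title field contains spaces, a reader of this file has to split on tabs only. Here is a minimal sketch of such a reader, not taken from the tmli code: the `ListEntry` and `parse_conf` names are made up for illustration, and treating blank lines and lines starting with "#" as ignorable is an assumption.

```python
from typing import NamedTuple


class ListEntry(NamedTuple):
    """One row of w3glimpse.conf (field names follow the header comment)."""
    listname: str
    title: str
    urlpath: str
    archivepath: str
    indexpath: str
    mboxpath: str


def parse_conf(text):
    """Parse tab-separated w3glimpse.conf text into ListEntry records."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumption: blank lines and "#" comment lines carry no data.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(
                f"line {lineno}: expected 6 tab-separated fields, got {len(fields)}"
            )
        entries.append(ListEntry(*fields))
    return entries


sample = (
    "#listname\tlist title\tURLpath\tarchivepath\tindexpath\tmboxpath\n"
    "majordomo-users\tMajordomo Users\t/majordomo-users/\t"
    "/home/www/majordomo-users/\t/home/www/majordomo-users/index\t"
    "/home/ftp/pub/majordomo/users-archive\n"
)
entries = parse_conf(sample)
print(entries[0].title)  # → Majordomo Users
```

A line whose fields are separated by spaces instead of tabs comes out as a single field and is rejected, which is the most likely mistake when the file is pasted out of a mail message.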
archivepath should be a directory containing:
  index.cgi as a link to w3index (your web server must support generated
  index pages).
  a "construction.gif" with a picture to be used when the archive is being
  generated.
  a "background.gif" if you want one. If you don't, modify mhonarc.rc
  appropriately.
  a directory "index" to hold the glimpse stuff; inside this should be a
  file .glimpse_exclude:
    XYX:sina:~www/majordomo-users/index> cat .glimpse_exclude
    .mhonarc.db
    background.gif
    index.html
    threads.html
  and a file .glimpse_filters:
    XYX:sina:~www/majordomo-users/index> cat .glimpse_filters
    *.html /home/www/cgi-bin/w3striphtml <
  a mhonarc.rc, containing the MHonArc configuration for the archives. The
  one I use for majordomo-users is at the end of this message. I still use
  MHonArc 1.x, so the new 2.0 format might be different.
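Before the first run it may help to check that an archivepath has everything listed above. The sketch below is only an illustration: `REQUIRED_FILES` and `missing_items` are hypothetical names, and background.gif is left out because it is optional.

```python
import os
import tempfile

# Files the message says must exist under archivepath (background.gif is optional).
REQUIRED_FILES = [
    "index.cgi",
    "construction.gif",
    "mhonarc.rc",
    "index/.glimpse_exclude",
    "index/.glimpse_filters",
]


def missing_items(archivepath):
    """Return the required entries that are absent from archivepath."""
    missing = []
    if not os.path.isdir(os.path.join(archivepath, "index")):
        missing.append("index/")
    for rel in REQUIRED_FILES:
        # exists() follows symlinks, so a dangling index.cgi link counts as missing.
        if not os.path.exists(os.path.join(archivepath, rel)):
            missing.append(rel)
    return missing


# Demonstration against an empty scratch directory: everything is reported.
with tempfile.TemporaryDirectory() as scratch:
    print(missing_items(scratch))
```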
Then run w3reindex and it should build the archives for everything you have
configured in w3glimpse.conf. This includes linking in the mbox files,
running MHonArc over them, and running Glimpse on the result. Run
w3reindex nightly from cron to do it automatically. Be sure to start with
a small archive first; it can take forever to run.
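To make the sequence concrete, here is a dry-run sketch of the commands one pass over a single list might issue. It is a guess at the shape of the work, not a description of w3reindex itself. The output layout (one directory per mbox file under archivepath) and the `plan_reindex` name are assumptions, and the mbox-linking step is left out. The `-rcfile` and `-outdir` options for mhonarc and the `-H` option for glimpseindex are written as I understand them from those tools' documentation; check them against your installed versions.

```python
import os


def plan_reindex(archivepath, indexpath, mbox_files):
    """Return argv lists for one pass over a list; nothing is executed."""
    rcfile = os.path.join(archivepath, "mhonarc.rc")
    commands = []
    for mbox in mbox_files:
        # Hypothetical layout: one output directory per mbox file, named after it.
        outdir = os.path.join(archivepath, os.path.basename(mbox))
        commands.append(["mhonarc", "-rcfile", rcfile, "-outdir", outdir, mbox])
    # Index the generated pages, keeping the glimpse files in indexpath.
    commands.append(["glimpseindex", "-H", indexpath, archivepath])
    return commands


# The mbox file name below is invented for the example.
for argv in plan_reindex(
    "/home/www/majordomo-users",
    "/home/www/majordomo-users/index",
    ["/home/ftp/pub/majordomo/users-archive/majordomo-users.9701"],
):
    print(" ".join(argv))
```

Printing the plan rather than running it makes it cheap to try against a small archive first.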
I'll be happy to answer any questions, but remember that you were warned
that this stuff was nasty. One day I'll rewrite it (and actually make the
line number links correct) but I still have a lot of work to do on
Majordomo 2.0.
BTW, you can get huge piles of extra speed and correct line number links by
getting rid of .glimpse_filters, but then glimpse indexes things in the
next and previous message tags, which is a real pain since it gives you
three times as many hits as it should. You can get correct line number
links at the expense of huge piles of speed by twiddling a glimpse
invocation line in w3index (to use .glimpse_filters when searching) but I
wouldn't recommend it.
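For a sense of what a filter of this kind does, here is a small sketch of an HTML-stripping pass. It does not reproduce w3striphtml, whose behaviour I am not describing here. Dropping all text inside `<a>` elements is only one guess at how to keep the next/previous link text out of the index, and the `TextOnly` and `strip_html` names are invented.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect page text, dropping markup, comments, and anchor text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Assumption: link text (e.g. next/previous subjects) should not be indexed.
        if not self.anchor_depth:
            self.chunks.append(data)


def strip_html(html):
    """Return the indexable text of an HTML page."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


page = '<p>Hello <b>world</b></p><a href="msg00002.html">Next: spam</a>'
print(strip_html(page))  # → Hello world
```

A filter like this runs once per file at index time, which matches the speed cost mentioned above.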
- J<
mhonarc.rc:
<NODOC>
<SORT>
<MSGSEP>
^From .* \w{3} \w{3} [ \d]\d
</MSGSEP>
<TLEVELS>
5
</TLEVELS>
<IDXFNAME>
index.html
</IDXFNAME>
<TITLE>
Chronological Index
</TITLE>
<IDXPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>$IDXTITLE$ ($OUTDIR$)</title>
</head>
<body background=background.gif>
<h1>$IDXTITLE$ ($OUTDIR$)</h1>
</IDXPGBEGIN>
<LISTBEGIN>
<a href="$TIDXFNAME$">[Thread Index]</a>
<a href="..">[Top]</a>
<hr>
<address>
Last update: $LOCALDATE$<br>
$NUMOFMSG$ messages in chronological order<br>
</address>
<p>
<table>
<tr><th><strong>Subject</strong><hr>
<th><em>From</em><hr>
<th># of followups<br><hr>
</LISTBEGIN>
<LITEMPLATE>
<tr>
<td><strong>$SUBJECT$</strong>
<td><em>$FROMNAME:26$</em>
<td>$NUMFOLUP$<br>
</LITEMPLATE>
<LISTEND>
</table>
<p>
<hr>
<a href="$TIDXFNAME$">[Thread Index]</a>
<a href="..">[Top]</a>
<p>
</LISTEND>
<TTITLE>
Thread Index
</TTITLE>
<TIDXPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>$TIDXTITLE$ ($OUTDIR$)</title>
</head>
<body background=background.gif>
<h1>$TIDXTITLE$ ($OUTDIR$)</h1>
</TIDXPGBEGIN>
<THEAD>
<a href="$IDXFNAME$">[Chronological Index]</a>
<a href="..">[Top]</a>
<hr>
<address>
Last update: $LOCALDATE$<br>
$NUMOFMSG$ threaded messages<br>
</address>
<p>
</THEAD>
<TFOOT>
<p>
<hr>
<a href="$IDXFNAME$">[Chronological Index]</a>
<a href="..">[Top]</a>
<p>
</TFOOT>
<TOPLINKS>
<hr>
$PREVBUTTON$$NEXTBUTTON$
<a href="/majordomo-users/$OUTDIR$/$IDXFNAME$#$MSGNUM$">[Chronological]</a>
<a href="/majordomo-users/$OUTDIR$/$TIDXFNAME$#$MSGNUM$">[Thread]</a>
<a href="/majordomo-users">[Top]</a>
</TOPLINKS>
<BOTLINKS>
<hr>
<ul>
$PREVLINK$
$NEXTLINK$
<li>Index(es):
<ul>
<li><a href="/majordomo-users/$OUTDIR$/$IDXFNAME$#$MSGNUM$"><strong>Chronological</strong></a></li>
<li><a href="/majordomo-users/$OUTDIR$/$TIDXFNAME$#$MSGNUM$"><strong>Thread</strong></a></li>
</BOTLINKS>
<EXCS override>
apparently
comments
content-length
content-transfer-encoding
content-type
errors-to
followup
forward
lines
message-id
mime-
nntp-
originator
path
precedence
priority
received
replied
reply-to
return-path
sender
status
via
x-
</EXCS>
<LABELSTYLES>
-default-
subject:strong
from:strong
to:strong
</LABELSTYLES>
<FIELDSTYLES>
-default-
subject:strong
from:strong
to:strong
keywords:em
newsgroups:strong
</FIELDSTYLES>
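The MSGSEP pattern above decides where one message ends and the next begins in the mbox files. MHonArc applies it as a Perl regular expression; Python's `re` module treats this particular pattern the same way, so it can be tried out as below. The sample lines are invented.

```python
import re

# Same pattern as the <MSGSEP> resource: "From ", anything, a weekday
# abbreviation, a month abbreviation, then a one- or two-digit day.
MSGSEP = re.compile(r"^From .* \w{3} \w{3} [ \d]\d")

lines = [
    "From owner-majordomo-users Mon Jan  6 12:34:56 1997",  # mbox separator
    "From what I can tell, this works.",                     # body text
    ">From the archive",                                     # escaped body line
]
for line in lines:
    print(bool(MSGSEP.search(line)), line)
```

Only the first line is treated as a separator. The `[ \d]\d` part is what lets a space-padded single-digit day such as " 6" match.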