Great Circle Associates Majordomo-Users
(June 1997)
 


Subject: Re: WWW & Archives help
From: Jason L Tibbitts III <tibbs @ hpc . uh . edu>
Date: 05 Jun 1997 12:18:31 -0500
To: Roman Richardson <Roman . Richardson @ state . mn . us>
Cc: majordomo-users @ GreatCircle . COM
In-reply-to: Roman Richardson's message of Thu, 05 Jun 1997 10:25:04 -0500
References: <2.2.32.19970605152504.006bef2c@mail.state.mn.us>

>>>>> "RR" == Roman Richardson <Roman.Richardson@state.mn.us> writes:

RR> I'm hoping to put up a searchable archive site for my majordomo, but
RR> don't even know where to start.  I know J< has one up for greatcircle,
RR> but is there any source I could grab to fool around with, or hints as
RR> to where I should start?

The software that I use to drive my archives is at
ftp.hpc.uh.edu:/pub/majordomo/tmli.tgz.

Major warning: this code is nasty.  Gross.  Anyone who knows any Perl will
probably run screaming at the incredibly stupid things I do in there.  Yes,
this was my first real Perl program.

You need MHonArc to generate the archive pages, Glimpse to run the search
engine, Majordomo (archive2.pl, actually) to generate the mbox files, and
my code to glue it together.  (Web searches will reveal sites for MHonArc
and Glimpse.)

The glue code is undocumented, but I'll explain a little.  Put w3glimpse,
w3index, w3reindex, w3highlighter, and w3striphtml in your cgi-bin.  Also
put a w3glimpse.conf there, looking like this:

#listname         list title        URLpath             archivepath                     indexpath                         mboxpath
majordomo-workers Majordomo Workers /majordomo-workers/ /home/www/majordomo-workers/    /home/www/majordomo-workers/index /home/ftp/pub/majordomo/workers-archive
majordomo-users   Majordomo Users   /majordomo-users/   /home/www/majordomo-users/      /home/www/majordomo-users/index   /home/ftp/pub/majordomo/users-archive

The fields should be tab separated.
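
A minimal sketch of reading such a file, in Python rather than the Perl the
real scripts use.  It is not taken from tmli; the names ArchiveEntry and
parse_w3glimpse_conf are hypothetical, and it assumes that lines starting
with "#" are comments and that exactly one tab separates each field:

```python
from typing import NamedTuple


class ArchiveEntry(NamedTuple):
    """One row of w3glimpse.conf (field names are illustrative)."""
    listname: str
    title: str
    url_path: str
    archive_path: str
    index_path: str
    mbox_path: str


def parse_w3glimpse_conf(text):
    """Parse tab-separated rows, skipping blank lines and '#' lines.

    Assumption: '#' marks a comment and a single tab separates fields.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 6:
            raise ValueError(
                f"line {lineno}: expected 6 tab-separated fields, "
                f"got {len(fields)}"
            )
        entries.append(ArchiveEntry(*fields))
    return entries


SAMPLE = (
    "#listname\tlist title\tURLpath\tarchivepath\tindexpath\tmboxpath\n"
    "majordomo-users\tMajordomo Users\t/majordomo-users/\t"
    "/home/www/majordomo-users/\t/home/www/majordomo-users/index\t"
    "/home/ftp/pub/majordomo/users-archive\n"
)
ENTRIES = parse_w3glimpse_conf(SAMPLE)
```

Splitting on tabs rather than on any whitespace is what lets a list title
such as "Majordomo Users" contain a space; a row typed with spaces instead
of tabs ends up with the wrong field count and is rejected.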

archivepath should be a directory containing:

index.cgi as a link to w3index (your web server must support generated index
  pages).
a "construction.gif" with a picture to be used when the archive is being
  generated.
a "background.gif" if you want one.  If you don't, modify mhonarc.rc
  appropriately.
a directory "index" to hold the glimpse stuff; inside this should be a file
  .glimpse_exclude:

XYX:sina:~www/majordomo-users/index> cat .glimpse_exclude
.mhonarc.db
background.gif
index.html
threads.html

  and a file .glimpse_filters:

XYX:sina:~www/majordomo-users/index> cat .glimpse_filters 
*.html  /home/www/cgi-bin/w3striphtml <

a mhonarc.rc, containing the MHonArc configuration for the archives.  The
  one I use for majordomo-users is at the end of this message.  I still use
  MHonArc 1.x, so the new 2.0 format might be different.
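
Since the scripts are undocumented, it may help to check the directory
layout before the first run.  A rough Python sketch, not part of tmli: the
helper name missing_items is hypothetical, and treating .glimpse_filters as
required simply follows the list above.

```python
from pathlib import Path

# Items the message lists for each archivepath; background.gif is optional
# and therefore not checked here.
REQUIRED = (
    "index.cgi",
    "construction.gif",
    "mhonarc.rc",
    "index",
    "index/.glimpse_exclude",
    "index/.glimpse_filters",
)


def missing_items(archive_path):
    """Return the required entries that are absent under archive_path."""
    root = Path(archive_path)
    missing = []
    for name in REQUIRED:
        item = root / name
        # is_symlink() lets a dangling index.cgi link still count as present.
        if not (item.exists() or item.is_symlink()):
            missing.append(name)
    return missing
```

Calling missing_items("/home/www/majordomo-users") would return an empty
list once everything described above is in place.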

Then run w3reindex and it should build the archives for everything you have
configured in w3glimpse.conf.  This includes linking in the mbox files,
running MHonArc over them, and running Glimpse on the result.  Run
w3reindex nightly from cron to do it automatically.  Be sure to start with
a small archive first; it can take forever to run.

I'll be happy to answer any questions, but remember that you were warned
that this stuff was nasty.  One day I'll rewrite it (and actually make the
line number links correct) but I still have a lot of work to do on
Majordomo 2.0.

BTW, you can get huge piles of extra speed and correct line number links by
getting rid of .glimpse_filters, but then glimpse indexes the text in the
next and previous message tags, which is a real pain since it gives you
three times as many hits as it should.  You can get correct line number
links at the expense of huge piles of speed by twiddling a glimpse
invocation line in w3index (to use .glimpse_filters when searching) but I
wouldn't recommend it.
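
A toy illustration of the "three times as many hits" effect, in Python.
Everything here is an assumption made for the sketch: the page layout, the
class="nav" marker, and strip_for_index, which only stands in for whatever
w3striphtml really does.

```python
import re

# Toy subjects standing in for three consecutive archived messages.
SUBJECTS = ["Fighting Spams", "Archives help", "Install trouble"]


def render_page(i):
    """Build a toy message page with previous/next navigation links."""
    lines = [f"<h1>{SUBJECTS[i]}</h1>", f"<p>message body {i}</p>"]
    if i > 0:
        lines.append(
            f'<li class="nav">Prev: <a href="msg{i - 1}.html">'
            f"{SUBJECTS[i - 1]}</a></li>"
        )
    if i + 1 < len(SUBJECTS):
        lines.append(
            f'<li class="nav">Next: <a href="msg{i + 1}.html">'
            f"{SUBJECTS[i + 1]}</a></li>"
        )
    return "\n".join(lines)


def strip_for_index(html):
    """Hypothetical filter: drop navigation lines, then remove tags."""
    kept = [ln for ln in html.splitlines() if 'class="nav"' not in ln]
    return "\n".join(re.sub(r"<[^>]+>", "", ln) for ln in kept)


def pages_matching(word, texts):
    """Return the indexes of the pages whose text contains word."""
    return [i for i, text in enumerate(texts) if word in text]


PAGES = [render_page(i) for i in range(len(SUBJECTS))]
RAW_HITS = pages_matching("Archives", PAGES)
FILTERED_HITS = pages_matching("Archives", [strip_for_index(p) for p in PAGES])
```

Searching the raw pages for "Archives" matches all three of them, because
the neighbouring pages repeat the subject in their navigation links; after
filtering, only the message itself matches.  The filtered text also has
fewer lines than the HTML, which is one plausible reason (a guess, not
something the message states) why filtering and correct line numbers do not
go together.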

 - J<

mhonarc.rc:

<NODOC>

<SORT>

<MSGSEP>
^From .*  \w{3} \w{3} [ \d]\d
</MSGSEP>

<TLEVELS>
5
</TLEVELS>

<IDXFNAME>
index.html
</IDXFNAME>

<TITLE>
Chronological Index
</TITLE>

<IDXPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>$IDXTITLE$ ($OUTDIR$)</title>
</head>
<body background=background.gif>
<h1>$IDXTITLE$ ($OUTDIR$)</h1>
</IDXPGBEGIN>

<LISTBEGIN>
<a href="$TIDXFNAME$">[Thread Index]</a>
<a href="..">[Top]</a>
<hr>
<address>
Last update: $LOCALDATE$<br>
$NUMOFMSG$ messages in chronological order<br>
</address>
<p>
<table>
<tr><th><strong>Subject</strong><hr>
    <th><em>From</em><hr>
    <th># of followups<br><hr>
</LISTBEGIN>

<LITEMPLATE>
<tr>
<td><strong>$SUBJECT$</strong>
<td><em>$FROMNAME:26$</em>
<td>$NUMFOLUP$<br>
</LITEMPLATE>

<LISTEND>
</table>
<p>
<hr>
<a href="$TIDXFNAME$">[Thread Index]</a>
<a href="..">[Top]</a>
<p>
</LISTEND>

<TTITLE>
Thread Index
</TTITLE>

<TIDXPGBEGIN>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>$TIDXTITLE$ ($OUTDIR$)</title>
</head>
<body background=background.gif>
<h1>$TIDXTITLE$ ($OUTDIR$)</h1>
</TIDXPGBEGIN>

<THEAD>
<a href="$IDXFNAME$">[Chronological Index]</a>
<a href="..">[Top]</a>
<hr>
<address>
Last update: $LOCALDATE$<br>
$NUMOFMSG$ threaded messages<br>
</address>
<p>
</THEAD>

<TFOOT>
<p>
<hr>
<a href="$IDXFNAME$">[Chronological Index]</a>
<a href="..">[Top]</a>
<p>
</TFOOT>

<TOPLINKS>
<hr>
$PREVBUTTON$$NEXTBUTTON$
<a href="/majordomo-users/$OUTDIR$/$IDXFNAME$#$MSGNUM$">[Chronological]</a>
<a href="/majordomo-users/$OUTDIR$/$TIDXFNAME$#$MSGNUM$">[Thread]</a>
<a href="/majordomo-users">[Top]</a>
</TOPLINKS>

<BOTLINKS>
<hr>
<ul>
$PREVLINK$
$NEXTLINK$
<li>Index(es):
<ul>
<li><a href="/majordomo-users/$OUTDIR$/$IDXFNAME$#$MSGNUM$"><strong>Chronological</strong></a></li>
<li><a href="/majordomo-users/$OUTDIR$/$TIDXFNAME$#$MSGNUM$"><strong>Thread</strong></a></li>
</ul>
</li>
</ul>
</BOTLINKS>


<EXCS override>
apparently
comments
content-length
content-transfer-encoding
content-type
errors-to 
followup
forward 
lines 
message-id
mime- 
nntp- 
originator 
path 
precedence 
priority
received 
replied 
reply-to
return-path 
sender
status 
via 
x- 
</EXCS>

<LABELSTYLES>
-default-
subject:strong
from:strong
to:strong
</LABELSTYLES>

<FIELDSTYLES>
-default-
subject:strong
from:strong
to:strong
keywords:em
newsgroups:strong
</FIELDSTYLES>
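
The MSGSEP pattern in the resource file above can be tried outside MHonArc.
A small Python sketch: the pattern is copied verbatim (Python's re accepts
the same syntax for it), while the sample lines and the helper name
is_separator are made up for illustration.

```python
import re

# Pattern copied from the <MSGSEP> element; note the two literal spaces
# after ".*".
MSGSEP = re.compile(r"^From .*  \w{3} \w{3} [ \d]\d")


def is_separator(line):
    """True if the line would be treated as the start of a new message."""
    return MSGSEP.search(line) is not None


SEPARATOR_LINES = [
    "From tibbs@hpc.uh.edu  Thu Jun  5 12:18:31 1997",
    "From tibbs@hpc.uh.edu  Thu Jun 15 12:18:31 1997",
]
NON_SEPARATOR_LINES = [
    # A body line that merely begins with "From ".
    "From the beginning, this code was nasty.",
    # Only one space between the address and the day name.
    "From tibbs@hpc.uh.edu Thu Jun  5 12:18:31 1997",
]
```

The last sample is the one to watch: with a single space before the day
name the pattern does not match, so an mbox written in that style would
need a different MSGSEP.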

