Jason wrote me about my posting, and I sent two messages in reply.
He suggested that I would do better to post to the list, which I now do.
>Could you give concrete examples, please? Fixing the recursive abort
>problem has eliminated a bunch of problems, but I can't fix anything if I
>don't know what to fix.
This wasn't intended as a specific bug report.
>BS> One of the neat features of 1.94b1 is config-test -- but why hasn't
>BS> this been released to the MJ-users list (if it has, I apologise for
>BS> missing it, I was off the list for a while)?
>Because it's part of 1.94. It checks things that weren't around in 1.93.
OK, although I wonder how much of a job it would be to eliminate the
1.94-specific checks so that it could be used with earlier versions.
>BS> One of the biggest problems with MJ is a tendency to thrash, spawning
>BS> multiple processes that take over the entire machine. I've had several
>BS> such episodes during the past week, both with 1.93 and then 1.94 (a9
>BS> and then b1).
>I have never seen this. Could you provide more information? What perl
>version, what's in the logs, is it resend or majordomo that's spinning?
>Are they consuming memory? Is there a mail loop? Can you provide a case
>that will duplicate the problem? Unless you can point us in at least a
>general direction there's very little chance of getting anywhere on this.
>Especially since there are many who run these versions on even high traffic
>lists and haven't seen a problem.
The usual cause is probably a user error which an expert wouldn't make.
For example, I have already posted a suggestion that if shlock cannot
create a lock file, it should check whether the directory is writable
instead of just beating its head against the wall for ten minutes or so.
This would make Mj a lot more user friendly in my opinion.
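To make the suggestion concrete, here is a minimal sketch of the kind of check I mean. It is written in Python rather than Majordomo's Perl, and it is not the actual shlock code: the function name acquire_lock and its parameters are made up for illustration.

```python
import os
import time


def acquire_lock(lock_path, retries=5, delay=1.0):
    """Hypothetical helper, not Majordomo's shlock.

    Try to create lock_path exclusively. Retry only when the lock is
    held by someone else; fail at once if the directory is the problem.
    """
    lock_dir = os.path.dirname(lock_path) or "."
    for _ in range(retries):
        try:
            # O_EXCL makes the call fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            # Another process holds the lock, so waiting may help.
            time.sleep(delay)
            continue
        except OSError:
            # Any other failure: waiting will not help, so say why and stop.
            if not os.path.isdir(lock_dir):
                raise RuntimeError("lock directory does not exist: " + lock_dir)
            if not os.access(lock_dir, os.W_OK):
                raise RuntimeError("lock directory is not writable: " + lock_dir)
            raise
        # Record the owner's process id in the lock file, then release the fd.
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        return True
    return False
```

The point of the design is only that a held lock and an unusable directory are different failures: the first is worth waiting for, the second should be reported immediately.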
Yesterday I posted a message about a problem that pretty well brought
down my system. I was flooded with messages of the form:
do_exec_sendmail, mailer -fMajordomo-Owner not executable
(over 3000 of them). I don't know what caused this, and I was obviously
much more concerned with stopping the problem than carefully documenting
it. I did notice that digest was spawning at a furious rate. Several
times I killed every process owned by majordomo and every process that
involved digest, and after several tries it seems to have died.
It appears that the cause was an error on my part, but precisely what
happened isn't clear. All the digests were for the dfo-study list, for
which the aliases were:
dfo-study: "|/usr/local/majordomo/wrapper resend -R -l dfo-study -f
dfo-study -h biome.bio.ns.ca dfo-study-outgoing"
dfo-study-outgoing: :include:/usr/local/majordomo/Lists/dfo-study,dfo-study-archive,"|/usr/local/majordomo/wrapper digest -r -C -l dfo-study-digest
dfo-study-request: "|/usr/local/majordomo/wrapper request-answer
but there is no dfo-study-digest list -- I was reconstructing a bunch of
aliases in the process of adding digests to several other lists, and
accidentally added a digest line for dfo-study. I assume that this is what
caused the problem, and Mj did keep on rebuilding dfo-study-digest.config,
but I feel that the punishment far exceeded the crime!
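A check along the following lines might have caught the mistake before any mail was processed. This is only a sketch in Python, not anything that exists in Majordomo: the function name missing_lists is hypothetical, and it assumes that every list named with -l in an alias should have a file of the same name in the lists directory.

```python
import os
import re

# Matches the "-l <listname>" argument given to resend or digest.
LIST_ARG = re.compile(r"\s-l\s+([\w.-]+)")


def missing_lists(alias_text, lists_dir):
    """Hypothetical check: return the list names passed with -l in
    alias_text that have no file of the same name in lists_dir."""
    names = set(LIST_ARG.findall(alias_text))
    return sorted(
        name for name in names
        if not os.path.isfile(os.path.join(lists_dir, name))
    )
```

Run against the aliases above with a lists directory that contains dfo-study but no dfo-study-digest, it would report dfo-study-digest as missing.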
The substance of my posting was that errors which are relatively minor,
or at least seem so from the viewpoint of a naive user, can easily bring
down the system. They won't happen to an expert, but they sure happen a
lot to the rest of us.
Bill Silvert, Habitat Ecology Section, Bedford Institute of Oceanography,
P. O. Box 1006, Dartmouth, Nova Scotia, CANADA B2Y 4A2, Tel. (902)426-1577