> Honestly, have you seen a single complaint of list erasure with 1.94? I
> haven't, and I've tested it under all of the conditions which would kill
> 1.93. I cannot reproduce the problem. I've driven my load above 500
> sending ten thousand subscription requests, during which I filled up the
> partition. The list was not erased (though of course some requests were
> lost, but if you can't append to the list, you just can't).

Just this morning I had to restore a 1.94 list from backup because the list
had disappeared. The only evidence left behind was a stale lock file
(L.listname).
However, I suspect the cause is that the list spool directory is NFS-mounted
from another machine, and the locking method that Majordomo relies on (the
atomicity of the link() call) is not dependable for NFS-mounted files. So,
very occasionally, two majordomo processes try to lock the same file at the
same time, and each one thinks it has an exclusive lock.

At least, that is my current theory. But it's extremely difficult to catch
in the act, and frankly it doesn't happen often enough to be a major worry
for me right now. (I have plenty of other problems to deal with first! :^)
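
To make the locking scheme concrete, here is a minimal Python sketch of
link()-style locking. It is not Majordomo's actual code (Majordomo is written
in Perl); the function names and the temp-file prefix are hypothetical, chosen
only for illustration. The link-count check after a failed link() is a
commonly documented workaround for the case where the link succeeds on an NFS
server but the reply is lost, and it is an addition here, not something the
theory above depends on.

```python
import os
import tempfile


def acquire_link_lock(lock_path):
    """Try once to take a lock by hard-linking a unique file to lock_path.

    Returns True if this process now holds the lock, False otherwise.
    Hypothetical helper for illustration; not Majordomo's implementation.
    """
    directory = os.path.dirname(lock_path) or "."
    # Create a uniquely named file in the same directory as the lock.
    fd, tmp_path = tempfile.mkstemp(prefix="tmp.lock.", dir=directory)
    try:
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        try:
            # link() fails if lock_path already exists; the scheme relies
            # on that check-and-create being atomic.
            os.link(tmp_path, lock_path)
            return True
        except OSError:
            # link() reported failure. If our unique file nevertheless has
            # two names, the link did happen and we hold the lock.
            return os.stat(tmp_path).st_nlink == 2
    finally:
        # The unique name is no longer needed either way.
        os.unlink(tmp_path)


def release_link_lock(lock_path):
    """Release the lock by removing the lock file."""
    os.unlink(lock_path)
```

If a process dies between the two calls, the lock file is left behind, which
would match the stale L.listname file described above.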

However, it would be nice if Majordomo 1.95 (or 2.0) offered a choice between
link()-style locking for files on a local disk and flock/lockf-style locking
via lockd for files that may be NFS-mounted. At the very least, the Majordomo
documentation should warn that putting the spool partition on an NFS mount
renders the locking mechanism ineffective.
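
As a rough idea of what the lockf-style alternative could look like, here is
a minimal Python sketch using fcntl.lockf(), which wraps the fcntl() record
locking calls. The helper name locked_append is hypothetical, and the sketch
assumes a Unix system; whether the lock is actually honoured across an NFS
mount depends on the client and server both running a working lock daemon,
which this code cannot verify.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_append(path):
    """Open path for appending while holding an exclusive lockf() lock.

    Hypothetical helper for illustration; assumes a Unix system.
    """
    with open(path, "a") as handle:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(handle, fcntl.LOCK_EX)
        try:
            yield handle
        finally:
            # Push buffered data out before letting anyone else in.
            handle.flush()
            fcntl.lockf(handle, fcntl.LOCK_UN)
```

A subscription request would then append to the list like this (the file
name and address are made up):

```python
with locked_append("testlist") as handle:
    handle.write("someone@example.com\n")
```

Unlike a lock file, this kind of lock is dropped by the kernel when the
process exits, so a crashed process does not leave a stale lock behind.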

The NFS mount, incidentally, is there so that two machines can jointly run
majordomo: if one has to be taken down, majordomo keeps going on the other
machine. Having said that, the disappearing list problem occurred while we
were running majordomo on only one machine, so having two machines running
it at the same time is definitely not the cause.

Network Support Programmer