[ Tim Shuttleworth writes: ]
>
> I have just this morning had to restore a 1.94 list from a backup
> because the list disappeared. The only evidence was an undeleted lock
> file (L.listname). However, I suspect that the problem is that the
> list spool directory is NFS mounted from another machine, and the
> locking method that Majordomo relies on (the atomicity of the link()
> function) doesn't work for NFS mounted files.
You got that right!
> So, very occasionally, two requests to lock a file occur from two
> majordomo processes at the same time, and each one thinks it has an
> exclusive lock.
[...]
> the disappearing list problem occurred when we were only running
> majordomo on one machine, so having two machines running it at the
> same time is definitely not the problem.
Let's say it again: lock-file locking does *NOT* work over NFS. NFS
uses datagrams. The server's response to a given datagram just means the
server accepted the request, not that it has already finished it (unlike
local file system requests). So even two processes on the same machine
trying to lock the same NFS file have a small but nonzero chance of both
being told their lock request succeeded.
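
For readers who haven't seen the technique, here is a minimal sketch of
link()-based lock-file locking of the kind described above. It is written
in Python for illustration only and is not Majordomo's actual shlock.pl
code; the names try_lock and unlock and the temp-file naming are
hypothetical. It relies on link() being atomic, which holds on a local
file system and is exactly the assumption that fails on NFS.

```python
import os
import socket


def try_lock(lock_path):
    """Try once to take the lock. Return True on success, False if it is held."""
    directory = os.path.dirname(lock_path) or "."
    # Per-process temp file in the same directory as the lock
    # (link() cannot cross file systems). Naming scheme is an assumption.
    tmp_path = os.path.join(
        directory, "tmp.%s.%d" % (socket.gethostname(), os.getpid())
    )
    with open(tmp_path, "w") as tmp:
        tmp.write("%d\n" % os.getpid())  # record the owner's PID
    try:
        # On a local file system this either creates lock_path or fails
        # because it already exists; there is no in-between state.
        os.link(tmp_path, lock_path)
        return True
    except FileExistsError:
        return False
    finally:
        # The temp name is no longer needed whether or not we got the lock.
        os.unlink(tmp_path)


def unlock(lock_path):
    """Release the lock by removing the lock file."""
    os.unlink(lock_path)
```

A caller would loop on try_lock("L.listname") with a short sleep between
attempts and call unlock() when done. A process that dies between the two
calls leaves the lock file behind, much like the undeleted L.listname
mentioned above.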
The bottom line is you can't run Majordomo with its lists on NFS. Better
file locking is on the list for a future version, but it's such a can of
worms that it's too much for anything but a major revision. If you want
to try developing a better locking system, just remember that you can't
break compatibility, and that not all file locking is created equal. The
current plan is to offer different locking schemes by having the
installer configuration select among variant shlock.pl files.
--
Dave Wolfe