Great Circle Associates Majordomo-Workers
(November 1999)
 


Subject: Re: MJ2 Just Stopped Working?
From: Chuck Milam <milam @ uwosh . edu>
Date: Sat, 6 Nov 1999 11:20:40 -0600 (CST)
To: Jason L Tibbitts III <tibbs @ math . uh . edu>
Cc: majordomo-workers @ greatcircle . com, "David E. Crawford, LtCol, CAP" <dcrawford @ mer . cap . gov>
In-reply-to: <ufaso2jwp06.fsf@epithumia.math.uh.edu>


On Sat, 6 Nov 1999, Jason L Tibbitts III wrote:

> CM> The symptoms of the problem: A message will come into the system, and a
> CM> "mj_email" process will get started.  The process will grow to ~6600 in
> CM> size, and then just stalls.
> 
> But surely it must emit some logging information before it gets to this.

[root@cap-ntc-4 qmail]# jobs
[2]   Running  tail -f mj_email.debug mj_resend.debug mj_trigger.debug
mj_majord.debug &  (wd: /var/majordomo/tmp)

Nothing new in the logs.  As a matter of fact, it seems that the logs
haven't been updated since the system started acting up.  I'll rotate them
out and try to re-run the queue--results below:

> If it sleeps forever it's probably trying to acquire a lock.  If it's
> chewing CPU then it just remains to find the place where it's looping.

It's chewing CPU:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
12405 majordom  20   0  6816 6816  1324 R       0 32.8  5.4   0:08 mj_email
12400 majordom  20   0  6672 6672  1320 R       0 32.7  5.3   0:41 mj_email
12397 majordom  20   0  6668 6668  1320 R       0 30.3  5.3   2:31 mj_email

> Debug logs will help in this.

I'm double-checking to make sure I have the logs set up correctly, and that
the debug level is set high (500 is the max debug level, right?)

After archiving the old logs off (which hadn't been updated since last
week, when things stopped working), and restarting MJ, here's what shows
up:

drwx------   3 majordom majordom     1024 Nov  6 12:07 .
drwx------   3 majordom majordom     1024 Nov  6 12:03 ..
drwx------   2 majordom majordom     1024 Nov  6 12:07 locks
-rw-------   1 majordom majordom        0 Nov  6 12:07 mj_email.debug
-rw-------   1 majordom majordom        0 Nov  6 12:07 mj_majord.debug
-rw-------   1 majordom majordom        0 Nov  6 12:07 mj_resend.debug
-rw-------   1 majordom majordom        0 Nov  6 12:07 mje.12712.AAA.out
-rw-------   1 majordom majordom        8 Nov  6 12:07 mje12712.1.mime
-rw-------   1 majordom majordom      166 Nov  6 12:07 mjr12715.1.mime
-rw-------   1 majordom majordom      877 Nov  6 12:07 post.12715.AAA

mj_email processes are running and chewing CPU.

> Can you do any list operations from the command line?  (You can even
> post messages from there using the 'post' command so you should be
> able to duplicate anything from there.)  

I'm not sure I understand the proper way to use the "post" command from
the command line interface:

Majordomo>post test-list
--== Use of uninitialized value at blib/lib/Mj/Resend.pm (autosplit into
blib/lib/auto/Mj/Resend/post.al) line 145.
--== Use of uninitialized value at blib/lib/Mj/Resend.pm (autosplit into
blib/lib/auto/Mj/Resend/_check_poster.al) line 644.
--== Use of uninitialized value at blib/lib/Mj/List.pm (autosplit into
blib/lib/auto/Mj/List/is_subscriber.al) line 234.
Can't call method "isvalid" on an undefined value at blib/lib/Mj/List.pm
(autosplit into blib/lib/auto/Mj/List/is_subscriber.al) line 237.

Probably my bad on this.

> Just find something that doesn't work and crank up the debugging.

Posting to certain lists (like my small test lists) appears to work, and
logging is done properly (do these look like level 500 logs to you?  They 
look pretty detailed):

==> mj_resend.debug <==
--== Constant subroutine __need___va_list undefined at
/usr/lib/perl5/5.00503/sparc-linux/stdarg.ph line 9.
[12761]Majordomo Email client - Sat Nov  6 12:16:45 1999
[12761].Compilation took 1.15s, 0.10u
[12761].Loading modules
[12761].Loading modules..done, took 4.00 sec
[12761].Majordomo::new: /opt/mail/lists, lists.wiwg.cap.gov
[12761].Majordomo::new..done, took 1.00 sec
[12761].Majordomo::connect: resend, unknown@anonymous
[12761].Majordomo::connect..done, took 0.00 sec
[12761].Majordomo::dispatch: post_start, unknown@anonymous,
unknown@anonymous
[12761]..Mj::Resend::post_start: test-list
[12761]..Mj::Resend::post_start..done, took 0.00 sec
[12761].Majordomo::dispatch..done, took 0.00 sec
[12761].Majordomo::dispatch: post_done, unknown@anonymous,
unknown@anonymous
[12761]..Mj::Resend::post_done
[12761]...Mj::Resend::post: test-list, unknown@anonymous,
/var/majordomo/tmp/post.12761.AAA
[12761]....Mj::Resend::_check_approval
[12761]....Mj::Resend::_check_approval..done, took 0.00 sec
[12761]....Mj::Resend::_check_poster: Chuck Milam <cmilam@wiwg.cap.gov>
[12761]....Mj::Resend::_check_poster..done, took 0.00 sec
[12761]....Mj::Resend::_check_header
[12761]....Mj::Resend::_check_header..done, took 0.00 sec
[12761]....Mj::Resend::_post: test-list, Chuck Milam
<cmilam@wiwg.cap.gov>, /var/majordomo/tmp/post.12761.AAA
[12761].....Sending message 8
[12761].....Mj::Resend::_trim_approved
[12761].....Mj::Resend::_trim_approved..done, took 0.00 sec
[12761].....Mj::Resend::_add_fters
[12761].....Mj::Resend::_add_fters..done, took 0.00 sec
[12761].....Mj::Resend::do_digests
[12761].....Mj::Resend::do_digests..done, took 0.00 sec
[12761].....Mj::MailOut::deliver
[12761].....Mj::MailOut::deliver..done, took 0.00 sec
[12761]....Mj::Resend::_post..done, took 4.00 sec
[12761]...Mj::Resend::post..done, took 6.00 sec
[12761]..Mj::Resend::post_done..done, took 6.00 sec
[12761].Majordomo::dispatch..done, took 6.00 sec
[12761].-----Calling destructors-----
[12761]Majordomo Email client - Sat Nov  6 12:16:45 1999..done, took 11.00
sec
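
Since the trace above pairs each call with a matching "..done" line, one
rough way to see where a looping process stopped is to scan a debug log for
calls that never report done. The following is only a sketch in Python and
is not part of Majordomo. The function name unfinished_calls and the
regular expressions are assumptions about the log format, inferred solely
from the sample output above.

```python
import re
from collections import defaultdict

# "[pid]" followed by indentation dots, then the message text.
LINE = re.compile(r"^\[(\d+)\]\.*(.*)$")
# A Perl-style qualified name such as Mj::Resend::post (assumed format).
NAME = r"[A-Za-z_]\w*(?:::\w+)+"
DONE = re.compile(rf"^({NAME})\.\.done\b")
CALL = re.compile(rf"^({NAME})(?::\s.*)?$")


def unfinished_calls(lines):
    """Return {pid: [calls entered but never marked '..done']}."""
    stacks = defaultdict(list)
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if not m:
            continue  # wrapped continuation line or a Perl warning
        pid, rest = m.group(1), m.group(2)
        done = DONE.match(rest)
        if done:
            name = done.group(1)
            stack = stacks[pid]
            if name in stack:
                # Unwind to the matching entry, dropping anything nested.
                while stack and stack.pop() != name:
                    pass
            continue
        call = CALL.match(rest)
        if call:
            stacks[pid].append(call.group(1))
    # Keep only processes that still have open calls.
    return {pid: stack for pid, stack in stacks.items() if stack}
```

Passing it the lines of one of the debug files (for instance the result of
open() on mj_email.debug) would give, per PID, the stack of calls still
open; the last name in each list is the deepest call seen before the log
went quiet. It can only help once the debug files actually receive output.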

Larger, production lists (the ones that were actively used prior to/at the
time of the problem) don't work.

-- 
Chuck Milam - milam@uwosh.edu
I.T. Division - Academic Computing
University of Wisconsin Oshkosh







