On Sat, 6 Nov 1999, Jason L Tibbitts III wrote:
> CM> The symptoms of the problem: A message will come into the system, and a
> CM> "mj_email" process will get started. The process will grow to ~6600 in
> CM> size, and then just stalls.
>
> But surely it must emit some logging information before it gets to this.
[root@cap-ntc-4 qmail]# jobs
[2] Running tail -f mj_email.debug mj_resend.debug mj_trigger.debug
mj_majord.debug & (wd: /var/majordomo/tmp)
Nothing new in the logs. As a matter of fact, it seems that the logs
haven't been updated since the system started acting up. I'll rotate them
out and try to re-run the queue--results below:
> If it sleeps forever it's probably trying to acquire a lock. If it's
> chewing CPU then it just remains to find the place where it's looping.
It's chewing CPU:
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
12405 majordom 20 0 6816 6816 1324 R 0 32.8 5.4 0:08 mj_email
12400 majordom 20 0 6672 6672 1320 R 0 32.7 5.3 0:41 mj_email
12397 majordom 20 0 6668 6668 1320 R 0 30.3 5.3 2:31 mj_email
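For anyone who wants to apply the sleep-versus-spin rule of thumb to a whole process listing, here is a rough sketch. It assumes the column layout of the top output above; the function names and the 5% CPU threshold are my own invention, not anything from Majordomo or top.

```python
# Rough heuristic only: classify each process from its STAT and %CPU columns.
# Assumes the column layout of the top output quoted in this message.

def parse_top(text):
    """Split a top listing (header line plus rows) into one dict per process."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]

def diagnose(proc, busy_cpu=5.0):
    """Return 'looping', 'waiting', or 'unclear' (busy_cpu is an arbitrary cutoff)."""
    cpu = float(proc["%CPU"])
    if proc["STAT"].startswith("R") and cpu >= busy_cpu:
        return "looping"   # runnable and burning CPU: look for a loop
    if cpu < busy_cpu:
        return "waiting"   # idle: possibly blocked trying to acquire a lock
    return "unclear"

TOP_OUTPUT = """\
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
12405 majordom 20 0 6816 6816 1324 R 0 32.8 5.4 0:08 mj_email
12400 majordom 20 0 6672 6672 1320 R 0 32.7 5.3 0:41 mj_email
12397 majordom 20 0 6668 6668 1320 R 0 30.3 5.3 2:31 mj_email
"""

for proc in parse_top(TOP_OUTPUT):
    print(proc["PID"], diagnose(proc))
```

All three mj_email processes above come out as "looping", which matches what I see by eye.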
> Debug logs will help in this.
I'm double-checking to make sure I have the logs set up correctly, and that
the debug level is set high (500 is the max debug level, right?).
After archiving the old logs off (which hadn't been updated since last
week, when things stopped working), and restarting MJ, here's what shows
up:
drwx------ 3 majordom majordom 1024 Nov 6 12:07 .
drwx------ 3 majordom majordom 1024 Nov 6 12:03 ..
drwx------ 2 majordom majordom 1024 Nov 6 12:07 locks
-rw------- 1 majordom majordom 0 Nov 6 12:07 mj_email.debug
-rw------- 1 majordom majordom 0 Nov 6 12:07 mj_majord.debug
-rw------- 1 majordom majordom 0 Nov 6 12:07 mj_resend.debug
-rw------- 1 majordom majordom 0 Nov 6 12:07 mje.12712.AAA.out
-rw------- 1 majordom majordom 8 Nov 6 12:07 mje12712.1.mime
-rw------- 1 majordom majordom 166 Nov 6 12:07 mjr12715.1.mime
-rw------- 1 majordom majordom 877 Nov 6 12:07 post.12715.AAA
mj_email processes are running and chewing CPU.
> Can you do any list operations from the command line? (You can even
> post messages from there using the 'post' command so you should be
> able to duplicate anything from there.)
I'm not sure I understand the proper way to use the "post" command from
the command line interface:
Majordomo>post test-list
--== Use of uninitialized value at blib/lib/Mj/Resend.pm (autosplit into
blib/lib/auto/Mj/Resend/post.al) line 145.
--== Use of uninitialized value at blib/lib/Mj/Resend.pm (autosplit into
blib/lib/auto/Mj/Resend/_check_poster.al) line 644.
--== Use of uninitialized value at blib/lib/Mj/List.pm (autosplit into
blib/lib/auto/Mj/List/is_subscriber.al) line 234.
Can't call method "isvalid" on an undefined value at blib/lib/Mj/List.pm
(autosplit into blib/lib/auto/Mj/List/is_subscriber.al) line 237.
Probably my bad on this.
> Just find something that doesn't work and crank up the debugging.
Posting to certain lists (like my small test lists) appears to work, and
logging is done properly (do these look like level 500 logs to you? They
look pretty detailed):
==> mj_resend.debug <==
--== Constant subroutine __need___va_list undefined at
/usr/lib/perl5/5.00503/sparc-linux/stdarg.ph line 9.
[12761]Majordomo Email client - Sat Nov 6 12:16:45 1999
[12761].Compilation took 1.15s, 0.10u
[12761].Loading modules
[12761].Loading modules..done, took 4.00 sec
[12761].Majordomo::new: /opt/mail/lists, lists.wiwg.cap.gov
[12761].Majordomo::new..done, took 1.00 sec
[12761].Majordomo::connect: resend, unknown@anonymous
[12761].Majordomo::connect..done, took 0.00 sec
[12761].Majordomo::dispatch: post_start, unknown@anonymous,
unknown@anonymous
[12761]..Mj::Resend::post_start: test-list
[12761]..Mj::Resend::post_start..done, took 0.00 sec
[12761].Majordomo::dispatch..done, took 0.00 sec
[12761].Majordomo::dispatch: post_done, unknown@anonymous,
unknown@anonymous
[12761]..Mj::Resend::post_done
[12761]...Mj::Resend::post: test-list, unknown@anonymous,
/var/majordomo/tmp/post.12761.AAA
[12761]....Mj::Resend::_check_approval
[12761]....Mj::Resend::_check_approval..done, took 0.00 sec
[12761]....Mj::Resend::_check_poster: Chuck Milam <cmilam@wiwg.cap.gov>
[12761]....Mj::Resend::_check_poster..done, took 0.00 sec
[12761]....Mj::Resend::_check_header
[12761]....Mj::Resend::_check_header..done, took 0.00 sec
[12761]....Mj::Resend::_post: test-list, Chuck Milam
<cmilam@wiwg.cap.gov>, /var/majordomo/tmp/post.12761.AAA
[12761].....Sending message 8
[12761].....Mj::Resend::_trim_approved
[12761].....Mj::Resend::_trim_approved..done, took 0.00 sec
[12761].....Mj::Resend::_add_fters
[12761].....Mj::Resend::_add_fters..done, took 0.00 sec
[12761].....Mj::Resend::do_digests
[12761].....Mj::Resend::do_digests..done, took 0.00 sec
[12761].....Mj::MailOut::deliver
[12761].....Mj::MailOut::deliver..done, took 0.00 sec
[12761]....Mj::Resend::_post..done, took 4.00 sec
[12761]...Mj::Resend::post..done, took 6.00 sec
[12761]..Mj::Resend::post_done..done, took 6.00 sec
[12761].Majordomo::dispatch..done, took 6.00 sec
[12761].-----Calling destructors-----
[12761]Majordomo Email client - Sat Nov 6 12:16:45 1999..done, took 11.00
sec
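If a stuck process does get as far as writing something to its debug log, one way to find where it is looping would be to list the calls that were entered but never logged a matching "..done" line. The sketch below is only a guess at how to do that: it assumes the "[pid]" plus dots plus "Package::sub" layout seen in the log above, only tracks entries whose name contains "::", and the function name unfinished_calls is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log excerpt in this message:
#   [pid]....Package::sub: args          (call entered)
#   [pid]....Package::sub..done, took N  (call finished)
LINE = re.compile(r"^\[(\d+)\]\.*(.*)$")
DONE = re.compile(r"^((?:\w+::)+\w+)\.\.done")
CALL = re.compile(r"^((?:\w+::)+\w+)(?::\s.*)?$")

def unfinished_calls(lines):
    """Return {pid: [calls entered but never marked ..done]}, outermost first."""
    stacks = defaultdict(list)
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue  # wrapped continuation line or Perl warning; ignore
        pid, text = m.group(1), m.group(2)
        done = DONE.match(text)
        if done:
            stack = stacks[pid]
            if done.group(1) in stack:
                # Unwind to the matching entry, dropping anything nested in it.
                while stack.pop() != done.group(1):
                    pass
            continue
        call = CALL.match(text)
        if call:
            stacks[pid].append(call.group(1))
    return {pid: stack for pid, stack in stacks.items() if stack}
```

Feeding it the lines of one of the .debug files (for example via open() on the file) should return an empty dict for a run like the one above, where every call finished; for a hung run, the last name in each list would be the innermost call that never returned.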
Larger production lists (the ones that were actively used prior to/at the
time of the problem) don't work.
--
Chuck Milam - milam@uwosh.edu
I.T. Division - Academic Computing
University of Wisconsin Oshkosh