Great Circle Associates Majordomo-Workers
(April 1994)
 


Subject: digests and archives
From: pdc@lunch.asd.sgi.com (Paul Close)
Date: Thu, 21 Apr 1994 10:13:35 -0700 (PDT)
To: majordomo-workers@greatcircle.com

I started a private discussion with John Rouillard that he thought should
get some air time on majordomo-workers.  So here goes.

The first topic is digests.

For 1.62, I hacked digest so that it stripped all headers it wouldn't need
at receive time, so that digest counts were more accurate.  I also added
the ability to send a digest based not just on byte count, but line count
and age of the oldest message.  My thinking here was that most people tend
to think of the size of large articles in lines (probably because of Usenet),
and that the max byte size is really just a parameter for the mailer.  As
for age, I have a list with low traffic, but some people still like the
digests.  So I really only want to push the digest once the oldest message
gets past a certain point (3 days, currently).  That way, cron isn't pushing
out digests with only one article every day, yet messages don't grow ancient
in the queue.

Finally, I separated the "should we send" logic from article reception so
that cron can push digests if it's appropriate (usually based on max age,
since receive still checks the byte/line counts).  I suppose you could split
the decisions into receive-time and cron categories, but it's simplest to make
all of the decisions in both places.
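
To make the "should we send" test concrete, here is a rough sketch in
Python (not the Perl that Majordomo is written in, and not the actual 1.62
patch). The names DigestLimits, QueuedMessage and should_send_digest, and
the byte and line defaults, are made up for illustration; only the 3-day
age comes from the text above. The same function is meant to be callable
from both the receive path and cron.

```python
import time
from dataclasses import dataclass


@dataclass
class DigestLimits:
    # Hypothetical defaults; only the 3-day age is taken from the post.
    max_bytes: int = 40_000
    max_lines: int = 1_000
    max_age_seconds: int = 3 * 24 * 3600


@dataclass
class QueuedMessage:
    body: str           # text that will actually go into the digest
    received_at: float  # Unix timestamp of arrival


def should_send_digest(queue, limits, now=None):
    """Return True if any of the three triggers says the digest is due."""
    if not queue:
        return False  # never push an empty digest
    now = time.time() if now is None else now
    total_bytes = sum(len(m.body.encode("utf-8")) for m in queue)
    total_lines = sum(len(m.body.splitlines()) for m in queue)
    oldest_age = now - min(m.received_at for m in queue)
    return (
        total_bytes >= limits.max_bytes
        or total_lines >= limits.max_lines
        or oldest_age >= limits.max_age_seconds
    )
```

Called at receive time, the byte and line limits usually fire first; called
from cron on a quiet list, the age limit is normally the one that triggers.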

I'd welcome any comments on my approach....  One comment John had was that
he wouldn't want the headers stripped at receive time, since it could make
tracking down problems more difficult if there is no Message-ID header, etc.  My
counter-proposal is to not count message headers, but still leave them in
the message.  This requires slightly more processing of the files, but
about the same as stripping headers and counting the lines took anyway.
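
As a sketch of the counter-proposal (again in Python, with a hypothetical
helper name body_size), the stored message is left untouched and only the
part after the first blank line is counted. It assumes the usual RFC 822
layout in which a blank line separates headers from body.

```python
def body_size(raw_message: str):
    """Return (line_count, byte_count) for the body only.

    The message itself is not modified; headers are skipped for counting
    purposes but remain in the stored file.
    """
    # Normalize line endings so "\r\n\r\n" and "\n\n" are treated alike.
    normalized = raw_message.replace("\r\n", "\n")
    _headers, separator, body = normalized.partition("\n\n")
    if not separator:
        # No blank line found: assume the message is all headers.
        body = ""
    return len(body.splitlines()), len(body.encode("utf-8"))
```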

For archives, it's easiest to just include the last email message:

I wrote:
> > For archives, I think I'd like to see two variables: archive_keep_headers,
> > and archive_strip_headers.  Both are regexp_arrays, and in no case would
> > From:, Subject:, or Date: be stripped.  The algorithm would be to strip
> > any headers that matched archive_strip_headers unless they also match
> > archive_keep_headers.  So you could have archive_strip_headers = /.*/ and
> > archive_keep_headers = /Message-ID/ for example, to keep just message ids
> > (and from, subject, date).
> > 
> > What do you think?
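
A small Python sketch of that strip/keep algorithm, under some assumptions
not stated above: headers arrive as a list of already-unfolded lines,
patterns are matched case-insensitively against the whole header line, and
the function name filter_headers is hypothetical.

```python
import re

# From:, Subject: and Date: are never stripped, whatever the patterns say.
ALWAYS_KEEP = re.compile(r"^(From|Subject|Date):", re.IGNORECASE)


def filter_headers(headers, strip_patterns, keep_patterns):
    """Drop headers matching a strip pattern unless a keep pattern also matches."""
    strip_res = [re.compile(p, re.IGNORECASE) for p in strip_patterns]
    keep_res = [re.compile(p, re.IGNORECASE) for p in keep_patterns]
    kept = []
    for line in headers:
        if ALWAYS_KEEP.match(line):
            kept.append(line)
            continue
        stripped = any(r.search(line) for r in strip_res)
        rescued = any(r.search(line) for r in keep_res)
        if not stripped or rescued:
            kept.append(line)
    return kept
```

With strip_patterns = [r".*"] and keep_patterns = [r"Message-ID"], this
keeps only Message-ID plus the three protected headers, as in the example
above.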

John wrote:
> Definitely bring this up on majordomo-workers. As a matter of fact,
> you can use this correspondence as the base of your message. I chose
> the dual keep/discard paradigm, because it fulfilled my need. I am not
> sure it is the best way to do it. I can think of one other paradigm
> that would work for all cases, and that would work with a single
> keyword.
> 
> 	archive_headers << EOF
> 		Message-ID			yes
> 		/Received.* cs.umb.edu.*/ 	no
> 		Received 			yes
> 		/[Xx]-.*/ 			yes
> 		/x-.*/ 				yes
> 		/.*/				no
>         EOF
> 
> This will keep the Message ID headers, remove any received headers
> that pertain to systems at cs.umb.edu, keep all other received
> headers, and keep all X-* headers regardless of the case of the X. I
> think this paradigm may work better overall for regexp arrays. Even
> advertise/noadvertise could be shrunk into a:
> 
> 	private_lists = access_list
> 
> 	private_access << EOF
> 		rouilj			no
> 		/.*\.cs\.umb\.edu$/	yes
> 		/.*/ 			no
> 	EOF
> 
> Which would not list the list if the mail was from rouilj, would list
> it if the address was from cs.umb.edu, and would not list it
> otherwise. BTW the simple strings (i.e. not / delimited) would have
> all non-alphanumerics escaped, and would be used as the pattern for a
> search.
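
One way to read John's single-keyword scheme is as an ordered list where
the first matching rule wins. The Python sketch below assumes exactly that,
plus a default of "no" when nothing matches; the function names
compile_rule, parse_rules and decide are hypothetical. Simple strings are
escaped with re.escape and used as a search pattern, as described above.

```python
import re


def compile_rule(pattern: str):
    """Treat /.../ as a regexp; anything else is a literal search string."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.compile(pattern[1:-1])
    return re.compile(re.escape(pattern))


def parse_rules(text: str):
    """Parse 'pattern yes|no' lines into (compiled_regex, bool) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The verdict is the last whitespace-separated field, so patterns
        # may themselves contain spaces.
        pattern, verdict = line.rsplit(None, 1)
        rules.append((compile_rule(pattern), verdict == "yes"))
    return rules


def decide(rules, value: str, default: bool = False) -> bool:
    """Return the verdict of the first rule that matches value."""
    for regex, verdict in rules:
        if regex.search(value):
            return verdict
    return default


PRIVATE_ACCESS = parse_rules(r"""
    rouilj                  no
    /.*\.cs\.umb\.edu$/     yes
    /.*/                    no
""")
```

Here decide(PRIVATE_ACCESS, address) answers "should this list be shown to
this address?"; because order matters, the rouilj rule has to come before
the cs.umb.edu rule to take effect.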

-- 
Paul Close	     pdc@sgi.com	   ...!{ames, decwrl, uunet}!sgi!pdc

			No fate but what we make



