Great Circle Associates List-Managers
(April 2001)
 


Subject: Re: Digest MIME types...
From: Tim Pierce <twp @ rootsweb . com>
Date: Wed, 11 Apr 2001 11:27:06 -0400
To: J C Lawrence <claw @ kanga . nu>
Cc: List-Managers @ GreatCircle . COM
In-reply-to: <4538.987001097@kanga.nu>; from claw@kanga.nu on Wed, Apr 11, 2001 at 07:58:17AM -0700
References: <v0313036db6f51490e9f4@[208.165.39.28]> <20010408112536.O8469@ma-1.rootsweb.com> <v0313030bb6f6388e7bcc@[24.104.7.158]> <5.0.0.25.2.20010410125910.03551bb0@pop3.vo.cnchost.com> <5.0.0.25.2.20010410184754.02cc2b00@pop3.vo.cnchost.com> <inet-list@vo.cnchost.com> <27007.986966766@kanga.nu> <20010411023802.A447@ma-1.rootsweb.com> <twp@rootsweb.com> <4538.987001097@kanga.nu>
User-agent: Mutt/1.2.5i

On Wed, Apr 11, 2001 at 07:58:17AM -0700, J C Lawrence wrote:
> On Wed, 11 Apr 2001 02:38:02 -0400 
> Tim Pierce <twp@rootsweb.com> wrote:
> 
> > On Tue, Apr 10, 2001 at 10:26:06PM -0700, J C Lawrence wrote:
> >> My concern right now is different: Can I reliably strip the HTML
> >> portion from multipart/alternative text/plain text/html messages
> >> and end up with a readable and moderately well formatted
> >> text/plain message?  Currently the answer seems to be, "no".
> 
> > This can easily be done mechanically.  Works quite well.  
> 
> So far I've found that it works, yes, and it even makes a valiant
> effort, however its reformatting of text/plains after removal of the
> text/html (often the margins etc are screwed) has left something to
> be desired.

If you're actually trying to render text/html into text/plain, I
can see how that could be a problem.  But in a multipart/alternative
message, there should already be a text/plain portion prepared
for you.  All you have to do is remove the extraneous
bits.  It's not your fault if the poster's mail program screwed up
the formatting on that one. :-)

> > Look for demime.pl, which has been discussed here a great deal; if
> > it's too resource-intensive for you, drop me a line and I'll point
> > you to some alternatives.
> 
> What else beyond stripmime?

I was thinking of Rachel Blackman's NORM library for Listar.  I
haven't used it personally, but it looks sufficiently general-purpose
that you could wrap a little code around it to jump through whatever
hoops you need to demangle the HTML.
ftp://listar.org/pub/listar/other/norm-0.1.tar.gz

For the specific case of extracting only the text/plain portion
from multipart/alternative, I wrote an `unhtml' program for our
site that works quite well.  It's packaged as a single file of C
source code, but doesn't depend on any exotic foreign libraries
and ought to compile on just about any machine with an ANSI compiler
and library.  I'm lazy and haven't gotten around to putting it up
for FTP, but I can mail it to you if you want to play around with
it.
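
To illustrate the idea only (this is not the `unhtml' source, which
is C and not reproduced here), below is a minimal sketch in Python
using the standard email package.  The function name
plain_text_alternative and the sample message are made up for the
example; it assumes the poster's mailer really did include a
text/plain part, and it simply returns that part without trying to
fix its formatting.

```python
from email import message_from_string
from email.message import Message
from typing import Optional

# Hypothetical sample message, invented for this illustration.
SAMPLE = """\
From: poster@example.com
To: list@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

Hello, list.
--XYZ
Content-Type: text/html; charset="us-ascii"

<html><body><p>Hello, list.</p></body></html>
--XYZ--
"""


def plain_text_alternative(msg: Message) -> Optional[str]:
    """Return the text/plain body of the first multipart/alternative
    container in msg, or None if there is no such part."""
    for part in msg.walk():
        if part.get_content_type() != "multipart/alternative":
            continue
        # The payload of a multipart container is a list of sub-messages.
        for alt in part.get_payload():
            if alt.get_content_type() == "text/plain":
                raw = alt.get_payload(decode=True)  # undo base64 / quoted-printable
                charset = alt.get_content_charset() or "us-ascii"
                return raw.decode(charset, errors="replace")
    return None


if __name__ == "__main__":
    text = plain_text_alternative(message_from_string(SAMPLE))
    print(text if text is not None else "(no text/plain alternative found)")
```

A message with no multipart/alternative container yields None here,
so a list filter built on this could pass such messages through
untouched.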

Since both of these tools are written in C, they ought to place
less load on your machine than a Perl-based tool, if you have big
active lists or an overburdened server.



