Great Circle Associates Majordomo-Users
(March 2007)
 


Subject: html-stripper-0.1 patch and problems with MIME multipart messages
From: Toomas Aas <toomas . aas @ raad . tartu . ee>
Date: Mon, 12 Mar 2007 20:57:11 +0200
To: majordomo-users @ greatcircle . com
User-agent: Thunderbird 1.5.0.9 (X11/20070304)

Hello!

I seem to have a problem with the html-stripper-0.1 patch and some MIME multipart messages.

Let's say the original message looks something like this:

==========================================================
From: sender <sender @ hot . ee>
Date: Mon, 12 Mar 2007 19:49:30 +0200
To: test-l @ mydomain . com
Subject: =?utf-8?q?p=C3=A4iste=20test?=
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="--------=BoundaryPyDog1173721770.89------"
Message-Id: <20070312174934 . 4E0517B43C @ mh3-4 . hot . ee>


----------=BoundaryPyDog1173721770.89------
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Mis paistab kirja p=C3=A4istes? (katse nr 1)
----------=BoundaryPyDog1173721770.89------
Content-Type: text/html;
    charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Mis paistab kirja p=C3=A4istes? (katse nr 1)
----------=BoundaryPyDog1173721770.89--------

==========================================================

So it's a multipart/alternative message whose text/plain part specifies charset UTF-8.
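To make the structure concrete, here is a minimal sketch using Python's standard `email` package. This is only my illustration: Majordomo and html-stripper are written in Perl and do not use this code. It parses a trimmed copy of the sample message above (only the MIME-relevant headers are kept) and lists the content type, charset and transfer encoding of each leaf part. `RAW` and `describe_parts` are hypothetical names.

```python
from email import message_from_string

# Trimmed copy of the sample message above: only MIME-relevant headers kept.
RAW = """\
Subject: =?utf-8?q?p=C3=A4iste=20test?=
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="--------=BoundaryPyDog1173721770.89------"

----------=BoundaryPyDog1173721770.89------
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Mis paistab kirja p=C3=A4istes? (katse nr 1)
----------=BoundaryPyDog1173721770.89------
Content-Type: text/html;
    charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Mis paistab kirja p=C3=A4istes? (katse nr 1)
----------=BoundaryPyDog1173721770.89--------
"""


def describe_parts(raw):
    """Return (content type, charset, transfer encoding) for every leaf part."""
    msg = message_from_string(raw)
    return [
        (part.get_content_type(),
         part.get_content_charset(),             # lower-cased, e.g. 'utf-8'
         part.get("Content-Transfer-Encoding"))
        for part in msg.walk()
        if not part.is_multipart()               # skip the multipart container
    ]


for row in describe_parts(RAW):
    print(row)
```

Both alternatives declare `charset="UTF-8"` and quoted-printable, so everything needed to build a correct single-part text/plain message is already present in the input.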

After passing through Majordomo with html-stripper enabled and html_policy set to 'strip', the message becomes something like this:

==========================================================
From: <sender @ hot . ee>
Date: Mon, 12 Mar 2007 19:49:30 +0200
To: test-l @ mydomain . com
Subject: =?utf-8?q?p=C3=A4iste=20test?=
MIME-Version: 1.0
Content-Type: text/plain
Message-Id: <20070312175506 . 88428207DB @ mh3-5 . hot . ee>
Sender: owner-test-l @ post . raad . tartu . ee
Precedence: bulk

Mis paistab kirja päistes? (katse nr 1)
==========================================================

As you can see, the Content-Type header has changed from multipart/alternative to text/plain, but the charset parameter has been lost completely (and so has the Content-Transfer-Encoding header). As a result, the message body contains raw 8-bit UTF-8 characters that the MUA does not decode, since without a charset parameter it has to assume US-ASCII.
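For comparison, this is roughly what I would expect the 'strip' policy to do when it collapses a multipart/alternative message: promote the text/plain part and carry its Content-Type (including the charset parameter) and its Content-Transfer-Encoding up to the top-level headers. It is a minimal Python sketch under my own assumptions, not the html-stripper patch's actual Perl code; `flatten_alternative` is a hypothetical name, and `RAW` is a trimmed copy of the sample message above.

```python
from email import message_from_string
from email.message import Message

# Trimmed copy of the sample message above: only MIME-relevant headers kept.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="--------=BoundaryPyDog1173721770.89------"

----------=BoundaryPyDog1173721770.89------
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Mis paistab kirja p=C3=A4istes? (katse nr 1)
----------=BoundaryPyDog1173721770.89------
Content-Type: text/html;
    charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Mis paistab kirja p=C3=A4istes? (katse nr 1)
----------=BoundaryPyDog1173721770.89--------
"""


def flatten_alternative(msg: Message) -> Message:
    """Hypothetical helper: replace a multipart/alternative body with its
    text/plain part, keeping that part's charset and transfer encoding."""
    if msg.get_content_type() != "multipart/alternative":
        return msg
    plain = next((part for part in msg.get_payload()
                  if part.get_content_type() == "text/plain"), None)
    if plain is None:
        return msg  # no plain-text alternative: leave the message unchanged
    for header in ("Content-Type", "Content-Transfer-Encoding"):
        del msg[header]  # removes all occurrences, no error if absent
        if plain[header] is not None:
            msg[header] = plain[header]  # e.g. 'text/plain; charset="UTF-8"'
    # The payload stays quoted-printable, matching the copied CTE header.
    msg.set_payload(plain.get_payload())
    return msg


msg = flatten_alternative(message_from_string(RAW))
charset = msg.get_content_charset()
text = msg.get_payload(decode=True).decode(charset)
print(msg["Content-Type"])
print(text)
```

Keeping the payload encoded avoids re-encoding it. The other option, closer to what html-stripper produced in the example, is to store the decoded body and set `Content-Transfer-Encoding: 8bit`; either way the `charset="UTF-8"` parameter has to survive.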

If I set the list's html_policy to 'pass', the message passes through with all its parts intact and is displayed correctly to users.

I tried to look at the code of the html-stripper-0.1 patch, but I'm not a programmer, so the only thing I got was a headache ;) Maybe someone has a fix?

Thanks in advance,
--
Toomas Aas
