Bug #57906 unfolding the whitespace in headers, still buggy
Submitted: 2007-11-10 10:53 UTC Modified: 2008-01-07 15:24 UTC
From: ca at lapage dot com Assigned: shire (profile)
Status: Closed Package: mailparse (PECL)
PHP Version: 5.2.4 OS: all?
Private report: No CVE-ID: None

 [2007-11-10 10:53 UTC] ca at lapage dot com
Description:
------------
Ver. 2.1.2 changed the whitespace condensing when unfolding multiline headers.  I think it is still wrong.

I am using test messages from the CPAN Perl module MIME-tools:
http://search.cpan.org/dist/MIME-tools/MANIFEST
under "testmsgs/"

These are my comparisons, using
1. mailparse 2.1.1
2. mailparse 2.1.2
3. Mail/mimeDecode 1.5 (also called 1.48), the PHP PEAR package.

Problem A
File: testmsgs/multi-clen.msg
Header "content-type" value parses as:
1. string(44)
 "multipart/mixed; boundary="simple  boundary""
2. string(42)
 "multipart/mixed; boundary="simpleboundary""
3. string(43)
 "multipart/mixed; boundary="simple boundary""
(#1 has 2 spaces between 'simple' and 'boundary'.)
Both #3 and the Perl module decode 3 subparts, but mailparse does not.
The test message has a fold in the boundary, so it has to be parsed exactly.
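To illustrate why the boundary has to match exactly, here is a minimal Python sketch. It is not mailparse code, and the body below is a made-up stand-in for multi-clen.msg, not the real test file. The hypothetical helper `count_parts` counts the delimiter lines it can find for a given boundary string.

```python
def count_parts(body: str, boundary: str) -> int:
    """Count body parts introduced by '--boundary' delimiter lines.

    Hypothetical helper for illustration only; a real MIME parser
    does considerably more than this.
    """
    delimiter = "--" + boundary
    close_delimiter = delimiter + "--"
    parts = 0
    for line in body.split("\r\n"):
        line = line.rstrip(" \t")  # ignore padding after the delimiter
        if line == close_delimiter:
            break
        if line == delimiter:
            parts += 1
    return parts


# Made-up three-part body using the boundary "simple boundary" (one space).
body = "\r\n".join([
    "preamble",
    "--simple boundary",
    "",
    "part one",
    "--simple boundary",
    "",
    "part two",
    "--simple boundary",
    "",
    "part three",
    "--simple boundary--",
    "",
])

# The three header values reported above for Problem A.
for candidate in ("simple  boundary", "simpleboundary", "simple boundary"):
    print(repr(candidate), count_parts(body, candidate))
```

Only the one-space boundary finds any delimiter lines in this sketch (3 parts); the two-space and no-space variants find none.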

Problem B
File: testmsgs/multi-nested.msg
Header "content-type" parses as:
1. string(48)
 "multipart/mixed;      boundary=unique-boundary-1"
2. string(46)
 "multipart/mixed;    boundary=unique-boundary-1"
3. string(43)
 "multipart/mixed; boundary=unique-boundary-1"
(#1 has 6 spaces after the semicolon.  #2 has 4.)
The 3 PHP packages decode the same structure.  Perl is different, but this bug is only about the whitespace.




 [2007-11-10 11:33 UTC] ca at lapage dot com
Another test shows it, using the *mailparse* test message
mailparse-2.1.1/tests/testdata/mime.txt
which is the same as
mailparse-2.1.2/tests/testdata/mime.txt

This is my comparison, using
1. mailparse 2.1.1
2. mailparse 2.1.2
3. Mail/mimeDecode 1.5 (also called 1.48), the PHP PEAR package.


Header "received":
1. string(190)
 "from TITAN (titan.brainnet.i [192.168.2.7]) 	by zaneeb.brainnet.i (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) with ESMTP id g87BfJ721279 	for <wez@thebrainroom.com>; Sat, 7 Sep 2002 12:41:19 +0100"
2. string(186)
 "from TITAN (titan.brainnet.i [192.168.2.7])by zaneeb.brainnet.i (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) with ESMTP id g87BfJ721279for <wez@thebrainroom.com>; Sat, 7 Sep 2002 12:41:19 +0100"
3. string(188)
 "from TITAN (titan.brainnet.i [192.168.2.7]) by zaneeb.brainnet.i (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) with ESMTP id g87BfJ721279 for <wez@thebrainroom.com>; Sat, 7 Sep 2002 12:41:19 +0100"

#1 has a tab in the whitespace before 'by'.
#2 has no whitespace
#3 has one space


Header "content-type":
1. string(70)
 "multipart/mixed; 	boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0""
2. string(68)
 "multipart/mixed;boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0""
3. string(69)
 "multipart/mixed; boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0""

#1 has a tab in the whitespace before 'boundary'.
#2 has no whitespace
#3 has one space
 [2007-11-15 10:42 UTC] ca at lapage dot com
I doubt this is WinXP only, so I am changing the OS field to "all?".
 [2007-11-25 23:31 UTC] shire@php.net
Probably related to a patch I checked in.  I'll take a look at this.
 [2007-11-26 20:44 UTC] shire@php.net
Can you also please send me your simple example script for mailparse?
 [2007-11-30 22:48 UTC] ca at lapage dot com
My test scripts are in my sort-of package:
http://lapage.com/setup/mailpars2em/
in the "tester" subdir of the zip file.  It is not a true package, but does have lots of readme and is easy enough to install.

The point of mailpars2em, the sort-of package, is to emulate mailparse for Cerberus 4 Help Desk on systems without mailparse and where compiling it is not allowed.
The emulation uses the PEAR package Mail/mimeDecode, and is limited to only what is needed for Cerberus 4.

The "tester" script is closely modelled on some Cerberus 4 code, and is not written to be a reduced test case for this bug, or a thorough test of the parser.
 [2007-12-06 14:43 UTC] shire@php.net
I've updated this code, could you please try CVS Head again?
 [2007-12-08 01:48 UTC] ca at lapage dot com
Do you have a win32 php_mailparse.dll?
I don't have a development system.  My Linux server does not allow compiling PHP extensions.  On WinXP I have MS Express C++ 2008, but the instructions on the PECL site for building PHP5 extensions are hopelessly out of date.
 [2007-12-20 13:58 UTC] shire@php.net
I believe you should be able to use one of the following regular builds:

http://pecl4win.php.net/ext.php/php_mailparse.dll
 [2008-01-06 23:03 UTC] ca at lapage dot com
I retested with a 2008-01-05 build of php_mailparse.dll and the problems are fixed.
As I read RFC2822, unfolding should remove the CRLF and nothing else.
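As a rough illustration of that reading (RFC 2822 section 2.2.3 describes unfolding as removing any CRLF that is immediately followed by whitespace), here is a minimal Python sketch. The function name `unfold` is my own and this is not how mailparse is implemented; it only shows the rule applied to the folded headers discussed in this report.

```python
import re

# A CRLF that is immediately followed by a space or tab is a fold.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_value: str) -> str:
    """Remove the CRLF of each fold and keep the whitespace that follows."""
    return _FOLD.sub("", header_value)


# Problem A: fold inside the quoted boundary, one space after the CRLF.
folded_a = 'multipart/mixed; boundary="simple\r\n boundary"'
# Problem B: fold after the semicolon, five spaces after the CRLF.
folded_b = "multipart/mixed;\r\n     boundary=unique-boundary-1"

print(len(unfold(folded_a)), repr(unfold(folded_a)))
print(len(unfold(folded_b)), repr(unfold(folded_b)))
```

Under this rule the Problem A value comes out 43 characters long with one space in the boundary, and the Problem B value keeps all five spaces.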

~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem A: fixed
File: testmsgs/multi-clen.msg (perl CPAN)
The message contains one space in the quoted boundary:

Content-type: multipart/mixed; boundary="simple[CRLF]
[SP]boundary"

mailparse 2.1.1 decodes "simple[SP][SP]boundary"
mailparse 2.1.2 decodes "simpleboundary"
mailparse build decodes "simple[SP]boundary"

This agrees with Mail/mimeDecode.php.  The decoded message now has 3 subparts, which matches the perl MIME-tools reference.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem B: no real problem
File: testmsgs/multi-nested.msg (perl CPAN)
The message contains 5 spaces after the fold:

Content-Type: multipart/mixed;[CRLF]
[SP][SP][SP][SP][SP]boundary=unique-boundary-1

mailparse 2.1.1 decodes 6 [SP]
mailparse 2.1.2 decodes 4 [SP]
mailparse build decodes 5 [SP]
Mail/mimeDecode decodes 1 [SP]

Five spaces may be correct based on RFC2822.  In any case, this does not affect the decoding of the message subparts.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem C: no real problem
File: mime.txt (in the mailparse package)
Several lines are folded with a [TAB], such as:

Content-Type: multipart/mixed;[CRLF]
[TAB]boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0"

mailparse 2.1.1 decodes 1 [TAB]
mailparse 2.1.2 decodes no whitespace
mailparse build decodes 1 [SP]
Mail/mimeDecode decodes 1 [SP]

Version 2.1.2 was wrong, but RFC2822 could be read as supporting what 2.1.1 did, which is keeping the [TAB].

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem D: probably fixed
Take Problem A and change the lines to a quoted boundary with two spaces:

Content-type: multipart/mixed; boundary="simple[CRLF]
[SP][SP]boundary"

mailparse 2.1.1 decodes 2 [SP]
mailparse 2.1.2 decodes 1 [SP]
mailparse build decodes 2 [SP]
Mail/mimeDecode decodes 1 [SP]
perl MIME-tools decodes 2 [SP]

The exact boundary string matters a great deal.  Judging by whether the subparts were decoded in my tests, perl used 2 [SP], which agrees with the latest mailparse build.
The others are probably wrong, including Mail/mimeDecode.php.  RFC2822 says "just remove the CRLF".
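To make the difference concrete, here is a small Python sketch comparing two unfolding strategies on the Problem D header. Both functions are illustrative models with names of my own choosing, not the actual code of mailparse or Mail/mimeDecode: one removes only the CRLF, the other collapses the CRLF and all following whitespace into a single space.

```python
import re


def unfold_remove_crlf(value: str) -> str:
    """Model of 'just remove the CRLF': whitespace after the fold is kept."""
    return re.sub(r"\r\n(?=[ \t])", "", value)


def unfold_collapse(value: str) -> str:
    """Model of a collapsing unfold: CRLF plus whitespace becomes one space."""
    return re.sub(r"\r\n[ \t]+", " ", value)


# Problem D: quoted boundary folded with two spaces after the CRLF.
folded = 'multipart/mixed; boundary="simple\r\n  boundary"'

print(repr(unfold_remove_crlf(folded)))  # boundary keeps 2 spaces
print(repr(unfold_collapse(folded)))     # boundary is left with 1 space
```

The two results differ by one space inside the quoted boundary, which is enough to decide whether the delimiter lines in the body are found.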

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem E: no problem
Take Problem A / Problem C and change the lines to a quoted boundary with one TAB:

Content-type: multipart/mixed; boundary="simple[CRLF]
[TAB]boundary"

This is illegal under RFC2046: boundaries cannot contain tabs.  So this is not an issue.
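For reference, the boundary grammar in RFC 2046 section 5.1.1 allows 1 to 70 characters drawn from digits, letters, the punctuation '()+_,-./:=? and space, and the last character must not be a space. A minimal Python sketch of that check follows; the helper name `is_valid_boundary` is my own and is not part of any library.

```python
import re

# bcharsnospace from RFC 2046; bchars additionally allows a space.
_BCHARS_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY = re.compile(rf"[{_BCHARS_NOSPACE} ]{{0,69}}[{_BCHARS_NOSPACE}]")


def is_valid_boundary(boundary: str) -> bool:
    """Return True if the string fits the RFC 2046 boundary grammar."""
    return _BOUNDARY.fullmatch(boundary) is not None


for candidate in ("simple boundary", "simple  boundary", "simple\tboundary"):
    print(repr(candidate), is_valid_boundary(candidate))
```

In this sketch the one-space and two-space boundaries are accepted, while the boundary containing a tab is rejected.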
 [2008-01-07 15:24 UTC] shire@php.net
Thank you for the detailed verification!
 