Bug #57906 unfolding the whitespace in headers, still buggy
Submitted: 2007-11-10 10:53 UTC Modified: 2008-01-07 15:24 UTC
From: ca at lapage dot com Assigned: shire
Status: Closed Package: mailparse (PECL)
PHP Version: 5.2.4 OS: all?
Private report: No CVE-ID: None
 [2007-11-10 10:53 UTC] ca at lapage dot com
Description:
------------
Ver. 2.1.2 changed the whitespace condensing when unfolding multiline headers.  I think it is still wrong.

I am using test messages from the CPAN Perl module Mime-Tools:
http://search.cpan.org/dist/MIME-tools/MANIFEST
under "testmsgs/"

These are my comparisons, using
1. mailparse 2.1.1
2. mailparse 2.1.2
3. Mail/mimeDecode 1.5 (also called 1.48), the PHP PEAR package.

Problem A
File: testmsgs/multi-clen.msg
Header "content-type" value parses as:
1. string(44)
 "multipart/mixed; boundary="simple  boundary""
2. string(42)
 "multipart/mixed; boundary="simpleboundary""
3. string(43)
 "multipart/mixed; boundary="simple boundary""
(#1 has 2 spaces between 'simple' and 'boundary'.)
Both #3 and the Perl module decode 3 subparts, but mailparse does not.
The test message has a fold in the boundary, so it has to be parsed exactly.
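A minimal Python sketch of why the boundary has to be recovered exactly: it counts body parts by looking for RFC2046 delimiter lines (`--` plus the boundary, optionally followed by trailing whitespace). The helper `count_parts` and the three-part sample body are hypothetical illustrations, not the contents of multi-clen.msg or code from any of the parsers compared above.

```python
def count_parts(body: str, boundary: str) -> int:
    """Count body parts delimited by '--' + boundary lines (hypothetical helper).

    Simplified: a line is a delimiter only if, after stripping trailing
    spaces/tabs (RFC 2046 transport padding), it equals '--' + boundary.
    The close delimiter '--' + boundary + '--' ends the scan.
    """
    delimiter = "--" + boundary
    close = delimiter + "--"
    parts = 0
    for line in body.split("\r\n"):
        line = line.rstrip(" \t")
        if line == close:
            break
        if line == delimiter:
            parts += 1
    return parts


# Hypothetical body with three parts, using the boundary "simple boundary".
body = "\r\n".join([
    "preamble",
    "--simple boundary",
    "",
    "part 1",
    "--simple boundary",
    "",
    "part 2",
    "--simple boundary",
    "",
    "part 3",
    "--simple boundary--",
    "epilogue",
])

# The boundary values produced by 2.1.1, 2.1.2 and Mail/mimeDecode respectively.
for candidate in ("simple  boundary", "simpleboundary", "simple boundary"):
    print(repr(candidate), count_parts(body, candidate))
```

With this sample body only the single-space boundary (the Mail/mimeDecode value) finds the three parts; the 2.1.1 and 2.1.2 values find none.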

Problem B
File: testmsgs/multi-nested.msg
Header "content-type" parses as:
1. string(48)
 "multipart/mixed;      boundary=unique-boundary-1"
2. string(46)
 "multipart/mixed;    boundary=unique-boundary-1"
3. string(43)
 "multipart/mixed; boundary=unique-boundary-1"
(#1 has 6 spaces after the semicolon.  #2 has 4.)
The 3 PHP packages decode the same structure.  Perl is different, but this bug is only about the whitespace.




 [2007-11-10 11:33 UTC] ca at lapage dot com
Another test shows it, using the *mailparse* test message
mailparse-2.1.1/tests/testdata/mime.txt
which is the same as
mailparse-2.1.2/tests/testdata/mime.txt

This is my comparison, using
1. mailparse 2.1.1
2. mailparse 2.1.2
3. Mail/mimeDecode 1.5 (also called 1.48), the PHP PEAR package.


Header "received":
1. string(190)
 "from TITAN (titan.brainnet.i [192.168.2.7]) 	by zaneeb.brainnet.i (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) with ESMTP id g87BfJ721279 	for <wez@thebrainroom.com>; Sat, 7 Sep 2002 12:41:19 +0100"
2. string(186)
 "from TITAN (titan.brainnet.i [192.168.2.7])by zaneeb.brainnet.i (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) with ESMTP id g87BfJ721279for <wez@thebrainroom.com>; Sat, 7 Sep 2002 12:41:19 +0100"
3. string(188)
 "from TITAN (titan.brainnet.i [192.168.2.7]) by zaneeb.brainnet.i (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) with ESMTP id g87BfJ721279 for <wez@thebrainroom.com>; Sat, 7 Sep 2002 12:41:19 +0100"

#1 has a tab in the whitespace before 'by'.
#2 has no whitespace before 'by'.
#3 has one space before 'by'.


Header "content-type":
1. string(70)
 "multipart/mixed; 	boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0""
2. string(68)
 "multipart/mixed;boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0""
3. string(69)
 "multipart/mixed; boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0""

#1 has a tab in the whitespace before 'boundary'.
#2 has no whitespace before 'boundary'.
#3 has one space before 'boundary'.
 [2007-11-15 10:42 UTC] ca at lapage dot com
I doubt this is WinXP-only, so I am changing the OS to "all?".
 [2007-11-25 23:31 UTC] shire@php.net
Probably related to a patch I checked in.  I'll take a look at this.
 [2007-11-26 20:44 UTC] shire@php.net
Can you also please send me your simple example script for mailparse?
 [2007-11-30 22:48 UTC] ca at lapage dot com
My test scripts are in my sort-of package:
http://lapage.com/setup/mailpars2em/
in the "tester" subdirectory of the zip file. It is not a true package, but it does include plenty of readme documentation and is easy enough to install.

The point of mailpars2em, the sort-of package, is to emulate mailparse for Cerberus 4 Help Desk on systems that lack mailparse and where compiling it is not allowed.
The emulation uses the PEAR package Mail/mimeDecode, and is limited to only what is needed for Cerberus 4.

The "tester" script is closely modelled on some Cerberus 4 code, and is not written to be a reduced test case for this bug, or a thorough test of the parser.
 [2007-12-06 14:43 UTC] shire@php.net
I've updated this code; could you please try CVS HEAD again?
 [2007-12-08 01:48 UTC] ca at lapage dot com
Do you have a win32 php_mailparse.dll?
I don't have a development system.  My Linux server does not allow compiling PHP extensions.  On WinXP I have MS Express C++ 2008, but the instructions on the PECL site for building PHP5 extensions are hopelessly out of date.
 [2007-12-20 13:58 UTC] shire@php.net
I believe you should be able to use the regular build from:

http://pecl4win.php.net/ext.php/php_mailparse.dll
 [2008-01-06 23:03 UTC] ca at lapage dot com
I retested with a 2008-01-05 build of php_mailparse.dll and the problems are fixed.
As I read RFC2822, unfolding should remove the CRLF and nothing else.
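The strict reading of RFC2822 (section 2.2.3: unfolding removes any CRLF that is immediately followed by WSP) can be sketched in Python with one regular expression. The function name `unfold` is hypothetical; the sample values are the folded headers from Problems A to D below, written with real CR, LF, SP and TAB characters and with the space after the header colon already stripped.

```python
import re

# RFC 2822, section 2.2.3: remove any CRLF that is immediately followed by
# WSP (space or tab). The lookahead keeps the WSP itself in the result.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_value: str) -> str:
    """Strictly unfold a raw header value: drop folding CRLFs, keep all WSP."""
    return _FOLD.sub("", header_value)


# Folded header values from Problems A to D.
problem_a = 'multipart/mixed; boundary="simple\r\n boundary"'
problem_b = "multipart/mixed;\r\n" + " " * 5 + "boundary=unique-boundary-1"
problem_c = 'multipart/mixed;\r\n\tboundary="----=_NextPart_000_0007_01C2566B.DA7C64F0"'
problem_d = 'multipart/mixed; boundary="simple\r\n  boundary"'

for value in (problem_a, problem_b, problem_c, problem_d):
    print(repr(unfold(value)))
```

Under this reading Problem A yields one [SP], Problem B keeps all 5 [SP], Problem D keeps 2 [SP], and Problem C keeps the [TAB] (the 2.1.1 result), whereas the fixed build and Mail/mimeDecode turn that [TAB] into one [SP].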

~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem A: fixed
File: testmsgs/multi-clen.msg (perl CPAN)
The message contains one space in the quoted boundary:

Content-type: multipart/mixed; boundary="simple[CRLF]
[SP]boundary"

mailparse 2.1.1 decodes "simple[SP][SP]boundary"
mailparse 2.1.2 decodes "simpleboundary"
mailparse build decodes "simple[SP]boundary"

This agrees with Mail/mimeDecode.php. The decoded message now has 3 subparts, which matches the Perl MIME-tools reference.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem B: no real problem
File: testmsgs/multi-nested.msg (perl CPAN)
The message contains 5 spaces after the fold:

Content-Type: multipart/mixed;[CRLF]
[SP][SP][SP][SP][SP]boundary=unique-boundary-1

mailparse 2.1.1 decodes 6 [SP]
mailparse 2.1.2 decodes 4 [SP]
mailparse build decodes 5 [SP]
Mail/mimeDecode decodes 1 [SP]

Five spaces may well be correct under RFC2822. In any case, this does not affect the decoding of the message subparts.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem C: no real problem
File: mime.txt (in the mailparse package)
Several lines are folded with a [TAB], such as:

Content-Type: multipart/mixed;[CRLF]
[TAB]boundary="----=_NextPart_000_0007_01C2566B.DA7C64F0"

mailparse 2.1.1 decodes 1 [TAB]
mailparse 2.1.2 decodes no whitespace
mailparse build decodes 1 [SP]
Mail/mimeDecode decodes 1 [SP]

Version 2.1.2 was wrong, but RFC2822 could support 2.1.1, since removing only the CRLF leaves the [TAB] in place.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem D: probably fixed
Take Problem A and change the lines to a quoted boundary with two spaces:

Content-type: multipart/mixed; boundary="simple[CRLF]
[SP][SP]boundary"

mailparse 2.1.1 decodes 2 [SP]
mailparse 2.1.2 decodes 1 [SP]
mailparse build decodes 2 [SP]
Mail/mimeDecode decodes 1 [SP]
perl MIME-tools decodes 2 [SP]

The exact boundary string is very important. Based on whether the subparts were decoded in my tests, I found that Perl uses 2 [SP], which agrees with the latest mailparse build.
The others, including Mail/mimeDecode.php, are probably wrong: RFC2822 says to remove just the CRLF.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem E: no problem
Take Problem A / Problem C and change the lines to a quoted boundary with one TAB:

Content-type: multipart/mixed; boundary="simple[CRLF]
[TAB]boundary"

This is illegal under RFC2046: boundaries cannot contain tabs, so this is not an issue.
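A Python sketch of the RFC2046 (section 5.1.1) boundary grammar makes the Problem E conclusion concrete: a space is allowed anywhere except at the end, a tab is never allowed, and the boundary is 1 to 70 characters long. The function name `is_valid_boundary` is hypothetical and not part of mailparse or Mail/mimeDecode.

```python
import string

# RFC 2046, section 5.1.1:
#   boundary      := 0*69<bchars> bcharsnospace
#   bchars        := bcharsnospace / " "
#   bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
#                    / "," / "-" / "." / "/" / ":" / "=" / "?"
BCHARS_NOSPACE = frozenset(string.ascii_letters + string.digits + "'()+_,-./:=?")


def is_valid_boundary(boundary: str) -> bool:
    """Return True if `boundary` matches the RFC 2046 boundary grammar."""
    if not 1 <= len(boundary) <= 70:
        return False
    # The last character must not be a space.
    if boundary[-1] not in BCHARS_NOSPACE:
        return False
    # Every earlier character is a bcharsnospace or a plain space (never a tab).
    return all(c in BCHARS_NOSPACE or c == " " for c in boundary[:-1])


print(is_valid_boundary("simple  boundary"))   # Problem D: two spaces
print(is_valid_boundary("simple\tboundary"))   # Problem E: a tab
```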
 [2008-01-07 15:24 UTC] shire@php.net
Thank you for the detailed verification!
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 12:01:31 2024 UTC