Bug #49272 Incorrect encoding in structured header field bodies
Submitted: 2009-08-16 08:36 UTC
Modified: 2014-07-15 11:17 UTC
From: u235e at hotmail dot com
Assigned: yohgaki
Status: Closed
Package: mbstring related
PHP Version: 5.2.10
OS: *
Private report: No
CVE-ID: None
 [2009-08-16 08:36 UTC] u235e at hotmail dot com
Description:
------------
Function: mb_encode_mimeheader

In a structured header field such as From or To, RFC 2047 section 5 allows encoded-words only in place of _words_ within phrases or as ctext within comments; in particular, they must not appear within "quoted strings".
This function does not take that into account. Worse, it can make the field invalid, because it greedily encodes everything starting with the first WSP-delimited token that contains non-ASCII characters.

As it stands, the function is useful only for unstructured header fields, which in practice mostly means the Subject field.
Judging from the manual and its example, I take it that this is not the only intended use.

The example below applies to (comments) as well, not just "quoted strings". Technically, the quotes should probably not be part of the encoded word (they are "lexically invisible"), whereas the parentheses of a comment must stay in place so that the text is still recognized as a comment.

And as a side note: I don't understand why digits are encoded and why spaces are not encoded as underscores in the "Q" scheme. Doesn't that defeat the point of this scheme?
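
To illustrate the side note, here is a minimal sketch of "Q" encoding for a word in a phrase as described in RFC 2047 sections 4.2 and 5 (rule 3): letters, digits and `! * + - /` stay literal, a space becomes an underscore, and every other byte becomes `=XX`. The helper name `q_encode_phrase_word` is hypothetical and not part of mbstring; the sketch also assumes the result fits within the 75-character limit for an encoded-word and does no splitting.

```php
<?php
// Hypothetical helper, not part of mbstring: "Q"-encode raw bytes so the
// result can stand in for a word within a phrase (RFC 2047 section 5, rule 3).
function q_encode_phrase_word(string $bytes, string $charset): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $byte = $bytes[$i];
        if ($byte === ' ') {
            // Section 4.2 (2): a space may be written as an underscore.
            $out .= '_';
        } elseif (preg_match('/^[A-Za-z0-9!*+\/-]$/D', $byte) === 1) {
            // Safe inside a phrase; note that digits are left as they are.
            $out .= $byte;
        } else {
            // Everything else, including quotes and 8-bit bytes, becomes =XX.
            $out .= sprintf('=%02X', ord($byte));
        }
    }
    // Assumption: no splitting into multiple encoded-words is attempted.
    return '=?' . $charset . '?Q?' . $out . '?=';
}

echo q_encode_phrase_word("Der M\xFCller", 'ISO-8859-1'), "\n";
// prints: =?ISO-8859-1?Q?Der_M=FCller?=
```

With this rule set only the `\xFC` byte needs the three-character `=FC` form, which keeps the encoded word close to the readable original.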

Reproduce code:
---------------
<?php
mb_internal_encoding('ISO-8859-1');
$name = "Peter \"Der M\xFCller\""; // German: Peter "Der Müller"
// valid RFC 2822 display-name
$mbox = "peter.mueller";
$doma = "example.com";
$addr = mb_encode_mimeheader($name, "ISO-8859-1", "Q") . " <" . $mbox . "@" . $doma . ">";
echo $addr;
?>

Expected result:
----------------
should be:
Peter =?ISO-8859-1?Q?=22Der=20M=FCller=22?= <peter.mueller@example.com>
or very greedy:
=?ISO-8859-1?Q?Peter=20=22Der=20M=FCller=22?= <peter.mueller@example.com>
or maybe even with the quotes stripped:
Peter =?ISO-8859-1?Q?Der=20M=FCller?= <peter.mueller@example.com>


Actual result:
--------------
Peter "Der =?ISO-8859-1?Q?M=FCller=22?= <peter.mueller@example.com>

This is no longer a valid RFC 2822 name-addr (display-name): the opening quote is left unencoded while the closing quote ends up inside the encoded word!


History

 [2009-08-16 08:45 UTC] u235e at hotmail dot com
Disregard the parts about stripping quotes.
 [2009-08-16 08:53 UTC] u235e at hotmail dot com
Oh, and the display-name in the example is technically not valid because of the non ASCII char of course - but that's the point of this function, right? ;-)
 [2010-02-22 06:39 UTC] moriyoshi@php.net
mb_encode_mimeheader() is meant to encode individual words, or a header that is known to contain only words, not an entire structured header. You need to parse the header manually beforehand.
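
One way to read this advice is sketched below: keep the display name and the addr-spec separate, encode only the display name, and then assemble the name-addr. To stay independent of mb_encode_mimeheader()'s behaviour, the sketch builds a single "B" encoded-word with base64_encode(), which corresponds to the "very greedy" variant from the report. The helper name `build_name_addr` is hypothetical, and the sketch assumes the encoded-word fits within the 75 characters allowed by RFC 2047; it throws instead of splitting.

```php
<?php
// Hypothetical helper: builds an RFC 2822 name-addr from parts that the
// caller has already separated (display name and addr-spec).
function build_name_addr(string $displayName, string $addrSpec, string $charset): string
{
    if (preg_match('/^[\x20-\x7E]*$/D', $displayName) === 1) {
        // Printable ASCII only: emit a quoted-string, escaping \ and ".
        $escaped = str_replace(['\\', '"'], ['\\\\', '\\"'], $displayName);
        $phrase = '"' . $escaped . '"';
    } else {
        // Non-ASCII bytes present: encode the whole display name as one
        // "B" encoded-word, so no raw quote is left outside of it.
        $phrase = '=?' . $charset . '?B?' . base64_encode($displayName) . '?=';
        if (strlen($phrase) > 75) {
            // Assumption: long names are rejected rather than split.
            throw new LengthException('encoded-word exceeds 75 characters');
        }
    }
    return $phrase . ' <' . $addrSpec . '>';
}

$name = "Peter \"Der M\xFCller\"";
$addr = build_name_addr($name, 'peter.mueller@example.com', 'ISO-8859-1');
echo $addr, "\n";
```

Because the quotes end up inside the encoded word, the result no longer contains a dangling `"` as in the actual result of the report.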
 [2014-07-15 11:17 UTC] yohgaki@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: yohgaki
 [2014-07-15 11:17 UTC] yohgaki@php.net
I cannot see the report text, probably due to malformed encoding data. Please file a new bug report if you still have the issue.