Doc Bug #23192 mb_encode_mimeheader mangles long lines?
Submitted: 2003-04-13 21:59 UTC Modified: 2004-03-15 15:28 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:2 of 2 (100.0%)
Same Version:2 (100.0%)
Same OS:2 (100.0%)
From: jc at mega-bucks dot co dot jp Assigned: moriyoshi (profile)
Status: Closed Package: Documentation problem
PHP Version: 4.3.2RC1 OS: Red Hat Linux 8.0
Private report: No CVE-ID: None
 [2003-04-13 21:59 UTC] jc at mega-bucks dot co dot jp
I am using mb_encode_mimeheader() and it seems that the function does not work properly on long subject lines ...

$headers  = "MIME-Version: 1.0\n" ;
$headers .= "From: $from <$from_email>\r\n";
$headers .= "Reply-To: $from <$from_email>\r\n";
$headers .= "Content-Type: text/plain;charset=ISO-2022-JP\n";
$body     = mb_convert_encoding($body, "ISO-2022-JP","AUTO");
 
$subject  = "&#12304;tokyo-avland&#12305;&#20844;&#30340;&#35388;&#26126;&#26360;&#12398;&#36865;&#20184;&#26399;&#38480;&#12398;&#12362;&#30693;&#12425;&#12379;";

/*

 If I cut the subject to be 

  $subject  = "&#12304;tokyo-avland&#12305;&#20844;&#30340;&#35388;&#26126;&#26360;&#12398;&#36865;&#20184;";
  
  There is no problem. Problems occur if I put characters
  after this ... This doesn't work either:

  $subject  = "&#12304;tokyo-avland&#12305;&#20844;&#30340;&#35388;&#26126;&#26360;&#12398;&#36865;&#20184;";
  $subject .= "&#26399;&#38480;&#12398;&#12362;&#30693;&#12425;&#12379;";


*/

mb_language("ja");
$subject = mb_convert_encoding($subject, "ISO-2022-JP","AUTO");
$subject = mb_encode_mimeheader($subject);

mail($to, $subject, $body, $headers, $sendmail_params);


The mail I receive has the following subject line:

Subject: =?ISO-2022-JP?B?GyRCIVobKEJ0b2t5by1hdmxhbmQbJEIhWzh4RSo+WkxAPXEkTkF3SVU0?=
 =?ISO-2022-JP?B?fDhCJE4kKkNOJGkkOxsoQg==?=

 [2003-04-13 22:09 UTC] jc at mega-bucks dot co dot jp
Update: The subject is properly encoded *if* I truncate it to be:

$subject = "&#12304;tokyo-avland&#12305;&#20844;&#30340;&#35388;&#26126;&#26360;&#12398;&#36865;";

This gives the following subject header:

Subject: =?ISO-2022-JP?B?GyRCIVobKEJ0b2t5by1hdmxhbmQbJEIhWzh4RSo+WkxAPXEkTkF3GyhC?=
 [2003-04-16 15:50 UTC] moriyoshi@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Although mb_send_mail() seems to resolve your problem, the behaviour is quite by design.

mb_encode_mimeheader() is implemented in accordance with section 2 ("Syntax of encoded-words") of RFC 2047 (http://www.ietf.org/rfc/rfc2047.txt).

---------------------------------------------------------
   An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters.  If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.

   While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more 'encoded-word's is limited to 76 characters.
--------------------------------------------------------- EOQ
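
The two limits quoted above can be checked mechanically. The following is a minimal sketch, not the mbstring implementation: checkFoldedHeader() is a hypothetical helper that splits an already folded header into lines and reports any encoded-word longer than 75 characters and any line containing encoded-words that is longer than 76. The sample header is the one received in the original report, including the "Subject: " field name that mail() prepends.

```php
<?php
// Sketch only: checkFoldedHeader() is a hypothetical helper, not part of PHP or mbstring.
// It applies the two RFC 2047 section 2 limits to a header that is already folded.
function checkFoldedHeader(string $header): array
{
    $problems = [];
    foreach (preg_split('/\r?\n/', $header) as $i => $line) {
        // Find every encoded-word (=?charset?encoding?text?=) on this line.
        preg_match_all('/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/', $line, $matches);
        if ($matches[0] === []) {
            continue; // the 76-character limit applies only to lines with encoded-words
        }
        if (strlen($line) > 76) {
            $problems[] = sprintf('line %d is %d characters long (limit 76)', $i + 1, strlen($line));
        }
        foreach ($matches[0] as $word) {
            if (strlen($word) > 75) {
                $problems[] = sprintf('encoded-word on line %d is %d characters long (limit 75)', $i + 1, strlen($word));
            }
        }
    }
    return $problems;
}

// Header as received in the original report.
$header = 'Subject: =?ISO-2022-JP?B?GyRCIVobKEJ0b2t5by1hdmxhbmQbJEIhWzh4RSo+WkxAPXEkTkF3SVU0?=' . "\r\n"
        . ' =?ISO-2022-JP?B?fDhCJE4kKkNOJGkkOxsoQg==?=';

foreach (checkFoldedHeader($header) as $problem) {
    echo $problem, "\n";
}
```

For the reported header, both encoded-words are within the 75-character limit (the first is 74 characters long), but the first line is 83 characters long once "Subject: " is counted.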


 [2003-04-16 19:51 UTC] jc at mega-bucks dot co dot jp
>Please double-check the documentation

Ok, 

#1 I can't find anywhere in the docs that say that mb_encode_mimeheader() follows RFC2047.

#2 Just because it follows RFC2047 doesn't say anything about how mb_encode_mimeheader() treats lines that are too long. Nowhere in the documentation does it say that mb_encode_mimeheader() truncates lines ...

#3 Since the documentation doesn't say anything, I had assumed that mb_encode_mimeheader() would correctly convert strings that are too long into "CRLF space" separated encoded-words. Is this right? If so, it should be added to the documentation.

#4 It seems that mb_encode_mimeheader() *did* correctly detect that my header line was too long and did split the header properly. *However*, the subject was garbled when my mail reader put it back together. I am assuming this is because the string is double-byte and mb_encode_mimeheader() cut the string *inside* a double-byte character. If this is the case, then I would consider *that* a bug.


I'm not conversant in double-byte characters ... Did mb_encode_mimeheader() incorrectly cut the string inside a double-byte character?
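
One way to check this against the header from the original report is to decode each encoded-word on its own and follow the ISO-2022-JP escape sequences. The sketch below uses only base64_decode() from core PHP; decodeBWord() and iso2022jpState() are hypothetical helpers written for illustration, and the state tracker only knows the four common escape sequences (ESC $ B, ESC $ @, ESC ( B, ESC ( J). RFC 2047 section 5 requires each encoded-word to represent an integral number of characters, so a word that ends in double-byte mode with half a character pending does not meet that requirement.

```php
<?php
// Sketch only: both functions are hypothetical helpers, not part of mbstring.

// Return the raw bytes carried by one B-encoded (base64) encoded-word.
function decodeBWord(string $word): string
{
    if (!preg_match('/^=\?[^?]+\?[Bb]\?([^?]*)\?=$/', $word, $m)) {
        throw new InvalidArgumentException('not a B-encoded word: ' . $word);
    }
    $raw = base64_decode($m[1], true);
    if ($raw === false) {
        throw new InvalidArgumentException('invalid base64 payload');
    }
    return $raw;
}

// Follow the ISO-2022-JP escape sequences and report the state at the end:
// 'mode' is 'ascii' or 'jis'; 'pending' is 1 if a double-byte character is half finished.
function iso2022jpState(string $bytes): array
{
    $mode = 'ascii';
    $pending = 0;
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        if ($bytes[$i] === "\x1b" && $i + 2 < $len) {
            $seq = substr($bytes, $i + 1, 2);
            if ($seq === '$B' || $seq === '$@') {   // switch to JIS X 0208 (two bytes per character)
                $mode = 'jis';
                $pending = 0;
                $i += 2;
                continue;
            }
            if ($seq === '(B' || $seq === '(J') {   // switch back to ASCII / JIS X 0201 Roman
                $mode = 'ascii';
                $pending = 0;
                $i += 2;
                continue;
            }
        }
        if ($mode === 'jis') {
            $pending = 1 - $pending; // toggle between first and second byte of a character
        }
    }
    return ['mode' => $mode, 'pending' => $pending];
}

// The two encoded-words from the garbled header, and the one from the truncated subject.
$first     = '=?ISO-2022-JP?B?GyRCIVobKEJ0b2t5by1hdmxhbmQbJEIhWzh4RSo+WkxAPXEkTkF3SVU0?=';
$second    = '=?ISO-2022-JP?B?fDhCJE4kKkNOJGkkOxsoQg==?=';
$truncated = '=?ISO-2022-JP?B?GyRCIVobKEJ0b2t5by1hdmxhbmQbJEIhWzh4RSo+WkxAPXEkTkF3GyhC?=';

foreach (['first' => $first, 'second' => $second, 'truncated' => $truncated] as $name => $word) {
    $state = iso2022jpState(decodeBWord($word));
    printf("%s: mode=%s pending=%d\n", $name, $state['mode'], $state['pending']);
}
```

Decoded this way, the first encoded-word ends in double-byte mode with one byte (0x34) pending and no ESC ( B, and the second begins with the byte 0x7C without an ESC $ B of its own. Those two bytes together are the JIS code for &#26399;, so the split does fall inside a double-byte character. The encoded-word from the truncated subject ends cleanly in ASCII mode.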
 [2003-04-16 20:14 UTC] jc at mega-bucks dot co dot jp
Oh, and you said:

>Although mb_send_mail() seems to resolve your problem

How does mb_send_mail() seem to solve my problem? I cannot find anything in the mb_send_mail() documentation that says it will act differently from mb_encode_mimeheader(). But if it does solve the problem, I will certainly use it :)
 [2003-04-16 20:36 UTC] moriyoshi@php.net
Hmm, yes, you are right; this is definitely a problem caused by the incomplete and misleading documentation.

Besides, I forgot to add the correct diagnosis of this problem.

mb_encode_mimeheader() can only deal with the most basic elements of strings. That is, it treats a character that is represented by a numeric HTML entity as a sequence of separate characters.

Therefore, if line folding happens in the middle of an entity made up of several characters, it inserts CRLF and a space there and ends up breaking the structure of the HTML entity.

To avoid this, you have to do the following conversion on the string before passing it to mb_encode_mimeheader():

$subject = mb_convert_encoding($subject, "ISO-2022-JP", "HTML-ENTITIES");
var_dump($subject);
var_dump(mb_encode_mimeheader($subject, "ISO-2022-JP", "B"));

 [2003-04-16 20:45 UTC] jc at mega-bucks dot co dot jp
Thanks. Should I also submit a documentation bug then concerning mb_encode_mimeheader()?

I'll try out your solution. But what is "HTML-ENTITIES"? I cannot find any mention of it in the mbstring documentation.
 [2003-04-16 21:11 UTC] moriyoshi@php.net
Yep, you are always welcome, but I'd rather see mb reports such as this one on the php-i18n list first than in the bug database.


 [2003-04-16 21:21 UTC] jc at mega-bucks dot co dot jp
I want to make PHP better and of course make the work of the people in charge of checking bug reports easier so I have to ask:

Why doesn't this belong in a bug report?

From what I have understood from your comments, there is no bug in mb_encode_mimeheader() *but* there is a problem with the documentation. So there is a bug, not with the function but with the documentation no?

For me a bug is when a function does not behave as documented  or gives a "broken" result when given valid input.

In this case I gave valid input to the function (the RFC says what to do when the input string is longer than 76 chars) and the function gave back a broken header ...

At the very least there is a bug in the docs, because they do not state that mb_encode_mimeheader() is not guaranteed to work with double-byte strings that are too long.

I'll happily report these to the i18n list instead. I just thought this was the better place, since the function *is* broken and either it should be fixed or the documentation should be fixed :)

Oh, and what is "HTML-ENTITIES"?
 [2003-04-17 08:14 UTC] moriyoshi@php.net
???! This is still a valid bug report, and I've marked it as assigned to indicate to other volunteers that I'll fix the part of the documentation regarding this issue.

As for HTML-ENTITIES, I admit the docs are insufficient, so I registered a new entry for it. See bug #23250.

 [2003-04-17 20:10 UTC] jc at mega-bucks dot co dot jp
Ok. Thanks for all the hard work. I'll keep an eye out for when bug #23250 is closed so that I can understand what HTML-ENTITIES does.

In the meantime I'll try out your solution and see if it works.
 [2003-07-29 19:57 UTC] gullevek at gullevek dot org
I have the same line break problem, but my source string is not HTML-encoded; it is already plain ISO-2022-JP, so I can't use $string=mb_encode_mimeheader(mb_convert_encoding($string, "ISO-2022-JP", "HTML-ENTITIES"),$encoding);
The fact is, mb_encode_mimeheader() breaks the line within a multibyte string, even though it is an mb_ function.
I have read through the whole thread here and I think this function is buggy and has to be fixed ASAP.
At the moment you have to work around it by inserting a single-byte whitespace AFTER the 18th multibyte or 36th single-byte character (using mb_strimwidth(), for example).
 [2003-07-30 00:48 UTC] moriyoshi@php.net
gullevek at gullevek dot org:

If the string is really encoded in ISO-2022-JP already and you are trying to transform it into the MIME-encoded form, the code should look like the following:

mb_internal_encoding("ISO-2022-JP");
$string=mb_encode_mimeheader($string, "ISO-2022-JP");

Unlike other functions, mb_encode_mimeheader() doesn't take a parameter that specifies the encoding of the source string. If you need to specify it explicitly, call mb_internal_encoding() before calling mb_encode_mimeheader().


 [2004-03-15 15:28 UTC] moriyoshi@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.


 [2013-01-14 14:12 UTC] t dot glaser at tarent dot de
This is actually a user error:

$subject = "&#12304;tokyo-avland&#12305;&#20844;&#30340;&#35388;&#26126;&#26360;&#12398;&#36865;&#20184;&#26399;&#38480;&#12398;&#12362;&#30693;&#12425;&#12379;";
$subject = mb_encode_mimeheader($subject);

Correctly, it must be:

$subject = mb_encode_mimeheader("Subject: " . $subject);

I was puzzling over that for a long time, too, until I got it: how else is
mb_encode_mimeheader() supposed to know where to break?
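
The arithmetic behind this can be sketched as follows. base64Budget() is a hypothetical helper, not an mbstring function; it assumes B encoding and the 75 and 76 character limits from RFC 2047 quoted earlier in this thread, and it estimates how many base64 characters fit into the first encoded-word depending on what precedes it on the line.

```php
<?php
// Sketch only: base64Budget() is a hypothetical helper, not part of mbstring.
// It estimates how many base64 characters fit in the first encoded-word of a header line.
function base64Budget(string $prefix, string $charset, int $lineLimit = 76, int $wordLimit = 75): int
{
    // Characters taken by the delimiters around the payload: =?charset?B? ... ?=
    $overhead = strlen('=?' . $charset . '?B?' . '?=');
    // The word may not exceed its own limit or what is left of the line after the prefix.
    $room = min($wordLimit, $lineLimit - strlen($prefix)) - $overhead;
    // Base64 output comes in groups of 4 characters (3 bytes each).
    return max(0, intdiv($room, 4) * 4);
}

echo base64Budget('', 'ISO-2022-JP'), "\n";          // nothing before the encoded-word
echo base64Budget('Subject: ', 'ISO-2022-JP'), "\n"; // field name counted
```

With nothing counted before the encoded-word the budget is 56 base64 characters (42 bytes), which matches the length of the first encoded-word in the original report. With "Subject: " counted it drops to 48 characters (36 bytes).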
 [2020-02-07 06:12 UTC] phpdocbot@php.net
Automatic comment on behalf of moriyoshi
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=5382c8f0653c8268c588c11a9bf3347d8e39abd9
Log: - Fix bug #23192
 
Last updated: Sun Nov 24 16:01:31 2024 UTC