Bug #62462 quoted_printable_encode splits line in the middle of UTF8 character
Submitted: 2012-07-02 11:42 UTC Modified: 2013-02-18 00:35 UTC
Votes:5
Avg. Score:4.4 ± 0.8
Reproduced:3 of 5 (60.0%)
Same Version:1 (33.3%)
Same OS:3 (100.0%)
From: c2h5oh at poczta dot fm Assigned:
Status: No Feedback Package: *Mail Related
PHP Version: 5.3.14 OS: Linux
Private report: No CVE-ID: None
 [2012-07-02 11:42 UTC] c2h5oh at poczta dot fm
Description:
------------
quoted_printable_encode adds, among other things, soft line breaks if the line length 
is greater than 76 characters. 
If the 76th character falls in the middle of an encoded UTF-8 character, then 
that character is split across two lines, corrupting the encoded string.



Test script:
---------------
<?php
echo quoted_printable_encode('ąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąąą');

Expected result:
----------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85

Actual result:
--------------
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=
=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=C4=85=
=C4=85=C4=85=C4=85


(compare ends of each line)
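
The difference between the two results can also be checked programmatically. The following is a minimal sketch, not part of the original report: it decodes every encoded line on its own and lists the lines that are not valid UTF-8 by themselves. The function name linesNotValidUtf8Alone is hypothetical, and the validity test relies on the PCRE /u modifier rejecting malformed UTF-8.

```php
<?php
// Hypothetical helper: returns the 1-based numbers of encoded lines that
// do not decode to valid UTF-8 when each line is decoded in isolation.
function linesNotValidUtf8Alone(string $encoded): array
{
    $bad = [];
    $lines = preg_split('/\r\n|\n|\r/', $encoded);
    foreach ($lines as $i => $line) {
        // Drop the trailing "=" of a soft line break so the line decodes alone.
        $payload = (substr($line, -1) === '=') ? substr($line, 0, -1) : $line;
        $decoded = quoted_printable_decode($payload);
        // preg_match() with /u returns false when the subject is not valid UTF-8.
        if (preg_match('//u', $decoded) !== 1) {
            $bad[] = $i + 1;
        }
    }
    return $bad;
}
```

Applied to output shaped like the "Actual result", both the line that ends with a lone =C4 and the following line that starts with =85 are reported; output shaped like the "Expected result" yields an empty list.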

History

 [2012-07-15 02:31 UTC] stas@php.net
-Status: Open +Status: Feedback
 [2012-07-15 02:31 UTC] stas@php.net
Could you explain why soft line breaks are a problem? The software decoding the QP 
string should ignore the linebreaks and reassemble the string in the original 
form. Quoting from https://en.wikipedia.org/wiki/Quoted-printable:

A soft line break consists of an "=" at the end of an encoded line, and does not 
appear as a line break in the decoded text. 

So where does the corruption of the UTF-8 come from?
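
To illustrate the point about decoders reassembling the string, here is a minimal sketch that encodes a string and decodes it again in one piece, so that soft line breaks are removed before the bytes are interpreted. The helper name roundTrips is hypothetical; the input in the usage note assumes the 53-character string from the test script, written as raw bytes so the example does not depend on the source file encoding.

```php
<?php
// Hypothetical helper: true if encoding and then decoding the whole
// quoted-printable text returns the original bytes unchanged.
function roundTrips(string $original): bool
{
    $encoded = quoted_printable_encode($original);
    // Decoding the complete text drops every "=" + line break pair first.
    $decoded = quoted_printable_decode($encoded);
    return $decoded === $original;
}
```

For example, roundTrips(str_repeat("\xC4\x85", 53)) exercises the string from the report. This only covers the case where the soft breaks reach the decoder intact.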
 [2012-07-15 12:02 UTC] c2h5oh at poczta dot fm
-Status: Feedback +Status: Open
 [2012-07-15 12:02 UTC] c2h5oh at poczta dot fm
From RFC 2045:
"Because quoted-printable data is generally assumed to be line-
   oriented, it is to be expected that the representation of the breaks
   between the lines of quoted-printable data may be altered in
   transport, in the same manner that plain text mail has always been
   altered in Internet mail when passing between systems with differing
   newline conventions."

We have no guarantee that it will be possible to merge a split character during 
decoding.

This has caused problems before: https://bugzilla.mozilla.org/show_bug.cgi?id=684508
It's a problem widespread enough that SwiftMailer, for example, doesn't use 
quoted_printable_encode but its own PHP implementation, which is more than an 
order of magnitude slower 
(https://github.com/swiftmailer/swiftmailer/issues/220).

76 characters per line is the upper limit; adding a soft line break earlier 
produces a perfectly valid encoding that doesn't cause such problems.
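
The idea of breaking earlier can be sketched as follows. This is an illustrative, hypothetical helper (qpEncodeUtf8Aware is not a PHP function and is not the patch discussed in this report): it builds the output one whole UTF-8 character at a time and inserts a soft break before any character that would not fit. As a simplification it hex-encodes every byte outside printable ASCII, including spaces and line breaks, so it is more verbose than quoted_printable_encode and does not preserve hard line breaks as literal CRLF.

```php
<?php
// Hypothetical sketch: quoted-printable encoding that only places soft
// line breaks between complete UTF-8 characters.
function qpEncodeUtf8Aware(string $utf8, int $maxLine = 76): string
{
    // Split into whole characters; /u makes PCRE treat the input as UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    $lineLen = 0;
    foreach ($chars as $char) {
        $token = '';
        foreach (str_split($char) as $byte) {
            $ord = ord($byte);
            // Simplification: only printable ASCII other than "=" stays literal.
            $token .= ($ord >= 33 && $ord <= 126 && $byte !== '=')
                ? $byte
                : sprintf('=%02X', $ord);
        }
        // Reserve one column for the "=" that ends a soft-broken line.
        if ($lineLen + strlen($token) > $maxLine - 1) {
            $out .= "=\r\n";
            $lineLen = 0;
        }
        $out .= $token;
        $lineLen += strlen($token);
    }
    return $out;
}
```

With the string from the test script, each character becomes a six-column token, so twelve characters fit on a line before the soft break, which matches the layout shown under "Expected result".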
 [2012-07-16 06:41 UTC] stas@php.net
The quoted bug is a Thunderbird issue that produces a real line break; note the 
absence of = at the end of the line here:


=CE=B1=CE=B1=CE=B1=CE=B1=CE=B1=CE=B1=CE  <------- ERROR

instead of a soft break. Clearly, this is a bug, since the encoder should not 
insert hard line breaks. PHP produces soft breaks, which is in full compliance 
with the standard. 
I'm also not sure how the RFC quote is relevant: if some clients/agents break the 
email text, we can't really be responsible for them, and if they do that, they 
break QP encoding and something else, like base64, should be used. I don't see, 
however, why that makes PHP's way of encoding wrong.

Could you describe a scenario where the way PHP does things leads to something 
breaking that is not due to a bug in some other product? I also read the 
SwiftMailer report and couldn't find a description of any scenario where PHP's 
encoding is wrong.
 [2012-07-16 06:41 UTC] stas@php.net
-Status: Open +Status: Feedback
 [2012-07-16 08:40 UTC] c2h5oh at poczta dot fm
-Status: Feedback +Status: Open
 [2012-07-16 08:40 UTC] c2h5oh at poczta dot fm
The important part is "quoted-printable data is generally assumed to be 
line-oriented". I don't think we can or should assume that nothing happens to 
line breaks in transport.

As for "if some clients/agents break the email text, we can't really be 
responsible for them": the encoding after the patch is still a valid encoding 
and at the same time prevents the email text from "breaking".

I have asked one of the people who made the decision to use SwiftMailer's own 
implementation instead of quoted_printable_encode to update this ticket with 
the reasoning behind it.
 [2012-08-20 21:51 UTC] lstrojny@php.net
-Status: Open +Status: Feedback
 [2012-08-20 21:51 UTC] lstrojny@php.net
Please try using this snapshot:

  http://snaps.php.net/php5.4-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/


 [2013-02-18 00:35 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.
 
Last updated: Sun Sep 15 11:01:27 2024 UTC