Bug #66828 iconv_mime_encode Q-encoding longer than it should be
Submitted: 2014-03-05 19:04 UTC
Modified: 2018-09-22 14:09 UTC
Votes: 1
Avg. Score: 5.0 ± 0.0
Reproduced: 0 of 1 (0.0%)
From: st_9876543210 at yahoo dot de
Assigned: cmb
Status: Closed
Package: ICONV related
PHP Version: Irrelevant
OS:
Private report: No
CVE-ID: None
 [2014-03-05 19:04 UTC] st_9876543210 at yahoo dot de
Description:
------------
Instead of wrapping a whole quoted-printable line in =?UTF-8?Q? and ?=, iconv_mime_encode() sometimes splits a line into two parts, each surrounded by those delimiters. While this seems to still be standard-compliant, it makes the resulting string longer than it should be.
My example would perfectly fit into one line but instead it needs 2 lines.

This also happens when having longer strings (having this issue on every line of the output!) and regardless of what characters are used.

This could be caused by the bug fix for https://bugs.php.net/bug.php?id=48289.

Test script:
---------------
<?php
$preferences = array(
    "input-charset" => "ISO-8859-1",
    "output-charset" => "UTF-8",
    "line-length" => 76,
    "line-break-chars" => "\n",
    "scheme" => "Q"
);
var_dump(iconv_mime_encode("Subject", "Test Test Test Test Test Test Test Test", $preferences));

Expected result:
----------------
string(67) "Subject: =?UTF-8?Q?Test=20Test=20Test=20Test=20Test=20Test=20Test?="

Actual result:
--------------
string(93) "Subject: =?UTF-8?Q?Test=20Test=20Test=20Tes?==?UTF-8?Q?t=20Test?=
 =?UTF-8?Q?=20Test=20Test?="
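
As a cross-check of the lengths above, the following sketch builds the single encoded word by hand. The helper q_encode_single_word() is hypothetical (it is not part of ext/iconv) and assumes that only ASCII letters and digits stay literal while every other byte becomes =XX, which matches the =20 shown above. Note that the expected result contains seven "Test" words (67 bytes), while the script as posted passes eight (74 bytes); either way the header fits on one 76-character line.

```php
<?php
// Hypothetical helper, not part of ext/iconv: Q-encode a string that is
// already in the output charset and wrap it in ONE encoded word.
// Assumption: letters and digits stay literal, every other byte is =XX.
function q_encode_single_word(string $field, string $bytes, string $charset): string
{
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9]/',
        static function (array $m): string {
            return sprintf('=%02X', ord($m[0]));
        },
        $bytes
    );
    return "$field: =?$charset?Q?$encoded?=";
}

$seven = q_encode_single_word('Subject', implode(' ', array_fill(0, 7, 'Test')), 'UTF-8');
$eight = q_encode_single_word('Subject', implode(' ', array_fill(0, 8, 'Test')), 'UTF-8');

var_dump(strlen($seven)); // int(67)
var_dump(strlen($eight)); // int(74)
```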

 [2017-05-23 08:30 UTC] php at pointpro dot nl
I went digging in the source code in ext/iconv/iconv.c.

I agree that it results in much longer strings than necessary, the overhead of the charset marker is high.

The issue is that the number of remaining characters is divided by 3, the maximum number of bytes needed to encode any 8-bit character. If the string to be encoded consists mainly or solely of ASCII characters, this results in an encoded word that takes up only one-third of the available characters.
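
To make the one-third effect concrete, here is a small worked example. The numbers and the helper q_length() are made up for illustration, and the sketch assumes that letters and digits cost one character while every other byte costs three (=XX).

```php
<?php
// Hypothetical helper: length of the Q-encoded form of $bytes, assuming
// ASCII letters and digits cost 1 character and every other byte costs 3.
function q_length(string $bytes): int
{
    $len = 0;
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $len += preg_match('/\A[A-Za-z0-9]\z/', $bytes[$i]) === 1 ? 1 : 3;
    }
    return $len;
}

$remaining  = 60;                         // hypothetical space left on the line
$inputBytes = intdiv($remaining, 3);      // worst-case chunk size: 20 input bytes

$ascii = str_repeat('A', $inputBytes);    // encodes 1:1
$latin = str_repeat("\xE4", $inputBytes); // every byte becomes =E4

var_dump(q_length($ascii)); // int(20): only a third of the space is used
var_dump(q_length($latin)); // int(60): the space is filled exactly
```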

It is probably a bit more complex than that, but it seems to me that the division by 3 is not necessary: you could just use the remaining available characters as is. The loop calculates the actual number of characters required for the encoded word, and if this exceeds the number of remaining characters, that number is reduced to compensate. The downside is that for strings with many non-ASCII characters, it will take several more loop iterations to encode them, so performance decreases slightly.

A better approach would be to consider the input bytes one by one, determine the number of bytes required to encode each, and continue adding characters until the available space is filled.
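
A minimal sketch of such a greedy approach, written in PHP rather than in the C of ext/iconv. Everything here is an assumption made for illustration: the function names q_byte() and pack_q_words() are hypothetical, the input is taken to be UTF-8 already, only letters and digits are emitted literally, and the field name prefix on the first line is ignored. It works on whole characters so that no multi-byte character is torn apart.

```php
<?php
// Hypothetical sketch, not the ext/iconv implementation.
// Q-encode a single byte: letters and digits literal, everything else =XX.
function q_byte(string $byte): string
{
    return preg_match('/\A[A-Za-z0-9]\z/', $byte) === 1
        ? $byte
        : sprintf('=%02X', ord($byte));
}

// Greedily pack whole characters into encoded words of at most $maxWordLen.
function pack_q_words(string $utf8, string $charset = 'UTF-8', int $maxWordLen = 75): array
{
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }

    $overhead = strlen("=?$charset?Q?") + strlen('?=');
    $words = [];
    $current = '';
    foreach ($chars as $char) {
        // Encode every byte of this one character.
        $encodedChar = implode('', array_map('q_byte', str_split($char)));
        // Start a new encoded word when this character no longer fits.
        if ($current !== '' && $overhead + strlen($current) + strlen($encodedChar) > $maxWordLen) {
            $words[] = "=?$charset?Q?$current?=";
            $current = '';
        }
        $current .= $encodedChar;
    }
    if ($current !== '') {
        $words[] = "=?$charset?Q?$current?=";
    }
    return $words;
}

// One encoded word per line, continuation lines start with a space.
echo implode("\r\n ", pack_q_words('Test Test Test Test Test Test Test')), "\n";
```

With this strategy the loop runs once per character instead of re-encoding shrinking chunks, at the cost of doing the charset conversion up front.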
 [2018-08-12 13:17 UTC] cmb@php.net
-Status: Open
+Status: Verified
 [2018-08-12 13:17 UTC] cmb@php.net
> While this seems to be still standard-compliant […]

I'm not sure whether multiple encoded words on the same line
are standard compliant.  RFC 2047 states[1]:

| If it is desirable to encode more text than will fit in an
| 'encoded-word' of 75 characters, multiple 'encoded-word's
| (separated by CRLF SPACE) may be used.

> […] but it seems to me that the division by 3 is not necessary
> […]

Simply removing the `/ 3`[2] causes bug48289.phpt[3] to hang.

> A better approach would be to consider all input bytes
> one-by-one and determine the amount of bytes required to encode
> that, and continue adding more characters until the available
> space is filled.

It seems to me that the most appropriate solution would be to
convert the complete input to the desired output charset, and to
apply the encoding afterwards.

[1] <https://tools.ietf.org/html/rfc2047#section-2>
[2] <https://github.com/php/php-src/blob/php-7.3.0beta1/ext/iconv/iconv.c#L1359>
[3] <https://github.com/php/php-src/blob/php-7.3.0beta1/ext/iconv/tests/bug48289.phpt>
 [2018-08-25 12:29 UTC] cmb@php.net
-Summary: iconv_mime_encode quoted-printable result longer than it should be
+Summary: iconv_mime_encode() fails to Q-encode UTF-8 string
-Assigned To:
+Assigned To: cmb
 [2018-08-25 12:32 UTC] cmb@php.net
-Summary: iconv_mime_encode() fails to Q-encode UTF-8 string
+Summary: iconv_mime_encode quoted-printable result longer than it should be
-Assigned To: cmb
+Assigned To:
 [2018-08-25 12:32 UTC] cmb@php.net
Oops, wrong bug.
 [2018-09-04 13:31 UTC] cmb@php.net
-Summary: iconv_mime_encode quoted-printable result longer than it should be
+Summary: iconv_mime_encode Q-encoding longer than it should be
-Status: Verified
+Status: Analyzed
-Assigned To:
+Assigned To: cmb
 [2018-09-04 13:31 UTC] cmb@php.net
> It seems to me that the most appropriate solution would be to
> convert the complete input to the desired output charset, and to
> apply the encoding afterwards.

No, that can't work, since RFC 2047 specifies[1]:

| Each 'encoded-word' MUST represent an integral number of
| characters.  A multi-octet character may not be split across
| adjacent 'encoded-word's.
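
To illustrate the rule quoted above: cutting already-converted UTF-8 at arbitrary byte offsets can tear a multi-octet character apart, whereas cutting on character boundaries cannot. The sketch below is only a demonstration of that effect; is_valid_utf8() is a hypothetical helper that relies on PCRE's /u mode rejecting malformed input.

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8.
// preg_match() returns false for malformed subjects in /u mode.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$text = "a\xC3\xA4\xC3\xB6"; // "aäö" in UTF-8: 1 + 2 + 2 bytes

// Naive chunking by byte count splits both two-byte characters.
$byteChunks = str_split($text, 2); // ["a\xC3", "\xA4\xC3", "\xB6"]

// Chunking by character keeps every character whole.
$charChunks = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

var_dump(array_map('is_valid_utf8', $byteChunks)); // three times false
var_dump(array_map('is_valid_utf8', $charChunks)); // three times true
```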

> Simply removing the `/ 3`[2] causes bug48289.phpt[3] to hang.

Indeed.  That is because the out_size isn't changed[2] in this
case, since the +1 is not enough to enforce the division by 3 to
be greater than zero; we need to add 2 instead.
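
The arithmetic behind the +1 versus +2 remark can be checked in isolation. The sketch assumes the shrink step has the shape `out_size -= (excess + k) / 3` with C integer division, where excess is how far the encoded chunk overshoots the available space; shrink_step() and its parameter names are made up for illustration and are not the actual C code.

```php
<?php
// Hypothetical model of the shrink step, using integer division as in C.
function shrink_step(int $excess, int $bias): int
{
    return intdiv($excess + $bias, 3);
}

var_dump(shrink_step(1, 1)); // int(0): out_size stays the same, the loop never ends
var_dump(shrink_step(1, 2)); // int(1): out_size shrinks by at least one

// With a bias of 2 the step equals ceil($excess / 3), so it is never zero
// for a positive excess.
for ($excess = 1; $excess <= 10; $excess++) {
    printf("%2d -> %d\n", $excess, shrink_step($excess, 2));
}
```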

[1] <https://tools.ietf.org/html/rfc2047#page-8>
[2] <https://github.com/php/php-src/blob/php-7.3.0beta3/ext/iconv/iconv.c#L1424>
 [2018-09-04 14:18 UTC] cmb@php.net
-Assigned To: cmb
+Assigned To:
 [2018-09-04 14:18 UTC] cmb@php.net
<https://github.com/php/php-src/pull/3492> is supposed to fix this
issue.
 [2018-09-22 14:09 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=9cbe1283f70699b82ca4225705d40fcc73633dfb
Log: Fix #66828: iconv_mime_encode Q-encoding longer than it should be
 [2018-09-22 14:09 UTC] cmb@php.net
-Status: Analyzed
+Status: Closed
 [2018-09-22 14:09 UTC] cmb@php.net
-Assigned To:
+Assigned To: cmb
 