Bug #69217 mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
Submitted: 2015-03-11 03:27 UTC Modified: 2021-08-23 15:59 UTC
From: salsi at icosaedro dot it Assigned:
Status: Open Package: mbstring related
PHP Version: Irrelevant OS: Slackware 14.1, Windows VISTA
Private report: No CVE-ID: None
 [2015-03-11 03:27 UTC] salsi at icosaedro dot it
Description:
------------
mb_convert_encoding() does not detect invalid bytes in ill-formed ASCII and ISO-8859-1 input strings. The byte 0x80 appears to be blindly converted to the corresponding two-byte UTF-8 sequence 0xC2 0x80. By contrast, when converting from UTF-8 to UTF-8, the error is detected.

Although this is not stated anywhere, mb_convert_encoding() should guarantee that the output string is either fully compliant with the requested target encoding, or that the call fails with an error. Returning unexpected results may have safety and security consequences.

Tested on:
- PHP 5.7.0-dev Slackware Linux 14.1
- PHP 5.5.16 Windows VISTA


Test script:
---------------
<?php
error_reporting(PHP_INT_MAX);
ini_set("mbstring.substitute_character", (string) ord("X"));
ini_set("mbstring.strict_detection", "1"); // no effect
// bytes 0x80-0xff are not valid ASCII:
echo 1, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ASCII")) . "\n";
// bytes 0x80-0x9f are not valid ISO-8859-1:
echo 2, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ISO-8859-1")) . "\n";
echo 3, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "UTF-8")) . "\n";


Expected result:
----------------
1X
2X
3X


Actual result:
--------------
1%C2%80
2%C2%80
3X
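
Until the conversion itself rejects such input, one possible workaround is to validate the bytes before converting. The following is only a minimal sketch: `ascii_to_utf8_strict()` is a hypothetical helper, not part of mbstring, and it assumes that throwing an exception is an acceptable way to signal invalid input.

```php
<?php
// Hypothetical helper (not an mbstring function): accepts only bytes 0x00-0x7F.
function ascii_to_utf8_strict($bytes)
{
    // Without the /u modifier, PCRE matches byte by byte, so this finds
    // the first byte outside the 7-bit ASCII range.
    if (preg_match('/[^\x00-\x7F]/', $bytes, $m, PREG_OFFSET_CAPTURE)) {
        throw new InvalidArgumentException(
            sprintf('Invalid ASCII byte 0x%02X at offset %d', ord($m[0][0]), $m[0][1])
        );
    }
    // Every valid ASCII string is already valid UTF-8, so no conversion is needed.
    return $bytes;
}

try {
    echo ascii_to_utf8_strict("abc\x80"), "\n";
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // prints "Invalid ASCII byte 0x80 at offset 3"
}
```

Checking the raw bytes directly avoids relying on the very conversion whose behavior is in question.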


 [2015-03-28 20:51 UTC] yohgaki@php.net
-Summary: b_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
+Summary: mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
 [2020-11-05 18:16 UTC] denn at vollbio dot de
Similar issue:

mb_convert_encoding("\x81", 'UTF-8', 'cp1252')

returns the bytes 

ef bf be

which is invalid UTF-8: http://www.fileformat.info/info/unicode/char/ffff/index.htm
 [2021-04-14 13:14 UTC] cmb@php.net
> Similar issue:

No, not really.  And I can't reproduce the EFBFBE either.  I'm
getting EFBFBD instead[1], which is the REPLACEMENT CHARACTER, and
that is correct.  Anyhow, if you're still experiencing the
reported behavior, please open a separate ticket.

[1] <https://3v4l.org/XM7Db>
 [2021-08-23 15:59 UTC] cmb@php.net
The ASCII conversion will be fixed in PHP 8.1.0[1]; I'm not sure
about the ISO-8859-1 conversion, because that encoding is often
erroneously used when actually Windows CP 1252 is meant.

[1] <https://3v4l.org/nCgbh/rfc#output>
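
The ISO-8859-1 versus Windows CP 1252 ambiguity can be made visible at the byte level. The sketch below is a heuristic only, and both function names are hypothetical: it assumes that any byte in the range 0x80-0x9F in data labelled ISO-8859-1 indicates the data was really produced as Windows-1252, where most of that range holds printable characters (0x80, for example, is the euro sign).

```php
<?php
// Hypothetical helper: true if the string contains any byte in 0x80-0x9F,
// the range that differs between ISO-8859-1 and Windows-1252.
function has_c1_range_bytes($bytes)
{
    // No /u modifier: the pattern is matched against raw bytes.
    return preg_match('/[\x80-\x9F]/', $bytes) === 1;
}

// Hypothetical helper: guesses which encoding name to use as the source
// encoding for data that claims to be ISO-8859-1.
function guess_latin_encoding($bytes)
{
    return has_c1_range_bytes($bytes) ? 'Windows-1252' : 'ISO-8859-1';
}

echo guess_latin_encoding("caf\xE9"), "\n";  // prints "ISO-8859-1"
echo guess_latin_encoding("\x80 5,00"), "\n"; // prints "Windows-1252"
```

The guessed name could then be used as the source encoding for mb_convert_encoding(). Note that Windows-1252 itself leaves a few bytes in this range undefined (0x81 among them), so a guess of 'Windows-1252' does not mean the input is valid.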
 
Last updated: Wed Dec 11 23:01:27 2024 UTC