Bug #69217 mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
Submitted: 2015-03-11 03:27 UTC
Modified: 2021-08-23 15:59 UTC
From: salsi at icosaedro dot it
Assigned:
Status: Open
Package: mbstring related
PHP Version: Irrelevant
OS: Slackware 14.1, Windows VISTA
Private report: No
CVE-ID: None

 
 [2015-03-11 03:27 UTC] salsi at icosaedro dot it
Description:
------------
mb_convert_encoding() does not detect invalid bytes in ill-formed ASCII and ISO-8859-1 encoded binary strings. The byte 0x80 appears to be blindly converted to the corresponding two-byte UTF-8 sequence 0xC2 0x80. By contrast, when converting from UTF-8 to UTF-8, the error is detected.

Although this is not stated anywhere, mb_convert_encoding() should guarantee that either the resulting string fully complies with the requested encoding, or the call fails with an error. Returning unexpected results may have safety and security consequences.

Tested on:
- PHP 5.7.0-dev Slackware Linux 14.1
- PHP 5.5.16 Windows VISTA


Test script:
---------------
<?php
error_reporting(PHP_INT_MAX);
ini_set("mbstring.substitute_character", (string) ord("X"));
ini_set("mbstring.strict_detection", "1"); // no effect
// bytes 0x80-0xff are not valid ASCII:
echo 1, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ASCII")) . "\n";
// bytes 0x80-0x9f are not valid ISO-8859-1:
echo 2, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ISO-8859-1")) . "\n";
echo 3, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "UTF-8")) . "\n";


Expected result:
----------------
1X
2X
3X


Actual result:
--------------
1%C2%80
2%C2%80
3X
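
Until the conversion itself rejects such input, a caller can validate the bytes first. The following is only a minimal sketch of that idea: `convert_strict()` is a hypothetical helper, not part of mbstring, and the byte ranges it rejects are the ones named in the comments of the test script above (which is the reporter's reading of the two encodings).

```php
<?php
// Hypothetical helper, not an mbstring function: reject bytes that the
// report treats as invalid in the source encoding, then convert.
function convert_strict(string $bytes, string $to, string $from): string
{
    // Byte-level patterns (no /u flag), taken from the test script comments.
    $invalid = [
        'ASCII'      => '/[\x80-\xFF]/',
        'ISO-8859-1' => '/[\x80-\x9F]/',
    ];
    $key = strtoupper($from);
    if (isset($invalid[$key]) && preg_match($invalid[$key], $bytes) === 1) {
        throw new InvalidArgumentException("Input is not valid $from");
    }
    // Encodings without a pattern above are passed through unchecked.
    return mb_convert_encoding($bytes, $to, $from);
}

try {
    convert_strict("\x80", "UTF-8", "ASCII");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // Input is not valid ASCII
}
echo bin2hex(convert_strict("caf\xE9", "UTF-8", "ISO-8859-1")), "\n"; // 636166c3a9
```

Throwing an exception, rather than substituting a character, is a design choice of this sketch; a caller who prefers the substitute-character behavior from the expected result could return "X" instead.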


History

 [2015-03-28 20:51 UTC] yohgaki@php.net
-Summary: b_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
+Summary: mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
 [2020-11-05 18:16 UTC] denn at vollbio dot de
Similar issue:

mb_convert_encoding("\x81", 'UTF-8', 'cp1252')

returns the bytes 

ef bf be

which is invalid UTF-8: http://www.fileformat.info/info/unicode/char/ffff/index.htm
 [2021-04-14 13:14 UTC] cmb@php.net
> Similar issue:

No, not really.  And I can't reproduce the EFBFBE either.  I'm
getting EFBFBD instead[1], which is the REPLACEMENT CHARACTER, and
that is correct.  Anyhow, if you're still experiencing the
reported behavior, please open a separate ticket.

[1] <https://3v4l.org/XM7Db>
 [2021-08-23 15:59 UTC] cmb@php.net
The ASCII conversion will be fixed in PHP 8.1.0[1]; I'm not sure
about the ISO-8859-1 conversion, because that encoding is often
erroneously used when actually Windows CP 1252 is meant.

[1] <https://3v4l.org/nCgbh/rfc#output>
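
To illustrate why the choice of source encoding matters here, the short sketch below decodes the same byte under both labels. The variable names are only illustrative; the ISO-8859-1 result is the one shown under "Actual result" in the report, and the Windows-1252 result follows from that code page mapping 0x80 to the euro sign.

```php
<?php
// The same byte decoded under the two encodings mentioned above.
$byte = "\x80";

// Windows-1252 maps 0x80 to U+20AC EURO SIGN.
$asCp1252 = mb_convert_encoding($byte, "UTF-8", "Windows-1252");

// ISO-8859-1 input is mapped byte-for-byte to U+0080, a C1 control character.
$asLatin1 = mb_convert_encoding($byte, "UTF-8", "ISO-8859-1");

echo bin2hex($asCp1252), "\n"; // e282ac
echo bin2hex($asLatin1), "\n"; // c280
```

If data labeled ISO-8859-1 is in fact Windows-1252, rejecting 0x80 outright would turn text that currently converts (to the wrong character) into a hard failure, which may be part of the hesitation expressed above.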
 