Bug #69217 mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
Submitted: 2015-03-11 03:27 UTC Modified: 2021-08-23 15:59 UTC
From: salsi at icosaedro dot it Assigned:
Status: Open Package: mbstring related
PHP Version: Irrelevant OS: Slackware 14.1, Windows VISTA
Private report: No CVE-ID: None

 [2015-03-11 03:27 UTC] salsi at icosaedro dot it
Description:
------------
mb_convert_encoding() does not recognize invalid bytes in ill-formed ASCII and ISO-8859-1 input strings. The byte 0x80 appears to be blindly converted to its corresponding two-byte UTF-8 sequence 0xC2 0x80. By contrast, when converting from UTF-8 to UTF-8, the invalid byte is detected and replaced.

Although this is not stated anywhere in the documentation, mb_convert_encoding() should guarantee either that the output string is fully compliant with the requested target encoding, or that the call fails with an error. Returning unexpected results may have safety and security consequences.
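Until the function itself guarantees this, a caller can approximate the guarantee by validating the input before converting it. A minimal sketch, assuming the byte ranges stated in this report (0x80-0xFF invalid in ASCII, 0x80-0x9F invalid in ISO-8859-1); is_valid_single_byte() is a hypothetical helper, not an mbstring function:

```php
<?php
// Hypothetical pre-check (not part of mbstring): returns false if $s contains
// a byte that the given single-byte encoding does not define, using the
// byte ranges stated in this bug report.
function is_valid_single_byte(string $s, string $encoding): bool
{
    if ($encoding === 'ASCII') {
        // ASCII defines only 0x00-0x7F. No /u modifier, so PCRE matches raw bytes.
        return preg_match('/[\x80-\xFF]/', $s) === 0;
    }
    if ($encoding === 'ISO-8859-1') {
        // Per this report, 0x80-0x9F are treated as invalid ISO-8859-1.
        return preg_match('/[\x80-\x9F]/', $s) === 0;
    }
    throw new InvalidArgumentException("unsupported encoding: $encoding");
}
```

A caller could reject the input (or substitute the offending bytes itself) whenever this returns false, instead of relying on mb_convert_encoding() to flag the problem.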

Tested on:
- PHP 5.7.0-dev Slackware Linux 14.1
- PHP 5.5.16 Windows VISTA


Test script:
---------------
<?php
error_reporting(PHP_INT_MAX);
ini_set("mbstring.substitute_character", (string) ord("X"));
ini_set("mbstring.strict_detection", "1"); // no effect
// bytes 0x80-0xff are not valid ASCII:
echo 1, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ASCII")) . "\n";
// bytes 0x80-0x9f are not valid ISO-8859-1:
echo 2, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ISO-8859-1")) . "\n";
echo 3, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "UTF-8")) . "\n";


Expected result:
----------------
1X
2X
3X


Actual result:
--------------
1%C2%80
2%C2%80
3X
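The actual result for cases 1 and 2 is what a plain byte-to-code-point mapping yields when no validity check is applied: byte 0x80 is read as U+0080, and UTF-8 encodes that as the lead byte 0xC2 followed by the continuation byte 0x80. The sketch below reproduces the reported output by hand; latin1_to_utf8() is a hypothetical helper, and this is not a claim about how mbstring is implemented internally:

```php
<?php
// Hypothetical helper: map each byte to code point U+0000..U+00FF and
// UTF-8 encode it, with no validity check (the behavior reported above).
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                  // 1-byte sequence 0xxxxxxx
        } else {
            $out .= chr(0xC0 | ($b >> 6))     // lead byte 110xxxxx
                  . chr(0x80 | ($b & 0x3F));  // continuation byte 10xxxxxx
        }
    }
    return $out;
}

echo rawurlencode(latin1_to_utf8("\x80")), "\n"; // %C2%80
```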


History
 [2015-03-28 20:51 UTC] yohgaki@php.net
-Summary: b_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
+Summary: mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
 [2020-11-05 18:16 UTC] denn at vollbio dot de
Similar issue:

mb_convert_encoding("\x81", 'UTF-8', 'cp1252')

returns the bytes 

ef bf be

which is invalid UTF-8: http://www.fileformat.info/info/unicode/char/ffff/index.htm
 [2021-04-14 13:14 UTC] cmb@php.net
> Similar issue:

No, not really.  And I can't reproduce the EFBFBE either.  I'm
getting EFBFBD instead[1], which is the REPLACEMENT CHARACTER, and
that is correct.  Anyhow, if you're still experiencing the
reported behavior, please open a separate ticket.

[1] <https://3v4l.org/XM7Db>
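The two byte sequences in this exchange can be told apart by decoding them by hand: EF BF BE encodes U+FFFE (a Unicode noncharacter, not the U+FFFF linked in the earlier comment), while EF BF BD encodes U+FFFD REPLACEMENT CHARACTER. A minimal sketch; decode_utf8_3byte() is a hypothetical helper that handles only 3-byte sequences:

```php
<?php
// Hypothetical helper: decode one 3-byte UTF-8 sequence
// (1110xxxx 10xxxxxx 10xxxxxx) into its code point.
function decode_utf8_3byte(string $seq): int
{
    if (strlen($seq) !== 3) {
        throw new InvalidArgumentException('expected exactly 3 bytes');
    }
    [$b0, $b1, $b2] = array_map('ord', str_split($seq));
    // Check the lead-byte and continuation-byte bit patterns.
    if (($b0 & 0xF0) !== 0xE0 || ($b1 & 0xC0) !== 0x80 || ($b2 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('not a 3-byte UTF-8 sequence');
    }
    // 4 payload bits from the lead byte, 6 from each continuation byte.
    return (($b0 & 0x0F) << 12) | (($b1 & 0x3F) << 6) | ($b2 & 0x3F);
}

printf("U+%04X\n", decode_utf8_3byte("\xEF\xBF\xBE")); // U+FFFE
printf("U+%04X\n", decode_utf8_3byte("\xEF\xBF\xBD")); // U+FFFD
```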
 [2021-08-23 15:59 UTC] cmb@php.net
The ASCII conversion will be fixed in PHP 8.1.0[1]; I'm not sure
about the ISO-8859-1 conversion, because that encoding is often
erroneously used when actually Windows CP 1252 is meant.

[1] <https://3v4l.org/nCgbh/rfc#output>
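The ambiguity mentioned for ISO-8859-1 comes down to what byte 0x80 means: in ISO-8859-1 it is the C1 control U+0080, while in Windows-1252 it is the euro sign U+20AC, so the "correct" UTF-8 output differs depending on which encoding the caller really has. A minimal pure-PHP sketch that encodes both readings; codepoint_to_utf8() is a hypothetical helper covering only the Basic Multilingual Plane:

```php
<?php
// Hypothetical helper: UTF-8 encode a code point below U+10000 without mbstring.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    throw new InvalidArgumentException('code points above U+FFFF not handled in this sketch');
}

// Byte 0x80 interpreted under each encoding.
$interpretations = [
    'ISO-8859-1'   => 0x0080, // C1 control
    'Windows-1252' => 0x20AC, // EURO SIGN
];
foreach ($interpretations as $enc => $cp) {
    printf("%s: U+%04X -> %s\n", $enc, $cp, rawurlencode(codepoint_to_utf8($cp)));
}
// ISO-8859-1: U+0080 -> %C2%80
// Windows-1252: U+20AC -> %E2%82%AC
```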
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 01:01:28 2024 UTC