Bug #69217 mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
Submitted: 2015-03-11 03:27 UTC Modified: 2015-03-28 20:51 UTC
From: salsi at icosaedro dot it Assigned:
Status: Open Package: mbstring related
PHP Version: Irrelevant OS: Slackware 14.1, Windows VISTA
Private report: No CVE-ID: None

 [2015-03-11 03:27 UTC] salsi at icosaedro dot it
mb_convert_encoding() does not reject invalid bytes in ill-formed ASCII and ISO-8859-1 input strings. The byte 0x80 appears to be blindly converted to the corresponding 2-byte UTF-8 sequence 0xC2 0x80. By contrast, when converting from UTF-8 to UTF-8, the error is detected.

Although this is not stated anywhere, mb_convert_encoding() should guarantee one of two outcomes: either the output string fully complies with the requested target encoding, or the call fails with an error. Returning unexpected results may have safety and security consequences.
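As long as the function accepts such input, a caller can check the bytes before converting. The following is only a minimal sketch: strict_to_utf8() is a hypothetical helper, not part of mbstring, and the byte ranges it rejects follow this report's reading of ASCII and ISO-8859-1 rather than mbstring's own tables.

```php
<?php
// Hypothetical helper (not an mbstring function): reject bytes that are
// outside the source encoding before calling mb_convert_encoding().
function strict_to_utf8($bytes, $from)
{
    // Byte ranges treated as invalid. These follow the report's reading
    // of each encoding and are an assumption of this sketch.
    $invalid = [
        'ASCII'      => '/[\x80-\xFF]/',
        'ISO-8859-1' => '/[\x80-\x9F]/',
    ];
    if (!isset($invalid[$from])) {
        throw new InvalidArgumentException("unsupported encoding: $from");
    }
    // No /u modifier, so the pattern is matched byte by byte.
    if (preg_match($invalid[$from], $bytes, $m, PREG_OFFSET_CAPTURE) === 1) {
        throw new UnexpectedValueException(sprintf(
            'invalid %s byte 0x%02X at offset %d',
            $from,
            ord($m[0][0]),
            $m[0][1]
        ));
    }
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}

// Valid ISO-8859-1 input is converted as usual.
echo rawurlencode(strict_to_utf8("\xE9", 'ISO-8859-1')), "\n"; // %C3%A9

// The byte from this report is rejected before any conversion happens.
try {
    strict_to_utf8("\x80", 'ASCII');
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n"; // invalid ASCII byte 0x80 at offset 0
}
```

Throwing an exception rather than substituting a character keeps the two outcomes separate: the caller receives either a converted string or an error.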

Tested on:
- PHP 5.7.0-dev Slackware Linux 14.1
- PHP 5.5.16 Windows VISTA

Test script:
ini_set("mbstring.substitute_character", (string) ord("X"));
ini_set("mbstring.strict_detection", "1"); // no effect
// bytes 0x80-0xff are not valid ASCII:
echo 1, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ASCII")) . "\n";
// bytes 0x80-0x9f are not valid ISO-8859-1:
echo 2, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ISO-8859-1")) . "\n";
echo 3, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "UTF-8")) . "\n";

Expected result:

Actual result:


 [2015-03-28 20:51 UTC]
-Summary: b_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code
+Summary: mb_convert_encoding() passes 0x80 as valid ASCII and ISO-8859-1 code