[2015-03-11 03:27 UTC] salsi at icosaedro dot it
Description:
------------
mb_convert_encoding() does not recognize invalid bytes in ill-formed ASCII and ISO-8859-1 encoded binary strings. It seems that the 0x80 byte is blindly converted to its corresponding 2-byte UTF-8 sequence 0xC2 0x80. By contrast, when converting from UTF-8 to UTF-8 the error is detected.
Although this is not stated anywhere, mb_convert_encoding() should guarantee that either the resulting output string is fully compliant with the requested target encoding, or the call fails with an error. Returning unexpected results may have safety and security consequences.
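Until the function itself rejects such input, one possible workaround is to validate the source bytes before converting. The following is only a minimal sketch under stated assumptions: strict_to_utf8() is a hypothetical helper (not part of mbstring), it matches only the exact encoding names "ASCII" and "ISO-8859-1" (aliases such as "latin1" are not handled), and it treats bytes 0x80-0x9f as invalid ISO-8859-1, as this report does. For any other encoding it falls back to mb_check_encoding(), whose strictness may vary between mbstring versions.

```php
<?php
// Hypothetical helper, not part of mbstring: returns the UTF-8 string,
// or null when the input contains bytes this report considers invalid.
function strict_to_utf8($bytes, $from)
{
    switch (strtoupper($from)) {
        case 'ASCII':
            // Only bytes 0x00-0x7f are accepted (byte-wise match, no /u flag).
            $ok = preg_match('/^[\x00-\x7F]*$/D', $bytes) === 1;
            break;
        case 'ISO-8859-1':
            // Assumption taken from this report: 0x80-0x9f are rejected.
            $ok = preg_match('/[\x80-\x9F]/', $bytes) === 0;
            break;
        default:
            // Fall back to mbstring's own check for every other encoding.
            $ok = mb_check_encoding($bytes, $from);
    }
    return $ok ? mb_convert_encoding($bytes, 'UTF-8', $from) : null;
}

var_dump(strict_to_utf8("\x80", "ASCII"));      // rejected
var_dump(strict_to_utf8("\x80", "ISO-8859-1")); // rejected
var_dump(strict_to_utf8("abc", "ASCII"));       // converted
```

Returning null rather than substituting a character keeps the failure explicit, so the caller decides how to handle ill-formed input.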
Tested on:
- PHP 5.7.0-dev Slackware Linux 14.1
- PHP 5.5.16 Windows VISTA
Test script:
---------------
<?php
error_reporting(PHP_INT_MAX);
ini_set("mbstring.substitute_character", (string) ord("X"));
ini_set("mbstring.strict_detection", "1"); // no effect
// bytes 0x80-0xff are not valid ASCII:
echo 1, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ASCII")) . "\n";
// bytes 0x80-0x9f are not valid ISO-8859-1:
echo 2, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "ISO-8859-1")) . "\n";
echo 3, rawurlencode(mb_convert_encoding("\x80", "UTF-8", "UTF-8")) . "\n";
Expected result:
----------------
1X
2X
3X
Actual result:
--------------
1%C2%80
2%C2%80
3X
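For reference, the %C2%80 in the actual result is exactly what you get when the byte 0x80 is treated as code point U+0080 and encoded as a 2-byte UTF-8 sequence. A small sketch of that arithmetic (utf8_two_byte() is a hypothetical name used only for illustration, and it handles only code points in the 2-byte range U+0080..U+07FF):

```php
<?php
// Hypothetical illustration: encode a code point in the range
// U+0080..U+07FF as a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx).
function utf8_two_byte($cp)
{
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

echo rawurlencode(utf8_two_byte(0x80)), "\n"; // prints %C2%80
```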
Similar issue: mb_convert_encoding("\x81", 'UTF-8', 'cp1252') returns the bytes ef bf be, which is invalid UTF-8: http://www.fileformat.info/info/unicode/char/ffff/index.htm