Bug #77607 mb_substr does not respect mb_substitution_character
Submitted: 2019-02-11 22:23 UTC Modified: 2019-02-23 06:14 UTC
From: legale dot legale at gmail dot com Assigned:
Status: Re-Opened Package: mbstring related
PHP Version: 7.3.2 OS: Linux
Private report: No CVE-ID: None
 [2019-02-11 22:23 UTC] legale dot legale at gmail dot com
Calling mb_substr() on an ill-formed UTF-16 string does not return the source bytes unchanged: a lone surrogate comes back as '?' (0x003f).

Test script:
$bad_utf16be = pack("H*", "dc00dc00"); /* two lone low surrogates, 2 bytes each (ill-formed) */
$good_utf16be = pack("H*", "d800dc00"); /* one surrogate pair, 4 bytes (U+10000) */
$utf16_string = $bad_utf16be . $good_utf16be;

/* take second and third char  */
$second = mb_substr($utf16_string, 1, 1, "UTF-16BE");
$third = mb_substr($utf16_string, 2, 1, "UTF-16BE");
echo "\nactual:\n";
echo "second UTF-16BE char: 0x". bin2hex($second) . PHP_EOL;
echo "third  UTF-16BE char: 0x". bin2hex($third) . PHP_EOL;

echo "\nexpected:\n";
echo "second UTF-16BE char: 0xdc00" . PHP_EOL;
echo "third  UTF-16BE char: 0xd800dc00" . PHP_EOL;

Expected result:
second UTF-16BE char: 0xdc00
third  UTF-16BE char: 0xd800dc00

Actual result:
second UTF-16BE char: 0x003f
third  UTF-16BE char: 0xd800dc00
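
For anyone who needs the expected result above today, here is a minimal workaround sketch that walks the UTF-16BE bytes by hand and keeps lone surrogates as 2-byte units instead of substituting them. The helper name utf16be_split_lenient() is hypothetical and not part of mbstring; a trailing odd byte is silently dropped, which may or may not be what you want.

```php
<?php
// Hypothetical helper (NOT part of mbstring): split a UTF-16BE byte string
// into units, pairing a high surrogate with a following low surrogate and
// keeping any lone surrogate as its own 2-byte unit.
function utf16be_split_lenient(string $bytes): array
{
    $units = [];
    $len = strlen($bytes);
    $i = 0;
    while ($i + 1 < $len) {
        // Read one big-endian 16-bit code unit.
        $unit = (ord($bytes[$i]) << 8) | ord($bytes[$i + 1]);
        $isHigh = $unit >= 0xD800 && $unit <= 0xDBFF;
        if ($isHigh && $i + 3 < $len) {
            $next = (ord($bytes[$i + 2]) << 8) | ord($bytes[$i + 3]);
            if ($next >= 0xDC00 && $next <= 0xDFFF) {
                // Well-formed surrogate pair: keep all 4 bytes together.
                $units[] = substr($bytes, $i, 4);
                $i += 4;
                continue;
            }
        }
        // BMP code unit or lone surrogate: keep the 2 bytes untouched.
        $units[] = substr($bytes, $i, 2);
        $i += 2;
    }
    return $units;
}

$utf16_string = pack("H*", "dc00dc00") . pack("H*", "d800dc00");
$chars = utf16be_split_lenient($utf16_string);
echo "second UTF-16BE char: 0x" . bin2hex($chars[1]) . PHP_EOL; // 0xdc00
echo "third  UTF-16BE char: 0x" . bin2hex($chars[2]) . PHP_EOL; // 0xd800dc00
```

This only sidesteps the substitution. It does not make the input valid UTF-16, so the preserved lone surrogates will still be rejected or replaced by any later conversion step.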


 [2019-02-12 09:56 UTC]
-Status: Open
+Status: Not a bug
 [2019-02-12 09:56 UTC]
This looks correct to me. 0x003f is '?' which is the default substitution character (though I think we should change the default to U+FFFD instead). See also
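
Since the substitution is a reaction to ill-formed input, one way to avoid being surprised by it is to validate the string before slicing. A small sketch, assuming the mbstring extension is loaded and using the byte sequences from the test script:

```php
<?php
// Assumes the mbstring extension is loaded.
$ill_formed  = pack("H*", "dc00dc00d800dc00"); // starts with two lone low surrogates
$well_formed = pack("H*", "d800dc00");         // a single surrogate pair

// mb_check_encoding() reports whether the bytes are valid in the given encoding.
var_dump(mb_check_encoding($ill_formed, "UTF-16BE"));  // bool(false)
var_dump(mb_check_encoding($well_formed, "UTF-16BE")); // bool(true)
```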
 [2019-02-12 10:03 UTC]
-Summary: mbfl library bug
+Summary: mb_substr does not respect mb_substitution_character
-Status: Not a bug
+Status: Re-Opened
 [2019-02-12 10:03 UTC]
What *is* buggy here though is that mb_substr() apparently does not respect mb_substitute_character() at all. In fact, I don't think any of the filters created inside libmbfl rather than mbstring do :(
 [2019-02-23 06:14 UTC]
The documentation says "This setting affects mb_convert_encoding(), mb_convert_variables(), mb_output_handler(), and mb_send_mail()."

So I'd say this behavior is expected.

Whether we want to apply mb_substitution_character to more functions is a separate topic.
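
To compare the two code paths discussed in this thread, the following sketch sets a non-default substitution character and then runs the same lone surrogate through mb_convert_encoding() (listed in the documentation as affected) and mb_substr() (not listed). It assumes the mbstring extension is loaded. No expected output is given for mb_substr() because it depends on the PHP version.

```php
<?php
// Assumes the mbstring extension is loaded.
$lone_low_surrogate = pack("H*", "dc00"); // ill-formed UTF-16BE

// Use U+FFFD REPLACEMENT CHARACTER instead of the default '?'.
mb_substitute_character(0xFFFD);

// Documented as affected by the substitution character setting.
$converted = mb_convert_encoding($lone_low_surrogate, "UTF-8", "UTF-16BE");

// The function this report is about.
$sliced = mb_substr($lone_low_surrogate, 0, 1, "UTF-16BE");

echo "mb_convert_encoding: 0x" . bin2hex($converted) . PHP_EOL;
echo "mb_substr:           0x" . bin2hex($sliced) . PHP_EOL;
```

If mb_substr() still prints 0x003f while mb_convert_encoding() prints the UTF-8 bytes of U+FFFD, you are seeing the difference described above.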