Bug #77607 mb_substr does not respect mb_substitution_character
Submitted: 2019-02-11 22:23 UTC Modified: 2019-02-23 06:14 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: legale dot legale at gmail dot com Assigned:
Status: Re-Opened Package: mbstring related
PHP Version: 7.3.2 OS: Linux
Private report: No CVE-ID: None

 [2019-02-11 22:23 UTC] legale dot legale at gmail dot com
Description:
------------
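Processing malformed UTF-16 strings with mb_substr() unexpectedly changes the source data: an unpaired surrogate (0xdc00) comes back as 0x003f ('?') instead of the original bytes.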

Test script:
---------------
<?php
$bad_utf16be = pack("H*", "dc00dc00"); /* two unpaired low surrogates, 2 bytes each */
$good_utf16be = pack("H*", "d800dc00"); /* one valid surrogate pair, 4 bytes */
$utf16_string = $bad_utf16be . $good_utf16be;

/* take second and third char  */
$second = mb_substr($utf16_string, 1, 1, "UTF-16BE");
$third = mb_substr($utf16_string, 2, 1, "UTF-16BE");
echo "\nactual:\n";
echo "second UTF-16BE char: 0x". bin2hex($second) . PHP_EOL;
echo "third  UTF-16BE char: 0x". bin2hex($third) . PHP_EOL;

echo "\nexpected:\n";
echo "second UTF-16BE char: 0xdc00" . PHP_EOL;
echo "third  UTF-16BE char: 0xd800dc00" . PHP_EOL;



Expected result:
----------------
expected:
second UTF-16BE char: 0xdc00
third  UTF-16BE char: 0xd800dc00

Actual result:
--------------
actual:
second UTF-16BE char: 0x003f
third  UTF-16BE char: 0xd800dc00

 [2019-02-12 09:56 UTC] nikic@php.net
-Status: Open
+Status: Not a bug
 [2019-02-12 09:56 UTC] nikic@php.net
This looks correct to me. 0x003f is '?' which is the default substitution character (though I think we should change the default to U+FFFD instead). See also http://php.net/manual/en/function.mb-substitute-character.php.
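
For illustration, a minimal sketch of how the substitution character is read and changed, using mb_convert_encoding(), one of the functions the manual lists as honoring the setting. The sample string "a\xffb" is an assumption chosen for this example; it is not from the report. After switching to U+FFFD, the invalid byte is expected to come out as the UTF-8 bytes ef bf bd rather than '?'.

```php
<?php
// Read the current substitution character (63, i.e. '?', unless configured otherwise).
var_dump(mb_substitute_character());

// Switch the substitution character to U+FFFD REPLACEMENT CHARACTER.
mb_substitute_character(0xFFFD);

// "\xff" can never occur in valid UTF-8, so the converter has to substitute it.
$clean = mb_convert_encoding("a\xffb", "UTF-8", "UTF-8");

// Dump the result as hex so the substituted bytes are visible.
echo bin2hex($clean), PHP_EOL;
```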
 [2019-02-12 10:03 UTC] nikic@php.net
-Summary: mbfl library bug
+Summary: mb_substr does not respect mb_substitution_character
-Status: Not a bug
+Status: Re-Opened
 [2019-02-12 10:03 UTC] nikic@php.net
What *is* buggy here though is that mb_substr() apparently does not respect mb_substitute_character() at all. In fact, I don't think any of the filters created inside libmbfl rather than mbstring do :(
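
A small probe, reusing the bytes from the test script, can show whether a given PHP build honors the setting in mb_substr(). This is only a sketch: the output depends on the PHP version, so no expected result is stated. If the setting were honored, the unpaired surrogate would come back as 0xfffd; on the build from this report it came back as 0x003f.

```php
<?php
// Ask for U+FFFD instead of the default '?' as the substitution character.
mb_substitute_character(0xFFFD);

// Same malformed input as in the report: two unpaired low surrogates
// followed by one valid surrogate pair.
$utf16_string = pack("H*", "dc00dc00d800dc00");

// Extract the second character, which is an unpaired low surrogate.
$second = mb_substr($utf16_string, 1, 1, "UTF-16BE");

// Inspect which bytes mb_substr() actually returned on this build.
echo "second UTF-16BE char: 0x" . bin2hex($second) . PHP_EOL;
```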
 [2019-02-23 06:14 UTC] jhdxr@php.net
The documentation says: "This setting affects mb_convert_encoding(), mb_convert_variables(), mb_output_handler(), and mb_send_mail()."

So I'd say this is expected.


Whether we want to apply mb_substitute_character() to more functions is a separate topic.
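
For callers who need the original bytes preserved, as the reporter expected, one possible workaround is to split the UTF-16BE string by hand instead of going through mbstring. The helper below, utf16be_split_raw(), is hypothetical and not part of PHP or this report. It treats a valid surrogate pair as one 4-byte character and every other 16-bit unit, including an unpaired surrogate, as one 2-byte character.

```php
<?php
/**
 * Hypothetical helper (not part of mbstring): split a UTF-16BE byte string
 * into characters without rewriting malformed input.
 * A trailing odd byte, if any, is ignored.
 */
function utf16be_split_raw(string $bytes): array
{
    $chars = [];
    $len = strlen($bytes);
    for ($i = 0; $i + 1 < $len; ) {
        // Read one big-endian 16-bit code unit.
        $unit = (ord($bytes[$i]) << 8) | ord($bytes[$i + 1]);
        $size = 2;
        // A high surrogate followed by a low surrogate forms one 4-byte character.
        if ($unit >= 0xD800 && $unit <= 0xDBFF && $i + 3 < $len) {
            $next = (ord($bytes[$i + 2]) << 8) | ord($bytes[$i + 3]);
            if ($next >= 0xDC00 && $next <= 0xDFFF) {
                $size = 4;
            }
        }
        $chars[] = substr($bytes, $i, $size);
        $i += $size;
    }
    return $chars;
}

// Same input as the test script in the report.
$chars = utf16be_split_raw(pack("H*", "dc00dc00d800dc00"));
echo "second UTF-16BE char: 0x" . bin2hex($chars[1]) . PHP_EOL; // 0xdc00
echo "third  UTF-16BE char: 0x" . bin2hex($chars[2]) . PHP_EOL; // 0xd800dc00
```

This keeps malformed units byte-for-byte, so it is only appropriate when the caller deliberately wants unvalidated data; mb_check_encoding($string, "UTF-16BE") can be used beforehand to decide whether the input is well formed.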
 