Bug #77607 mb_substr does not respect mb_substitution_character
Submitted: 2019-02-11 22:23 UTC
Modified: 2019-02-23 06:14 UTC
Votes: 1
Avg. Score: 3.0 ± 0.0
Reproduced: 0 of 0 (0.0%)
From: legale dot legale at gmail dot com
Assigned:
Status: Re-Opened
Package: mbstring related
PHP Version: 7.3.2
OS: Linux
Private report: No
CVE-ID: None
 [2019-02-11 22:23 UTC] legale dot legale at gmail dot com
Description:
------------
Processing malformed UTF-16 strings with mb_substr() unexpectedly alters the source data: an unpaired surrogate is returned as 0x003f instead of its original bytes.

Test script:
---------------
<?php
$bad_utf16be = pack("H*", "dc00dc00"); /* two unpaired low surrogates, 2 bytes each */
$good_utf16be = pack("H*", "d800dc00"); /* one surrogate pair (U+10000), 4 bytes */
$utf16_string = $bad_utf16be . $good_utf16be;

/* take second and third char  */
$second = mb_substr($utf16_string, 1, 1, "UTF-16BE");
$third = mb_substr($utf16_string, 2, 1, "UTF-16BE");
echo "\nactual:\n";
echo "second UTF-16BE char: 0x". bin2hex($second) . PHP_EOL;
echo "third  UTF-16BE char: 0x". bin2hex($third) . PHP_EOL;

echo "\nexpected:\n";
echo "second UTF-16BE char: 0xdc00" . PHP_EOL;
echo "third  UTF-16BE char: 0xd800dc00" . PHP_EOL;



Expected result:
----------------
expected:
second UTF-16BE char: 0xdc00
third  UTF-16BE char: 0xd800dc00

Actual result:
--------------
actual:
second UTF-16BE char: 0x003f
third  UTF-16BE char: 0xd800dc00
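
The behaviour the report expects (raw bytes returned untouched, with each unpaired surrogate counted as one character) can be sketched in plain PHP without mbstring. The function name `utf16be_raw_substr` is hypothetical and exists only for this illustration; it is not part of mbstring and is not a proposed fix.

```php
<?php
/**
 * Hypothetical helper (not part of mbstring): slice a UTF-16BE byte string
 * by "characters". A valid surrogate pair counts as one 4-byte character;
 * any other 16-bit unit, including an unpaired surrogate, counts as one
 * 2-byte character. Bytes are returned unmodified, nothing is substituted.
 * A trailing odd byte, if present, is ignored.
 */
function utf16be_raw_substr(string $bytes, int $start, int $length): string
{
    $chars = [];
    $len = strlen($bytes);
    for ($i = 0; $i + 1 < $len; ) {
        $unit = (ord($bytes[$i]) << 8) | ord($bytes[$i + 1]);
        $size = 2;
        // High surrogate followed by a low surrogate: one 4-byte character.
        if (($unit & 0xFC00) === 0xD800 && $i + 3 < $len) {
            $next = (ord($bytes[$i + 2]) << 8) | ord($bytes[$i + 3]);
            if (($next & 0xFC00) === 0xDC00) {
                $size = 4;
            }
        }
        $chars[] = substr($bytes, $i, $size);
        $i += $size;
    }
    return implode('', array_slice($chars, $start, $length));
}

// Same input as the test script above.
$input = pack("H*", "dc00dc00d800dc00");
echo "second: 0x", bin2hex(utf16be_raw_substr($input, 1, 1)), PHP_EOL; // 0xdc00
echo "third:  0x", bin2hex(utf16be_raw_substr($input, 2, 1)), PHP_EOL; // 0xd800dc00
```

Working on raw bytes sidesteps the substitution step entirely, at the cost of returning strings that are not valid UTF-16.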

History

 [2019-02-12 09:56 UTC] nikic@php.net
-Status: Open
+Status: Not a bug
 [2019-02-12 09:56 UTC] nikic@php.net
This looks correct to me. 0x003f is '?', which is the default substitution character (though I think we should change the default to U+FFFD instead). See also http://php.net/manual/en/function.mb-substitute-character.php.
 [2019-02-12 10:03 UTC] nikic@php.net
-Summary: mbfl library bug
+Summary: mb_substr does not respect mb_substitution_character
-Status: Not a bug
+Status: Re-Opened
 [2019-02-12 10:03 UTC] nikic@php.net
What *is* buggy here though is that mb_substr() apparently does not respect mb_substitute_character() at all. In fact, I don't think any of the filters created inside libmbfl rather than mbstring do :(
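
A minimal script for checking this on a given build, assuming the mbstring extension is loaded: it sets a non-default substitution character and runs the same malformed bytes through mb_convert_encoding() and mb_substr(). The output is deliberately not annotated, since it depends on the PHP version.

```php
<?php
// Bytes from the report: a single unpaired low surrogate in UTF-16BE.
$bad = pack("H*", "dc00");

$previous = mb_substitute_character();   // remember the current setting
mb_substitute_character(0xFFFD);         // request U+FFFD instead of '?'

// mb_convert_encoding() is one of the functions documented as using the setting.
$converted = mb_convert_encoding($bad, "UTF-16BE", "UTF-16BE");
// mb_substr() is the function this report is about.
$sliced = mb_substr($bad, 0, 1, "UTF-16BE");

echo "mb_convert_encoding: 0x", bin2hex($converted), PHP_EOL;
echo "mb_substr:           0x", bin2hex($sliced), PHP_EOL;

mb_substitute_character($previous);      // restore the original setting
```

If the two lines print different bytes, mb_substr() is not using the configured substitution character on that build.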
 [2019-02-23 06:14 UTC] jhdxr@php.net
The documentation says: "This setting affects mb_convert_encoding(), mb_convert_variables(), mb_output_handler(), and mb_send_mail()."

So I'd say this behavior is expected.

Whether we want to apply mb_substitute_character() to more functions is a separate question.
 
Last updated: Thu Aug 22 07:01:26 2019 UTC