Bug #77557 mb_ereg_replace does not handle backreferences if regex-encoding is UTF-32
Submitted: 2019-02-02 02:52 UTC
Modified: 2019-02-11 16:33 UTC
Votes: 4
Avg. Score: 3.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 1 (100.0%)
Same OS: 1 (100.0%)
From: codes at designsandcodes dot com
Assigned:
Status: Verified
Package: mbstring related
PHP Version: 7.2
OS: Windows 10, Ubuntu 16.04
Private report: No
CVE-ID: None
 [2019-02-02 02:52 UTC] codes at designsandcodes dot com
Description:
------------
Per summary, backreferences in the replacement string are not expanded if the regex encoding is any of the UTF-32 flavors (UTF-32BE, UTF-32LE, UCS-4). All three arguments (pattern, replacement, subject) are passed in the correct UTF-32 flavor, yet the replacement string is inserted into the result with the backreference left unexpanded.

Test script:
---------------
$regexEncoding = 'UTF-32';
mb_regex_encoding( $regexEncoding );
$pattern     = mb_convert_encoding( '.+', $regexEncoding );
$replacement = mb_convert_encoding( "the \\0", $regexEncoding );
$subject     = mb_convert_encoding( 'dog', $regexEncoding );
var_export( mb_ereg_replace( $pattern, $replacement, $subject ) );

Expected result:
----------------
'' . "\0" . '' . "\0" . '' . "\0" . 't' . "\0" . '' . "\0" . '' . "\0" . 'h' . "\0" . '' . "\0" . '' . "\0" . 'e' . "\0" . '' . "\0" . '' . "\0" . ' ' . "\0" . '' . "\0" . '' . "\0" . 'd' . "\0" . '' . "\0" . '' . "\0" . 'o' . "\0" . '' . "\0" . '' . "\0" . 'g'

i.e., "the dog" in UTF-32 encoding

Actual result:
--------------
'' . "\0" . '' . "\0" . '' . "\0" . 't' . "\0" . '' . "\0" . '' . "\0" . 'h' . "\0" . '' . "\0" . '' . "\0" . 'e' . "\0" . '' . "\0" . '' . "\0" . ' ' . "\0" . '' . "\0" . '' . "\0" . '\\' . "\0" . '' . "\0" . '' . "\0" . '0'

i.e., "the \0" in UTF-32 encoding, with the backreference left as a literal backslash followed by a zero
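
Until the underlying behavior changes, one possible workaround is to run the replacement in UTF-8 and convert the result back to the original encoding. The following is only a sketch: `ereg_replace_via_utf8()` is a hypothetical helper name, not part of mbstring, and it assumes that all three arguments share one encoding and round-trip through UTF-8 without loss.

```php
<?php
// Hypothetical helper (not an mbstring function): perform the replacement
// in UTF-8, where backreferences are expanded, then convert the result back.
function ereg_replace_via_utf8(string $pattern, string $replacement, string $subject, string $encoding): string
{
    $previous = mb_regex_encoding();      // remember the current regex encoding
    mb_regex_encoding('UTF-8');
    try {
        $result = mb_ereg_replace(
            mb_convert_encoding($pattern, 'UTF-8', $encoding),
            mb_convert_encoding($replacement, 'UTF-8', $encoding),
            mb_convert_encoding($subject, 'UTF-8', $encoding)
        );
    } finally {
        mb_regex_encoding($previous);     // restore it even if something throws
    }
    if (!is_string($result)) {
        throw new RuntimeException('mb_ereg_replace failed');
    }
    return mb_convert_encoding($result, $encoding, 'UTF-8');
}

// Same inputs as the test script, with the byte order made explicit.
$enc = 'UTF-32BE';
$out = ereg_replace_via_utf8(
    mb_convert_encoding('.+', $enc, 'UTF-8'),
    mb_convert_encoding('the \\0', $enc, 'UTF-8'),
    mb_convert_encoding('dog', $enc, 'UTF-8'),
    $enc
);

// Convert back to UTF-8 only to print something readable.
echo mb_convert_encoding($out, 'UTF-8', $enc), "\n"; // the dog
```

The extra conversions cost time and memory on large inputs, so this is a stopgap rather than a fix for the bug itself.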

History

 [2019-02-02 19:05 UTC] cmb@php.net
-Status: Open
+Status: Verified
-PHP Version: 7.3.1
+PHP Version: 7.2
 [2019-02-02 19:05 UTC] cmb@php.net
This issue is not PHP 7.3 specific[1].

[1] <https://3v4l.org/KqU5d>
 [2019-02-11 16:33 UTC] nikic@php.net
Looks like the code assumes that the encoding is ASCII-compatible (like UTF-8). There might not be a simple fix for this one.
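
To illustrate why an ASCII-compatibility assumption would break here, the sketch below scans a string byte by byte for a backslash immediately followed by an ASCII digit. This is not the actual mbstring source; `hasByteWiseBackref()` is a hypothetical helper written only to show the effect. In UTF-32 every character occupies four bytes, so the byte after the backslash is a NUL padding byte rather than the digit.

```php
<?php
// Hypothetical helper: a byte-wise search for "\" directly followed by
// an ASCII digit, roughly what an ASCII-compatible scanner would do.
function hasByteWiseBackref(string $bytes): bool
{
    $len = strlen($bytes);
    for ($i = 0; $i + 1 < $len; $i++) {
        $next = ord($bytes[$i + 1]);
        if ($bytes[$i] === '\\' && $next >= 0x30 && $next <= 0x39) {
            return true;
        }
    }
    return false;
}

$utf8  = 'the \\0';                                        // the \0
$utf32 = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');

var_dump(hasByteWiseBackref($utf8));   // bool(true)
var_dump(hasByteWiseBackref($utf32));  // bool(false)

// The backslash (0x5c) and the digit (0x30) are three NUL bytes apart.
echo bin2hex(mb_convert_encoding('\\0', 'UTF-32BE', 'UTF-8')), "\n"; // 0000005c00000030
```

The same holds for UTF-32LE, where the backslash byte comes first in its four-byte unit and is again followed by NUL bytes.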
 