Bug #77557 mb_regex_replace does not handle backreferences if regex-encoding is UTF-32
Submitted: 2019-02-02 02:52 UTC
Modified: 2019-02-11 16:33 UTC
Votes: 4
Avg. Score: 3.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 1 (100.0%)
Same OS: 1 (100.0%)
From: codes at designsandcodes dot com
Assigned:
Status: Verified
Package: mbstring related
PHP Version: 7.2
OS: Windows 10, Ubuntu 16.04
Private report: No
CVE-ID: None
 [2019-02-02 02:52 UTC] codes at designsandcodes dot com
Description:
------------
Per the summary, backreferences in the replacement string are not expanded if the regex encoding is any of the UTF-32 flavors (BE, LE, UCS-4). All three arguments (pattern, replacement, subject) are passed in the correct UTF-32 flavor, yet the replacement string is returned unmodified, with the backreference left as literal text.

Test script:
---------------
<?php
$regexEncoding = 'UTF-32';
mb_regex_encoding( $regexEncoding );
// Pattern, replacement, and subject are all converted to the regex encoding.
var_export( mb_ereg_replace( mb_convert_encoding( '.+', $regexEncoding ), mb_convert_encoding( "the \\0", $regexEncoding ), mb_convert_encoding( 'dog', $regexEncoding ) ) );

Expected result:
----------------
'' . "\0" . '' . "\0" . '' . "\0" . 't' . "\0" . '' . "\0" . '' . "\0" . 'h' . "\0" . '' . "\0" . '' . "\0" . 'e' . "\0" . '' . "\0" . '' . "\0" . ' ' . "\0" . '' . "\0" . '' . "\0" . 'd' . "\0" . '' . "\0" . '' . "\0" . 'o' . "\0" . '' . "\0" . '' . "\0" . 'g'

i.e., "the dog" in UTF-32 encoding

Actual result:
--------------
'' . "\0" . '' . "\0" . '' . "\0" . 't' . "\0" . '' . "\0" . '' . "\0" . 'h' . "\0" . '' . "\0" . '' . "\0" . 'e' . "\0" . '' . "\0" . '' . "\0" . ' ' . "\0" . '' . "\0" . '' . "\0" . '\\' . "\0" . '' . "\0" . '' . "\0" . '0'

i.e., the replacement string "the \\0" in UTF-32 encoding, with the backreference left unexpanded
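
Possible workaround (a sketch, not a fix): since the same replacement works when the regex encoding is UTF-8, one option is to transcode all three arguments to UTF-8, run mb_ereg_replace there, and transcode the result back. The helper name ereg_replace_via_utf8 is hypothetical and not part of mbstring; the sketch assumes the input is valid in the given encoding and that every character survives the round trip through UTF-8.

```php
<?php
// Hypothetical helper (not part of mbstring): perform the replacement in
// UTF-8, where backreferences are expanded, then convert the result back.
function ereg_replace_via_utf8(string $pattern, string $replacement, string $subject, string $encoding): string
{
    $previous = mb_regex_encoding();   // remember the current regex encoding
    mb_regex_encoding('UTF-8');
    try {
        $result = mb_ereg_replace(
            mb_convert_encoding($pattern, 'UTF-8', $encoding),
            mb_convert_encoding($replacement, 'UTF-8', $encoding),
            mb_convert_encoding($subject, 'UTF-8', $encoding)
        );
    } finally {
        mb_regex_encoding($previous);  // restore it even if something throws
    }
    if (!is_string($result)) {
        throw new RuntimeException('mb_ereg_replace failed');
    }
    return mb_convert_encoding($result, $encoding, 'UTF-8');
}

// Usage with the values from the test script above.
$enc = 'UTF-32';
$out = ereg_replace_via_utf8(
    mb_convert_encoding('.+', $enc, 'UTF-8'),
    mb_convert_encoding("the \\0", $enc, 'UTF-8'),
    mb_convert_encoding('dog', $enc, 'UTF-8'),
    $enc
);
var_export($out === mb_convert_encoding('the dog', $enc, 'UTF-8'));
```

The extra conversions cost time and memory on large subjects, so this only makes sense where the data has to stay in UTF-32 outside the call.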

 [2019-02-02 19:05 UTC] cmb@php.net
-Status: Open
+Status: Verified
-PHP Version: 7.3.1
+PHP Version: 7.2
 [2019-02-02 19:05 UTC] cmb@php.net
This issue is not PHP 7.3 specific[1].

[1] <https://3v4l.org/KqU5d>
 [2019-02-11 16:33 UTC] nikic@php.net
Looks like the code assumes that the encoding is ASCII-compatible (like UTF-8). There might not be a simple fix for this one.
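
To illustrate what an ASCII-compatibility assumption means here: in UTF-8 the backreference "\0" is the two adjacent bytes 5C 30, while in UTF-32 each character occupies four bytes, so the byte after 5C is a NUL rather than a digit. The snippet below is only a sketch of that effect. The function has_bytewise_backref is a hypothetical stand-in for a byte-oriented scan and is not taken from the mbstring source.

```php
<?php
$utf8  = 'the \0';   // single quotes: a literal backslash followed by "0"
$utf32 = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');

echo bin2hex($utf8), "\n";   // 746865205c30
echo bin2hex($utf32), "\n";  // 000000740000006800000065000000200000005c00000030

// Hypothetical byte-wise check: a backslash byte immediately followed by
// an ASCII digit byte. No /u flag, so preg_match works on raw bytes.
function has_bytewise_backref(string $bytes): bool
{
    return preg_match('/\\\\[0-9]/', $bytes) === 1;
}

var_dump(has_bytewise_backref($utf8));   // bool(true)
var_dump(has_bytewise_backref($utf32));  // bool(false)
```

A scan of this kind never sees the backreference in UTF-32 input, which is consistent with the replacement string coming back unchanged in the report.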
 