Bug #79238 preg_split returns weird results
Submitted: 2020-02-07 07:50 UTC Modified: 2020-02-07 08:00 UTC
From: vuongvankhanh89 at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 7.4.2 OS: Ubuntu 18.04
Private report: No CVE-ID: None


 [2020-02-07 07:50 UTC] vuongvankhanh89 at gmail dot com
Hi, today I created a pattern to split text into an array.
The regex is very simple:

preg_split('/\R| /m', '翶耆倈者耈翶耆倈傀堀X蔀耖耄')

It treats spaces and line breaks as separators and converts the string into an array of strings.

In the example above there is no space or line break, but the result I got is an array with 2 elements.

array(2) {
  [0]=>
  string(11) "翶耆倈�"
  [1]=>
  string(28) "耈翶耆倈傀堀X蔀耖耄"
}

者 has been treated as a separator, and part of it also turns into a weird character.

Kindly help! Thanks in advance.

Test script:
var_dump(preg_split('/\R| /m', '翶耆倈者耈翶耆倈傀堀X蔀耖耄'));

Expected result:
array(1) {
  [0]=>
  string(40) "翶耆倈者耈翶耆倈傀堀X蔀耖耄"
}

Actual result:
array(2) {
  [0]=>
  string(11) "翶耆倈�"
  [1]=>
  string(28) "耈翶耆倈傀堀X蔀耖耄"
}


 [2020-02-07 08:00 UTC]
-Status: Open +Status: Not a bug
 [2020-02-07 08:00 UTC]
You must use UTF-8 mode when working with UTF-8 patterns or inputs.
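
To illustrate the reply above, here is a minimal sketch comparing the reported pattern with and without the /u modifier, which is how UTF-8 mode is enabled in PHP's preg_* functions. It assumes the script file is saved as UTF-8; the variable names are only illustrative. Without /u, PCRE works on single bytes, and \R also matches the byte 0x85 (NEL), which happens to be the last byte of 者 (U+8005, UTF-8 bytes E8 80 85). The /m modifier from the report is left out because the pattern contains no ^ or $ anchors.

```php
<?php
// Input from the report. 者 is U+8005, encoded in UTF-8 as E8 80 85.
$input = '翶耆倈者耈翶耆倈傀堀X蔀耖耄';

// Byte mode (no /u): \R matches the single byte 0x85 (NEL),
// so the string is cut in the middle of 者.
$byteMode = preg_split('/\R| /', $input);

// UTF-8 mode (/u): pattern and subject are read as UTF-8 code points,
// so a 0x85 byte inside a multibyte character is not a line break.
$utf8Mode = preg_split('/\R| /u', $input);

var_dump(bin2hex('者'));     // e88085
var_dump(count($byteMode));  // 2
var_dump(count($utf8Mode));  // 1
```

In byte mode the first element ends with the two orphaned bytes E8 80, which is why it is displayed with a replacement character in the report.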