Bug #79238 preg_split returns weird results
Submitted: 2020-02-07 07:50 UTC Modified: 2020-02-07 08:00 UTC
From: vuongvankhanh89 at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 7.4.2 OS: Ubuntu 18.04
Private report: No CVE-ID: None
 [2020-02-07 07:50 UTC] vuongvankhanh89 at gmail dot com
Description:
------------
Hi, today I wrote a pattern to split text into an array.
The regex is very simple:

preg_split('/\R| /m', '翶耆倈者耈翶耆倈傀堀X蔀耖耄')

It treats spaces and line breaks as separators and splits the string into an array of strings.

The example above contains no space or line break, but the result I got is an array with 2 elements.

array(2) {
  [0]=>
  string(11) "翶耆倈�"
  [1]=>
  string(28) "耈翶耆倈傀堀X蔀耖耄"
}

者 has been treated as a separator, and it has also turned into a weird character.

Kindly help! Thanks in advance.




Test script:
---------------
var_dump(preg_split('/\R| /m', '翶耆倈者耈翶耆倈傀堀X蔀耖耄'));

Expected result:
----------------
array(1) {
  [0]=>
  string(40) "翶耆倈者耈翶耆倈傀堀X蔀耖耄"
}

Actual result:
--------------
array(2) {
  [0]=>
  string(11) "翶耆倈�"
  [1]=>
  string(28) "耈翶耆倈傀堀X蔀耖耄"
}

 [2020-02-07 08:00 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2020-02-07 08:00 UTC] requinix@php.net
You must use UTF-8 mode when working with UTF-8 patterns or inputs. https://3v4l.org/lI7FX
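
To make the point above concrete, here is a minimal sketch of the difference, assuming "UTF-8 mode" refers to the `u` pattern modifier and assuming a PCRE build where `\R` matches any Unicode newline (the behaviour the reporter's output suggests). The variable names are illustrative only. Without `u`, the subject is matched byte by byte; 者 (U+8005) is encoded as the bytes E8 80 85, and the lone byte 0x85 is NEL, which `\R` accepts as a line break. That would also explain the broken character left at the end of the first element.

```php
<?php
// Input from the report: 14 characters, 40 bytes in UTF-8.
$input = '翶耆倈者耈翶耆倈傀堀X蔀耖耄';

// Byte-wise matching (no "u" modifier): the last byte of "者" (0x85)
// is seen as NEL and consumed by \R, so the string is cut mid-character.
$bytewise = preg_split('/\R| /', $input);

// UTF-8 mode ("u" modifier): \R is tested against whole code points,
// and this input contains no newline or space to split on.
$utf8 = preg_split('/\R| /u', $input);

var_dump(count($bytewise));    // int(2) on the reporter's setup
var_dump(count($utf8));        // int(1)
var_dump($utf8[0] === $input); // bool(true)
```

The `m` modifier from the original pattern is left out here because it only changes how `^` and `$` behave, and the pattern uses neither.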
 