Bug #79238 preg_split returns weird results
Submitted: 2020-02-07 07:50 UTC Modified: 2020-02-07 08:00 UTC
From: vuongvankhanh89 at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 7.4.2 OS: Ubuntu 18.04
Private report: No CVE-ID: None

 [2020-02-07 07:50 UTC] vuongvankhanh89 at gmail dot com
Description:
------------
Hi, today I wrote a pattern to split text into an array.
The regex is very simple:

preg_split('/\R| /m', '翶耆倈者耈翶耆倈傀堀X蔀耖耄')

It treats a space or a line break as the separator and splits the string into an array of strings.

In the example above there is no whitespace or line break, but the result I got is an array with 2 elements:

array(2) {
  [0]=>
  string(11) "翶耆倈�"
  [1]=>
  string(28) "耈翶耆倈傀堀X蔀耖耄"
}

者 has been treated as a separator, and it also turns into a weird character.

Kindly help! Thanks in advance.




Test script:
---------------
var_dump(preg_split('/\R| /m', '翶耆倈者耈翶耆倈傀堀X蔀耖耄'));

Expected result:
----------------
array(1) {
  [0]=>
  string(40) "翶耆倈者耈翶耆倈傀堀X蔀耖耄"
}

Actual result:
--------------
array(2) {
  [0]=>
  string(11) "翶耆倈�"
  [1]=>
  string(28) "耈翶耆倈傀堀X蔀耖耄"
}

History

 [2020-02-07 08:00 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2020-02-07 08:00 UTC] requinix@php.net
You must use UTF-8 mode when working with UTF-8 patterns or inputs. https://3v4l.org/lI7FX
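
A minimal sketch of what UTF-8 mode changes here, assuming the script file is saved as UTF-8 and PCRE uses its default newline set for \R. Without the `u` modifier the subject is matched byte by byte, and \R also matches the single byte 0x85 (NEL). 者 is U+8005, encoded as E8 80 85, so its last byte is taken as a line break and the character is cut in half. The variable names below are illustrative only.

```php
<?php
// Input from the report; 者 (U+8005) is the bytes E8 80 85 in UTF-8.
$text = '翶耆倈者耈翶耆倈傀堀X蔀耖耄';

// Byte mode (no "u"): \R can match the lone byte 0x85 inside 者.
$byteMode = preg_split('/\R| /m', $text);

// UTF-8 mode ("u"): pattern and subject are read as UTF-8 code points,
// so \R only matches real line-break characters.
$utf8Mode = preg_split('/\R| /mu', $text);

var_dump(bin2hex('者'));           // string(6) "e88085"
var_dump(count($byteMode));        // int(2), as in the report
var_dump(count($utf8Mode));        // int(1)
var_dump($utf8Mode[0] === $text);  // bool(true)
```

With the `u` modifier the input comes back as a single, unmodified element. Note that in UTF-8 mode preg_split returns false if the subject is not valid UTF-8, so the result is worth checking when the input encoding is not guaranteed.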
 