Bug #23904 PREG_SPLIT_NO_EMPTY causes ctl chars to not be excluded with [^a-zA-Z0-9] or \W
Submitted: 2003-05-30 13:10 UTC
Modified: 2003-06-05 08:43 UTC
From: jrose at lgb-inc dot com
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 4.3.1
OS: Linux
Private report: No
CVE-ID: None
 [2003-05-30 13:10 UTC] jrose at lgb-inc dot com
I slammed into this whilst converting Access data into MySQL.

I was attempting to break apart the following into words (* here indicates that it fails to match /[\w,.-?]/):

|*|W|a|s|h|i|n|g|t|o|n|,|*|D|C|*|*     // text
057617368696e67746f6e2c20444320200     // hex

I first tried splitting on any white space or commas:

preg_split( '/[\\s,]+/', $string, PREG_SPLIT_NO_EMPTY );

When this didn't work, I examined the hexadecimal values as above, and, assuming that control characters weren't included in the \s group, tried several things, including the very simple:

preg_split( '/\\W/', $string, PREG_SPLIT_NO_EMPTY );

Nothing worked, and ultimately I had to use preg_match_all() to split the string up.

Example:

$string = chr(5) . 'Washington,' . chr(4) . 'DC' . chr(2);
$parts = preg_split( '/\\W/', $string, PREG_SPLIT_NO_EMPTY );
echo join('|', $parts);

 [2003-06-05 08:43 UTC] andrei@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Please read the function documentation. The third parameter of preg_split() is the limit, i.e. the maximum number of pieces to be returned, not the flags; the flags are the fourth parameter. Try:

$parts = preg_split( '/\\W/', $string, -1, PREG_SPLIT_NO_EMPTY );
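
To make the difference concrete, here is a minimal sketch that runs both calls on the sample string from the report. The variable names $wrong and $right are illustrative only and do not appear in the original report. It relies on PREG_SPLIT_NO_EMPTY being the integer 1, so when it is passed in the third position it is read as a limit of one piece and no flags are applied.

```php
<?php
// Sample string from the report: control characters around the two words.
$string = chr(5) . 'Washington,' . chr(4) . 'DC' . chr(2);

// As reported: the flag lands in the limit slot. PREG_SPLIT_NO_EMPTY is the
// integer 1, so this asks for at most one piece and sets no flags at all.
$wrong = preg_split('/\W/', $string, PREG_SPLIT_NO_EMPTY);

// As suggested in the reply: -1 means "no limit", flags go in the fourth slot.
$right = preg_split('/\W/', $string, -1, PREG_SPLIT_NO_EMPTY);

echo count($wrong), "\n";        // prints 1 (the whole string, unsplit)
echo implode('|', $right), "\n"; // prints Washington|DC
```

With a limit of 1 the function returns the input as a single unsplit piece, which is why the control characters appeared not to be excluded.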
 