Request #31824 strtok() improvement suggestion
Submitted: 2005-02-03 06:05 UTC Modified: 2005-03-20 01:00 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: dpisarev at rogers dot com Assigned:
Status: No Feedback Package: Feature/Change Request
PHP Version: 5.0.3 OS: All
Private report: No CVE-ID: None
 [2005-02-03 06:05 UTC] dpisarev at rogers dot com
Description:
------------
The existing string-tokenizing function (strtok) could be improved with an additional parameter that tells it to treat multiple adjacent delimiters as one when splitting the string.

It appears that "preg_split" can accomplish the task, but that would involve concocting the right regular expression, which is a pain most people would prefer to avoid. Besides, the chatter on the documentation threads is that regular expressions slow down execution.

Expected result:
----------------
Say, we have the string (omit the quotes):

"abc@def.com,   ghi@jkl.com mno@pqr.com;   stu@vw.com"

And suppose we want to parse it to extract the email addresses (imagine we have a mailing list system where we add people's addresses in bulk with all sorts of crazy formatting in between the addresses that we don't have the time to neaten up).

So, tokenizing the above string by the delimiters ", ;" (comma, space, semicolon) should, as I envision it, produce an array of just FOUR elements, namely:

abc@def.com
ghi@jkl.com
mno@pqr.com
stu@vw.com

It is easy to see that one can find a lot of uses for this particular tokenization algorithm.
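
For reference, the expected result above can be sketched with preg_split. This is only an illustration, not part of the original request: the function name split_addresses is hypothetical, and the character class simply lists the three delimiters named in the report. The PREG_SPLIT_NO_EMPTY flag drops any empty pieces.

```php
<?php
// Hypothetical helper (not a PHP built-in): split on runs of
// comma, semicolon, or space, discarding empty pieces.
function split_addresses(string $input): array
{
    // "+" collapses adjacent delimiters into a single separator;
    // PREG_SPLIT_NO_EMPTY removes empty leading/trailing elements.
    return preg_split('/[,; ]+/', $input, -1, PREG_SPLIT_NO_EMPTY);
}

$list = "abc@def.com,   ghi@jkl.com mno@pqr.com;   stu@vw.com";
print_r(split_addresses($list));
```

The pattern is short enough that the regular expression concern is mostly about readability; no claim is made here about its speed relative to strtok.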

Actual result:
--------------
The way things are now with strtok, I'd get an array of more than the 4 elements listed above. In addition to them, I'd get a whole bunch of empty elements in the array, one for each occurrence of two delimiters side by side in the string being tokenized.

Please feel free to contact me for any clarifications. If the suggested improvement is integrated into PHP, credit would be welcome.

 [2005-03-12 13:36 UTC] andrey@php.net
Do you mean split() instead of strtok()?
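
A short sketch of the distinction this question points at, under some assumptions: the helper name tokenize is hypothetical, and explode() stands in for split(), which is no longer available in current PHP. As far as the manual describes it, strtok skips empty tokens, so a plain strtok loop over the sample string should already yield only the four addresses, while a single-delimiter split keeps the empty pieces.

```php
<?php
// Hypothetical helper: collect every token strtok returns.
function tokenize(string $input, string $delims): array
{
    $tokens = [];
    $tok = strtok($input, $delims);   // first call takes the string
    while ($tok !== false) {
        $tokens[] = $tok;
        $tok = strtok($delims);       // later calls take only the delimiters
    }
    return $tokens;
}

$list = "abc@def.com,   ghi@jkl.com mno@pqr.com;   stu@vw.com";

// strtok: adjacent delimiters produce no empty tokens.
print_r(tokenize($list, ", ;"));

// explode on a single delimiter: empty strings appear between
// adjacent spaces, which matches the behaviour the report describes.
print_r(explode(" ", "a  b"));
```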
 [2005-03-20 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 
Last updated: Thu Oct 31 22:01:27 2024 UTC