[2018-09-30 18:16 UTC] cmb@php.net
-Package: Unknown/Other Function
+Package: Strings related
[2021-03-04 14:01 UTC] cmb@php.net
-Status: Open
+Status: Wont fix
-Assigned To:
+Assigned To: cmb
Description:
------------
Hi there,

I have some very long strings (they come from a MySQL query selecting multiple GROUP_CONCAT-enated fields) that I would like to tokenize quickly with the strtok function. However, I cannot tokenize them in parallel, which is what I need: fetch the first token from every string, then the second token from every string, and so on.

E.g. given:

$a = '1,2,3,4';
$b = 'a,b,c,d';

strtok($a, ',') returns the first token of $a. If I then call strtok($b, ','), it returns the first token of $b, so I have '1' and 'a' together for the first iteration. In the next iteration I need '2' and 'b', but strtok(',') only gives me 'b'. There is no way to fetch '2' from $a, since strtok can only work on one string at a time.

I know I could implement my own algorithm that walks the strings character by character, but that would be slow. I have also tried explode(',', $a) and explode(',', $b), but that uses too much memory: my strings are very long and numerous, so the intermediate arrays become very big.

Would it be possible to have a tokenizer facility that does not store the string being tokenized, but only the index of the last processed character, so that it can be used on multiple strings at once?

I know that maybe I shouldn't use PHP for such tasks and should code in C instead, but PHP is so quick to develop with that I cannot resist :)

I understand that strtok was designed this way to avoid passing the string over and over, which would be slow for long strings because of pass by value. I would therefore like another tokenizer function that accepts a reference to the string and always takes two arguments.
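For illustration, the behaviour requested above can be approximated in userland by keeping one byte offset per string and searching with strpos() from that offset. This is only a minimal sketch: next_token() is a hypothetical helper, not a PHP built-in and not something proposed in this report. It assumes a non-empty delimiter and treats it as a whole string, whereas strtok() treats each character of its second argument as a separator and skips empty tokens.

```php
<?php
// Hypothetical helper (not part of PHP): returns the next token of $str
// starting at byte offset $pos, then advances $pos past the delimiter.
// Returns false once $pos has reached the end of the string.
// Assumes $delim is a non-empty string.
function next_token(string $str, string $delim, int &$pos)
{
    $len = strlen($str);
    if ($pos >= $len) {
        return false;
    }
    // Find the next delimiter at or after the current offset.
    $end = strpos($str, $delim, $pos);
    if ($end === false) {
        $end = $len; // last token runs to the end of the string
    }
    $token = substr($str, $pos, $end - $pos);
    $pos = $end + strlen($delim);
    return $token;
}

$a = '1,2,3,4';
$b = 'a,b,c,d';

// One offset per string, so both can be walked in lockstep.
$posA = 0;
$posB = 0;
$pairs = [];
while (($x = next_token($a, ',', $posA)) !== false
    && ($y = next_token($b, ',', $posB)) !== false) {
    $pairs[] = $x . $y;
}
// $pairs now holds '1a', '2b', '3c', '4d'
```

Only the two integer offsets are kept between calls, so no intermediate arrays are built. Unlike strtok(), this sketch returns empty tokens between adjacent delimiters, so callers that rely on strtok()'s skipping behaviour would need to filter them out.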