Request #71959 preg_match_all: flag/retval to require or determine string fully consumed
Submitted: 2016-04-04 20:07 UTC Modified: 2019-03-18 16:38 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: stilezy at gmail dot com Assigned:
Status: Open Package: PCRE related
PHP Version: 5.6.20 OS:
Private report: No CVE-ID: None
 [2016-04-04 20:07 UTC] stilezy at gmail dot com
Description:
------------
Scenario:

An input or subject string is expected to be of the form /^(SOME_REGEX)+$/
If it matches then the substrings will be extracted using preg_match_all()

Example:  

$subject = '993A27C5502A3B';
and is expected to match $pattern = '/^(\d+[A-C])+$/';
(obviously here it does)

If it matches $pattern then the matches are extracted:
preg_match_all("/(\d+[A-C])/", $subject, $matches);

The weakness is that there is no way to require, or even test, that preg_match_all() has consumed the entire subject string. So we first have to test with preg_match() whether $subject matches a repetition of the pattern, and only then call preg_match_all() to extract the matches (i.e. break down $subject) if it does.
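
For illustration, the two-step workaround described above might look roughly like this. It is only a sketch using the example subject and patterns from this report:

```php
<?php
// Two-step workaround: validate the whole subject first, then extract.
$subject = '993A27C5502A3B';
$matches = array();

// Step 1: is the subject made up entirely of repetitions of \d+[A-C]?
if (preg_match('/^(\d+[A-C])+$/', $subject) === 1) {
    // Step 2: run a second regex over the same subject to pull out each piece.
    preg_match_all('/\d+[A-C]/', $subject, $matches);
    print_r($matches[0]); // "993A", "27C", "5502A", "3B"
} else {
    echo "subject does not match\n";
}
```

The subject is scanned twice, which is the duplication this request is about.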

This must be a very common scenario for preg_match_all(): anyone matching all substrings may also want to confirm that the whole string was used.


REQUEST:

Could a flag such as PREG_CONSUME_ALL be added that requires preg_match_all() to consume the entire string and returns FALSE if it did not, so that this double regex use isn't needed? Also, perhaps a PREG_CONSUMED_ALL flag that makes the return value TRUE or FALSE depending on whether the subject was fully consumed. I think this would be useful.

NOTES -
  1) By "consumed" I mean that the matches, imploded, make up the original string, i.e. the natural meaning.
  2) Perhaps the same or similar enhancement might benefit preg_split() as well, and any other preg_*() functions that attempt to "walk" the subject and repeatedly act as often as possible upon it.
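
To make the definition in note 1 concrete, here is a userland sketch. The helper name fully_consumed() is made up for this illustration and is not an existing or proposed PHP function:

```php
<?php
// Hypothetical helper (not part of PHP): returns true when the full-pattern
// matches, joined in order, reproduce the subject exactly.
function fully_consumed($pattern, $subject)
{
    if (preg_match_all($pattern, $subject, $matches) === false) {
        return false; // the regex itself failed
    }
    return implode('', $matches[0]) === $subject;
}

var_dump(fully_consumed('/\d+[A-C]/', '993A27C5502A3B'));   // bool(true)
var_dump(fully_consumed('/\d+[A-C]/', '993A27C5502A7Q3B')); // bool(false)
```

Note that an empty subject counts as fully consumed under this definition, which may or may not be what a caller wants.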



 [2019-03-18 16:38 UTC] nikic@php.net
Some pointers:

A more efficient way to check whether everything was matched is to add up the lengths of the matches and check whether the total is equal to the string length.
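
A minimal sketch of that length check, assuming the subject and pattern from the original report (the variable names are just for illustration):

```php
<?php
$subject = '993A27C5502A3B';
$count = preg_match_all('/\d+[A-C]/', $subject, $matches);

// Sum the byte lengths of all full-pattern matches.
$matchedLength = array_sum(array_map('strlen', $matches[0]));

// Fully consumed if the matches account for every byte of the subject.
$consumedAll = $count !== false && $matchedLength === strlen($subject);
var_dump($consumedAll); // bool(true)
```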

Additionally, the \G anchor can be used to stop matching as soon as the matches are no longer directly adjacent. So something like this:

$subject = '993A27C5502A7Q3B';
preg_match_all("/\G\d+[A-C]/", $subject, $matches);
var_dump($matches);

Will yield:

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(4) "993A"
    [1]=>
    string(3) "27C"
    [2]=>
    string(5) "5502A"
  }
}

This gives you the matches until the first failure (at 7Q) and does not proceed past it.

Overall, I don't think it's worthwhile to add such a flag. Yes, it would simplify the specific use case where you're only interested in whether the entire string is consumed. However, it would not cover small variations, like also wanting to print an error that includes the offset at which the match failed. You can easily do that with \G and PREG_OFFSET_CAPTURE, but a PREG_CONSUME_ALL flag would not help you.
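
As a rough sketch of the \G plus PREG_OFFSET_CAPTURE variant (the error message format and variable names are only an example):

```php
<?php
$subject = '993A27C5502A7Q3B';
preg_match_all('/\G\d+[A-C]/', $subject, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE, each entry of $matches[0] is
// array(matched text, byte offset in the subject).
$end = 0;
foreach ($matches[0] as list($text, $offset)) {
    $end = $offset + strlen($text); // position just past this match
}

if ($end === strlen($subject)) {
    echo "fully consumed\n";
} else {
    // Prints: match failed at offset 12: '7Q3B'
    echo "match failed at offset $end: '" . substr($subject, $end) . "'\n";
}
```

Because \G forces each match to start where the previous one ended, the end of the last match is the first offset at which the pattern failed.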