Request #77459 Add partial match support to preg_match
Submitted: 2019-01-15 08:53 UTC Modified: 2019-03-20 09:24 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: paul dot crovella at gmail dot com Assigned: nikic
Status: Assigned Package: PCRE related
PHP Version: Next Minor Version OS:
Private report: No CVE-ID: None

 [2019-01-15 08:53 UTC] paul dot crovella at gmail dot com
Description:
------------
Please add support for PCRE2 partial matching[1] to preg_match.

A partial match occurs when the subject matches the regex as far as the subject goes, but the subject ends before either a full match or a definite failure to match can be determined. The regex /abc/ would normally fail to match 'ab', but with one of the partial flags given (see below) it would instead return a partial match. This is useful to indicate that more data needs to be read into a buffer before the match is tried again. (See: multi-segment matching[2].)

I imagine this would take the form of PREG_PARTIAL_SOFT and PREG_PARTIAL_HARD flags (neither on by default) corresponding to PCRE2_PARTIAL_SOFT and PCRE2_PARTIAL_HARD, respectively, which would be passed through as options to pcre2_match and pcre2_jit_match. pcre2_jit_compile would get PCRE2_JIT_PARTIAL_SOFT or PCRE2_JIT_PARTIAL_HARD as appropriate, in addition to the usual PCRE2_JIT_COMPLETE.

When a partial match is found, preg_match should return a new value (2?) to distinguish it, and the $matches array (if given) should be filled with a single member consisting of the string participating in the partial match (and possibly a little more). This string should run from the start of the partially matched data, minus the length of the longest lookbehind*, through to the end of the subject string. For example:

/baz(?<=barbaz)(?=quux)/

applied to:

abcfoobarbazqu

should result in foobarbazqu in the $matches array. This ensures enough of the string is kept for further match attempts to have a chance to succeed. (Fortunately PCRE ignores \K in a partial match, so there's no need to worry about it.) Another option that comes to mind would be to just give the offset of this substring and let the user chop it off or pass it as the offset argument of preg_match as desired.

---

* Available from pcre2_pattern_info given PCRE2_INFO_MAXLOOKBEHIND. See [3] for details.

[1] https://www.pcre.org/current/doc/html/pcre2partial.html
[2] https://www.pcre.org/current/doc/html/pcre2partial.html#SEC7
[3] https://www.pcre.org/current/doc/html/pcre2partial.html#SEC8


History
 [2019-03-19 16:49 UTC] nikic@php.net
-Status: Open
+Status: Assigned
-Assigned To:
+Assigned To: nikic
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 14 16:01:26 2024 UTC