Bug #61018 Unexplained bool(false) returned from preg_match
Submitted: 2012-02-08 18:43 UTC Modified: 2013-03-14 07:06 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: dey101+php at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.3.10 OS:
Private report: No CVE-ID: None

 [2012-02-08 18:43 UTC] dey101+php at gmail dot com
Description:
------------
PHP VC9 x86 Thread Safe (from http://windows.php.net/download/)

I am using a regex to validate whether a string is a valid hostname (a single host name or an FQDN).

It seems that, for strings above a certain length, a pattern that tries to match a literal period at the end makes preg_match() return false if the string contains no period. preg_match() also returns false if the string ends in a period and the pattern does not try to match one.

The regex uses subpatterns () to apply the zero-or-more quantifier *. I tried both capturing and non-capturing (?:) groups, and both give the same result. However, if I use the one-or-more quantifier + instead, preg_match() does not return bool(false). Using {0,} instead of * does not change the outcome.

The cutoff length seems to be about 20 characters. Below it, the result is int(0) or int(1) depending on whether the regex matches. Above it, bool(false) is returned.

If the subpattern is part of a longer string, it does work as anticipated.

Matching a literal period at the beginning of the pattern does not yield an error.

Substituting a-zA-Z0-9 for the [:alnum:] character class does not affect the results.

error_get_last() returns nothing, and nothing shows up in the logs even with error_reporting(-1) set.
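As the later comments in this thread establish, preg_match() reports PCRE runtime failures such as an exhausted backtrack limit only through its false return value. It raises no PHP warning, which is why error_get_last() stays empty. The following is a minimal sketch of surfacing the cause. It assumes PHP 8.0+ for preg_last_error_msg(), and checked_preg_match() is a hypothetical helper name, not a PHP built-in. The demo part deliberately disables JIT and uses a tiny backtrack limit so that the failure reproduces on a short subject.

```php
<?php
// Sketch: checked_preg_match() is a hypothetical helper, not a PHP built-in.
// preg_match() signals runtime failures (e.g. an exhausted backtrack limit)
// only by returning false; no PHP warning is raised, so error_get_last() stays
// empty. preg_last_error() gives the code, preg_last_error_msg() (PHP 8.0+)
// a readable message.
function checked_preg_match(string $pattern, string $subject): int
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        throw new RuntimeException(sprintf(
            'preg_match() failed with code %d: %s',
            preg_last_error(),
            preg_last_error_msg()
        ));
    }
    return $result; // 0 = no match, 1 = match
}

// Demo assumptions: JIT off and an arbitrarily tiny backtrack limit, so the
// failure shows up on a short subject.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

try {
    checked_preg_match('/^(?:\w*)*$/', str_repeat('A', 30) . '.');
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Throwing keeps a failed match from being silently treated as "no match", which is the trap the test script's `$result ? ... : ...` branch would otherwise fall into.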

Test script:
---------------
<?php
$regexs = array
(
	'/^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/',
	'/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/',
	'/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/',
	'/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/'
);

$hosts = array
(
	'ABCDEFGHIJ1234567890.', // long string with period at end
	'ABCDEFGHI234567890.', // slightly shorter string with period at end
	'ABCDEFGHIJ1234567890', // long string no period
	'ABCDEFGHI1234567890', // a little shorter
	'ABCDEFGHI123456789', // even shorter
	'ABCDEFGHIJ-1234567890', // long with hyphen
	'ABCDEFGHIJ-123456789', // shorter with hyphen
	'ABCDEFGHI-123456789', // even shorter with hyphen
	'WWW.ABCDEFGHIJ-1234567890.COM', // a FQDN with long string and hyphen
	'WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM' // a really long FQDN
);

foreach ($regexs as $regex)
{
	echo "\nRegex: $regex\n";

	foreach ($hosts as $host)
	{
		echo "  Host: $host\n";

		$result = preg_match($regex, $host);

		echo '    Result: ';
		if ($result === false)
		{
			echo '(error) ';
			print_r(error_get_last()); // never prints anything?
		}
		else
		{
			echo ($result) ? '(match) ' : '(no match) ';
		}

		var_dump($result);
	}
}

Expected result:
----------------
None of the results should be bool(false).

Actual result:
--------------
// Output from the last regex only; the other regexes also return bool(false) for some hosts
Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: .ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)


History

 [2012-02-13 23:40 UTC] mattficken@php.net
Thank you for your report and helping to make php better.

When I ran your script on Windows 2008 and on Linux (both using a TS build of PHP 5.3.10), the output was the same on both OSes, so I don't think this is a PHP on Windows bug.

If you would like, I can reclassify this bug as a general bug, not specific to Windows.

Or, am I missing something? Is this really a PHP on Windows problem?



Windows 2008 SP1 x64 output (TS build):

Regex: /^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (match) int(1)
  Host: ABCDEFGHIJ-123456789
    Result: (match) int(1)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (no match) int(0)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (no match) int(0)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-123456789
    Result: (no match) int(0)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)


Linux-x64-gentoo output:
Regex: /^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (match) int(1)
  Host: ABCDEFGHIJ-123456789
    Result: (match) int(1)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (no match) int(0)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (no match) int(0)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-123456789
    Result: (no match) int(0)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)
 [2012-02-14 13:42 UTC] dey101+php at gmail dot com
-Operating System: Windows Server 2008 +Operating System:
 [2012-02-14 13:42 UTC] dey101+php at gmail dot com
I did not have access to a Linux platform to test on. If you have verified that the bug exists on multiple platforms, please feel free to reclassify it as a general bug.
 [2012-02-15 18:39 UTC] mattficken@php.net
I have verified that the output from this repro script is the same on both Windows and Linux (Both using 5.3.10), so this is not a Windows specific bug report anymore.
 [2013-03-14 04:26 UTC] danielklein at airpost dot net
I have simplified the error to the following:
<?php
$string = 'ABCDEFGHIJ12345678.';
var_dump(preg_match('/^(?:\w*)*$/i', $string));
$string = 'ABCDEFGHIJ1234567.';
var_dump(preg_match('/^(?:\w*)*$/i', $string));
?>

Outputs:
boolean false
int 0

Writing /(\w*)*/ is very inefficient, because the engine must try every combination before it can report a failure, i.e. matching:
'ABCDEFGHIJ12345678', ''
'ABCDEFGHIJ1234567', '8', ''
'ABCDEFGHIJ1234567', '', '8', ''
'ABCDEFGHIJ123456', '78', ''
'ABCDEFGHIJ123456', '7', '8', ''
'ABCDEFGHIJ123456', '7', '', '8', ''
'ABCDEFGHIJ123456', '', '78', ''
...
'', 'A', '', 'B', '', 'C', '', 'D', '', 'E', '', 'F', '', 'G', '', 'H', '', 'I', '', 'J', '', '1', '', '2', '', '3', '', '4', '', '5', '', '6', '', '7', '', '8', ''
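The enumeration above grows exponentially. If the empty iterations are ignored, the number of ways to split n characters into consecutive non-empty runs is 2^(n-1), so each extra character doubles the number of candidates. The following sketch counts these splits by brute force. count_splits() is a hypothetical helper, and the numbers show the growth rate only, not PCRE's exact internal step count.

```php
<?php
// Counts the ways to split $n characters into consecutive, non-empty runs,
// i.e. one run per iteration of the outer * in /^(?:\w*)*$/ (empty
// iterations ignored). Illustrates growth only, not PCRE's step count.
function count_splits(int $n): int
{
    if ($n === 0) {
        return 1; // nothing left: one complete split
    }
    $total = 0;
    for ($first = 1; $first <= $n; $first++) {
        // choose the length of the first run, then split the remainder
        $total += count_splits($n - $first);
    }
    return $total;
}

foreach ([17, 18, 19, 20] as $n) {
    printf("%d chars: %d splits\n", $n, count_splits($n));
}
// Prints 65536, 131072, 262144 and 524288 splits respectively (2^(n-1)).
```

This doubling is why a limit that is comfortably sufficient for 17 characters can be exceeded at 18 or 19.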

It is most likely running out of memory before it completes. I would suggest that this is not a bug as it will use exponentially more memory the longer the input string gets.
You should try something like '/^(?:(?>\w*))*$/i' instead to avoid undesired backtracking.
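The effect of that suggestion can be sketched by running both patterns against a subject that cannot match. The sketch makes two assumptions: JIT is switched off, and pcre.backtrack_limit is lowered to an arbitrary demo value so that the nested pattern fails quickly on a short input. Both pcre.jit and pcre.backtrack_limit are standard PHP ini settings.

```php
<?php
// Demo assumptions: JIT off, backtrack limit lowered (arbitrary value).
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// 25 word characters followed by a period, so neither pattern can match.
$subject = str_repeat('A', 25) . '.';

$patterns = [
    'nested *'     => '/^(?:\w*)*$/',     // \w* inside (...)*: exponential backtracking
    'atomic group' => '/^(?:(?>\w*))*$/', // (?>...) never gives characters back
];

$results = [];
foreach ($patterns as $label => $pattern) {
    $results[$label] = preg_match($pattern, $subject);
    // false: the engine gave up (check preg_last_error()); 0: genuine "no match".
    printf("%-12s => %s (preg_last_error() = %d)\n",
        $label, var_export($results[$label], true), preg_last_error());
}
```

Because the atomic group keeps whatever \w* first consumed, the engine has only a handful of positions to retry before it concludes "no match". The nested version, by contrast, must work through the exponential set of splits.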
 [2013-03-14 07:06 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2013-03-14 07:06 UTC] nikic@php.net
@danielklein: Yes, this failure is caused by exceeding the backtracking limit (which can be set via pcre.backtrack_limit). You can see this using preg_last_error().
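Applying the same diagnosis to the submitter's hostname regex: the nested quantifier is in the label subpattern `(?:[[:alnum:]\-]*[[:alnum:]])*`. The following is one possible rewrite, offered as an illustration rather than something proposed in the thread. It makes that group optional (`?`) instead of repeated (`*`). Because `[[:alnum:]\-]*[[:alnum:]]` can already span the rest of a label, both forms accept the same strings: an alphanumeric first character, optionally followed by alphanumerics and hyphens ending in an alphanumeric. Like the original, it does not enforce DNS length limits.

```php
<?php
// Illustrative rewrite of the fourth pattern: the label's trailing group is
// optional (?) instead of repeated (*), which removes the ambiguous ways of
// splitting one label that caused the exponential backtracking.
$label = '[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])?';
$hostnamePattern = '/^(?:' . $label . '\.)*' . $label . '$/';

$hosts = [
    'ABCDEFGHIJ1234567890.',
    'ABCDEFGHI234567890.',
    'ABCDEFGHIJ1234567890',
    'ABCDEFGHI1234567890',
    'ABCDEFGHI123456789',
    'ABCDEFGHIJ-1234567890',
    'ABCDEFGHIJ-123456789',
    'ABCDEFGHI-123456789',
    'WWW.ABCDEFGHIJ-1234567890.COM',
    'WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM',
];

$results = [];
foreach ($hosts as $host) {
    $result = preg_match($hostnamePattern, $host);
    $results[$host] = $result;
    if ($result === false) {
        // Not expected for this pattern, but never treat false as "no match".
        printf("%s => error %d\n", $host, preg_last_error());
        continue;
    }
    printf("%s => %s\n", $host, $result === 1 ? 'match' : 'no match');
}
```

The two hosts with a trailing period are still rejected, as they were by the original fourth pattern whenever it did not error. The other hosts match.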
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Apr 16 14:01:29 2024 UTC