Bug #46947 pattern matching is not working as expected
Submitted: 2008-12-26 19:16 UTC Modified: 2008-12-27 22:31 UTC
From: victor at casnt dot ro Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2.8 OS: Debian
Private report: No CVE-ID: None
 [2008-12-26 19:16 UTC] victor at casnt dot ro
Description:
------------
The exact same pattern fails to match the same string once the string has an extra line at the end.
The length of the string seems to be the problem.

Reproduce code:
---------------
The script:
$handle = fopen('test3.xml', "r");
$contents = fread($handle, filesize('test3.xml'));
fclose($handle);
$contents = preg_replace('/\>[\t|\ |\s]{1,}\</', "><", $contents);
$contents = preg_replace('/\n/', "", $contents);
if(preg_match('/(\<\?xml.{1,}\<report[^>]{1,}\>)/', $contents, $match)) {
    print "Match: $match[1]\n";
} else {print "Fail\n";}

OBSERVATION: If you delete the last line from the file test3.xml, the pattern matching will work fine. The last lines have nothing to do with the pattern.

You can find the file here: 
http://www.casnt.ro/Files/XML/test3.xml

Expected result:
----------------
Match: <?xml version="1.0" encoding="utf-8"?><report AppKey="CNAS-v1.0.2801.665" AppID="14" clinic="SSSSSSSSSSSSSSSSSSSSSS" fiscalCode="11111111" contractNo="123123123/2008" insuranceHouse="CAS-NT" reportingDate="2008-12-07" startFrom="2008-11-01" endTo="2008-11-30" invoiceNo="" labValue="0" hspValue="0" xmlns="http://localhost">


Actual result:
--------------
Fail

 [2008-12-26 19:54 UTC] felipe@php.net
Hello, the dot character only matches a newline in a string when the 's' modifier is used.

Try:
/(<\?xml.+<report[^>]+>)/s

PS: You don't need to escape < and > in this case, because they are not the delimiter (the delimiter here is /).
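
A minimal sketch of the point about the 's' modifier, using a made-up two-line input rather than the reporter's test3.xml. The variable names and the sample string are assumptions for illustration only:

```php
<?php
// Hypothetical sample: an XML declaration and a <report> tag separated by a newline.
$xml = "<?xml version=\"1.0\"?>\n<report id=\"1\">";

// Without 's', the dot stops at the newline, so ".+" cannot reach "<report".
$withoutS = preg_match('/(<\?xml.+<report[^>]+>)/', $xml);

// With 's' (PCRE_DOTALL), the dot also matches the newline.
$withS = preg_match('/(<\?xml.+<report[^>]+>)/s', $xml, $m);

echo "without s: $withoutS\n"; // prints "without s: 0"
echo "with s: $withS\n";       // prints "with s: 1"
```

Note that the reproduce script strips "\n" before calling preg_match(), so this only matters if other line-break characters survive in the input.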
 [2008-12-26 20:02 UTC] felipe@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php.
 [2008-12-27 16:08 UTC] victor at casnt dot ro
I have read the docs and I see that preg_match does not work with big strings.
In that case I would like to turn this ticket from a bug report into a feature request.

It would also help against the competition (Perl works with big strings).

I need to parse a big XML file.
Using the PHP XML functions is a complication in my case.
Pattern matching can solve my problem in just 3 lines.

When I say "big string" I mean 100 KB (my test failed on a 104 KB file).
Pattern matching will clearly be inefficient on huge strings compared to a dedicated parser.

Thank you.
 [2008-12-27 22:31 UTC] scottmac@php.net
If you write bad regular expressions, you can consume all the stack space and crash the process.

Therefore there are recursion and backtrack limits.

http://www.php.net/manual/en/pcre.configuration.php
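
A rough sketch of how the limits mentioned above can be observed and adjusted from a script. The helper name matchReportHeader(), the synthetic input, and the specific limit values are assumptions for illustration, not taken from the report. It relies on preg_last_error() and the pcre.backtrack_limit ini setting described on the linked page. JIT is switched off here (the pcre.jit setting only exists in newer PHP versions) so that the interpreter's backtrack accounting applies:

```php
<?php
// Hypothetical helper: run the suggested pattern and report PCRE errors
// instead of silently treating them as "no match".
function matchReportHeader($contents)
{
    $matched = preg_match('/(<\?xml.+<report[^>]+>)/s', $contents, $match);
    $error = preg_last_error();

    if ($error === PREG_BACKTRACK_LIMIT_ERROR) {
        return array('status' => 'backtrack_limit', 'match' => null);
    }
    if ($error === PREG_RECURSION_LIMIT_ERROR) {
        return array('status' => 'recursion_limit', 'match' => null);
    }
    if ($error !== PREG_NO_ERROR) {
        return array('status' => 'other_error', 'match' => null);
    }
    if ($matched === 1) {
        return array('status' => 'match', 'match' => $match[1]);
    }
    return array('status' => 'no_match', 'match' => null);
}

// Synthetic stand-in for a large XML file (about 120 KB after the header).
$header = '<?xml version="1.0" encoding="utf-8"?><report AppID="14">';
$contents = $header . str_repeat('<row/>', 20000);

ini_set('pcre.jit', '0'); // assumption: ignored where the setting does not exist

// A deliberately tiny limit, chosen only to force the error.
ini_set('pcre.backtrack_limit', '1000');
$low = matchReportHeader($contents);

// A generous limit for the same input and pattern.
ini_set('pcre.backtrack_limit', '10000000');
$high = matchReportHeader($contents);

echo $low['status'], "\n";
echo $high['status'], "\n";
```

With the tiny limit the call should report a backtrack limit error, and with the larger one it should return the header. The greedy ".+" first runs to the end of the string and then backtracks toward "<report", which is one plausible reason the match depends on the length of the file.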
 