Bug #46947 pattern matching is not working as expected
Submitted: 2008-12-26 19:16 UTC
Modified: 2008-12-27 22:31 UTC
From: victor at casnt dot ro
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 5.2.8
OS: Debian
Private report: No
CVE-ID: None

 [2008-12-26 19:16 UTC] victor at casnt dot ro
Description:
------------
The exact same pattern stops matching the same string once an extra line is added at the end.
The length of the string seems to be the problem.

Reproduce code:
---------------
The script:
$handle = fopen('test3.xml', "r");
$contents = fread($handle, filesize('test3.xml'));
fclose($handle);
$contents = preg_replace('/\>[\t|\ |\s]{1,}\</', "><", $contents);
$contents = preg_replace('/\n/', "", $contents);
if(preg_match('/(\<\?xml.{1,}\<report[^>]{1,}\>)/', $contents, $match)) {
    print "Match: $match[1]\n";
} else {print "Fail\n";}

OBSERVATION: If you delete the last line from the file test3.xml, the pattern matching will work fine. The last lines have nothing to do with the pattern.

You can find the file here: 
http://www.casnt.ro/Files/XML/test3.xml

Expected result:
----------------
Match: <?xml version="1.0" encoding="utf-8"?><report AppKey="CNAS-v1.0.2801.665" AppID="14" clinic="SSSSSSSSSSSSSSSSSSSSSS" fiscalCode="11111111" contractNo="123123123/2008" insuranceHouse="CAS-NT" reportingDate="2008-12-07" startFrom="2008-11-01" endTo="2008-11-30" invoiceNo="" labValue="0" hspValue="0" xmlns="http://localhost">


Actual result:
--------------
Fail

History

 [2008-12-26 19:54 UTC] felipe@php.net
Hello, the dot character only matches a newline in a string when the 's' modifier is used.

Try:
/(<\?xml.+<report[^>]+>)/s

PS: You don't need to escape < and > in this case, because they are not the delimiter (which is / here).
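
The effect of the 's' modifier described above can be checked with a small sketch. The sample string is made up for illustration and is not the reporter's file; it only places a newline between the XML declaration and the root tag.

```php
<?php
// Hypothetical sample input: XML declaration and root tag separated by a newline.
$xml = "<?xml version=\"1.0\"?>\n<report id=\"1\">\n<row/>\n</report>";

// Without the 's' modifier, '.' does not match "\n", so the match cannot span lines.
$withoutS = preg_match('/(<\?xml.+<report[^>]+>)/', $xml, $m1);

// With the 's' (PCRE_DOTALL) modifier, '.' also matches "\n".
$withS = preg_match('/(<\?xml.+<report[^>]+>)/s', $xml, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```

Note that the reporter's script strips "\n" before matching, so this only shows what the modifier does; it does not by itself establish the cause of the reported failure.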
 [2008-12-26 20:02 UTC] felipe@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

 [2008-12-27 16:08 UTC] victor at casnt dot ro
I have read the docs and I see that preg_match does not work with big strings.
In that case I would like to change this ticket from a bug report into a feature request.

It would also help PHP against the competition (Perl works with big strings).

I need to parse a big xml file. 
Using the php xml functions in my case is a complication.
Pattern matching can solve my problem in just 3 lines.

When I say "big string" I mean 100KB (my test failed on a 104KB file).
Pattern matching will clearly be inefficient on huge strings compared to a dedicated parser.

Thank you.
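
For readers who want to keep the short regex approach on a file of this size, one option is a pattern that never scans past the header. The sketch below is an illustration, not the reporter's code: the input is a made-up stand-in for test3.xml, and the pattern assumes that neither the XML declaration nor the attributes of the report tag contain a literal '>'.

```php
<?php
// Hypothetical input standing in for test3.xml: header, root tag, then a large body.
$contents = '<?xml version="1.0" encoding="utf-8"?><report AppID="14" xmlns="http://localhost">'
          . str_repeat('<row/>', 50000) . '</report>';

// [^>]* stops at the first '>' instead of running to the end of the string,
// so the engine does not have to backtrack across the whole document.
if (preg_match('/(<\?xml[^>]*>\s*<report[^>]+>)/', $contents, $match)) {
    print "Match: $match[1]\n";
} else {
    print "Fail\n";
}
```

The design choice here is to replace the greedy `.{1,}` with a negated character class, which bounds the work by the length of the header rather than the length of the file.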
 [2008-12-27 22:31 UTC] scottmac@php.net
If you write bad regular expressions you can consume all the stack space and crash the process.

Therefore there are recursion and backtrack limits.

http://www.php.net/manual/en/pcre.configuration.php
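
A minimal sketch of how such a limit can be observed and raised follows. It assumes the failure comes from `pcre.backtrack_limit`, whose default was, as far as I know, 100000 in PHP versions of that time. `try_match()` is a hypothetical helper and the input is synthetic; `preg_last_error()` and the `pcre.backtrack_limit` setting are real PHP features, and `pcre.jit` only exists in newer PHP versions.

```php
<?php
// Sketch only: try_match() is a hypothetical helper, not part of PHP.
function try_match($contents, $backtrackLimit)
{
    // Assumption: turn the PCRE JIT off (where present) so the interpreter's
    // backtrack counter is the limit being exercised.
    ini_set('pcre.jit', '0');
    ini_set('pcre.backtrack_limit', (string) $backtrackLimit);

    // Same pattern as in the report.
    $result = preg_match('/(<\?xml.{1,}<report[^>]{1,}>)/', $contents, $match);

    // preg_match() returns false on error; preg_last_error() identifies it.
    return array($result, preg_last_error());
}

// Hypothetical input: a short header followed by 200000 filler characters.
$contents = '<?xml version="1.0"?><report id="1">' . str_repeat('x', 200000);

list($low, $lowError) = try_match($contents, 100000);
list($high, $highError) = try_match($contents, 1000000);

var_dump($low, $lowError === PREG_BACKTRACK_LIMIT_ERROR);
var_dump($high, $highError === PREG_NO_ERROR);
```

With the lower limit the greedy `.{1,}` should run out of backtracking budget before it gets back to the header, and with the higher limit the same pattern should match. Checking `preg_last_error()` after a `false` return distinguishes "no match" from "limit hit".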