Bug #26469 Reproducible crash on regexp
Submitted: 2003-11-29 20:13 UTC Modified: 2003-11-30 04:12 UTC
From: f23wop602 at sneakemail dot com Assigned:
Status: Not a bug Package: Reproducible crash
PHP Version: 4.3.4 OS: RH7 2.2.16-22
Private report: No CVE-ID: None
 [2003-11-29 20:13 UTC] f23wop602 at sneakemail dot com
Description:
------------
We had some crashes, and after some tracking we found a regexp that crashes on specific data. The data is contained in the script at the URL below; I cut it down as much as I could while still triggering the crash.

Reproduce code:
---------------
http://test.wikipedia.org/crash-php4.3.4.txt

Most of it is just data; the crash occurs at the end, where the regexp is run on that data.

Actual result:
--------------
(gdb) bt
#0  0x808225c in match (
    eptr=0x81b4ec0 "kuterat: Enligt ''Miller'' (53) kommer en Psi-liknande form (&Psi;) av kappa fr?n Proto-Kanaaneiska. Kappa stod troligtvis f?r /k/ s?v?l som /k_h/ i tidig grekisk ortografi och senare ?terinf?rdes den"..., ecode=0x81b5e56 "8????\177???????", '?' <repeats 20 times>, "=", offset_top=4, md=0xbfffca50, ims=0,
    eptrb=0xbf800198, flags=2) at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:4986
#1  0x808236e in match (
    eptr=0x81b4ec0 "kuterat: Enligt ''Miller'' (53) kommer en Psi-liknande form (&Psi;) av kappa fr?n Proto-Kanaaneiska. Kappa stod troligtvis f?r /k/ s?v?l som /k_h/ i tidig grekisk ortografi och senare ?terinf?rdes den"..., ecode=0x81b5e53 "M", offset_top=4, md=0xbfffca50, ims=0, eptrb=0xbf800198, flags=2)
    at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:5059
#2  0x8082eb0 in match (
    eptr=0x81b4ec0 "kuterat: Enligt ''Miller'' (53) kommer en Psi-liknande form (&Psi;) av kappa fr?n Proto-Kanaaneiska. Kappa stod troligtvis f?r /k/ s?v?l som /k_h/ i tidig grekisk ortografi och senare ?terinf?rdes den"..., ecode=0x81b5e7e "?", offset_top=4, md=0xbfffca50, ims=0, eptrb=0xbf800678, flags=2)
    at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:5583

[snip]

#11143 0x808236e in match (
    eptr=0x81b38fd ">\r\n\t<td>[[Rho]]</td>\r\n\t<td>[rO:]</td>\r\n\t<td>[ro]</td>\r\n\t<td>&nbsp;</td>\r\n\t<td>[r]</td>\r\n\t<td>[r]</td>\r\n\t<td>100</td>\r\n\t<td>&#x5e8; Resh</td>\r\n\t<td>&amp;rho;</td></tr>\r\n<tr><td>&Sigma; &sigma;</td>\r\n\t<"..., ecode=0x81b5e53 "M", offset_top=4, md=0xbfffca50, ims=0,
    eptrb=0xbfea1838, flags=2) at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:5059
#11144 0x8082eb0 in match (
    eptr=0x81b38fd ">\r\n\t<td>[[Rho]]</td>\r\n\t<td>[rO:]</td>\r\n\t<td>[ro]</td>\r\n\t<td>&nbsp;</td>\r\n\t<td>[r]</td>\r\n\t<td>[r]</td>\r\n\t<td>100</td>\r\n\t<td>&#x5e8; Resh</td>\r\n\t<td>&amp;rho;</td></tr>\r\n<tr><td>&Sigma; &sigma;</td>\r\n\t<"..., ecode=0x81b5e7e "?", offset_top=4, md=0xbfffca50, ims=0,
    eptrb=0xbfea1d18, flags=2) at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:5583
#11145 0x808236e in match (
    eptr=0x81b38fc "d>\r\n\t<td>[[Rho]]</td>\r\n\t<td>[rO:]</td>\r\n\t<td>[ro]</td>\r\n\t<td>&nbsp;</td>\r\n\t<td>[r]</td>\r\n\t<td>[r]</td>\r\n\t<td>100</td>\r\n\t<td>&#x5e8; Resh</td>\r\n\t<td>&amp;rho;</td></tr>\r\n<tr><td>&Sigma; &sigma;</td>\r\n\t"..., ecode=0x81b5e53 "M", offset_top=4, md=0xbfffca50, ims=0,
    eptrb=0xbfea1d18, flags=2) at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:5059
#11146 0x8082eb0 in match (
    eptr=0x81b38fc "d>\r\n\t<td>[[Rho]]</td>\r\n\t<td>[rO:]</td>\r\n\t<td>[ro]</td>\r\n\t<td>&nbsp;</td>\r\n\t<td>[r]</td>\r\n\t<td>[r]</td>\r\n\t<td>100</td>\r\n\t<td>&#x5e8; Resh</td>\r\n\t<td>&amp;rho;</td></tr>\r\n<tr><td>&Sigma; &sigma;</td>\r\n\t"..., ecode=0x81b5e7e "?", offset_top=4, md=0xbfffca50, ims=0,
    eptrb=0xbfea21f8, flags=2) at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:5583
#11147 0x808236e in match (
    eptr=0x81b38fb "td>\r\n\t<td>[[Rho]]</td>\r\n\t<td>[rO:]</td>\r\n\t<td>[ro]</td>\r\n\t<td>&nbsp;</td>\r\n\t<td>[r]</td>\r\n\t<td>[r]</td>\r\n\t<td>100</td>\r\n\t<td>&#x5e8; Resh</td>\r\n\t<td>&amp;rho;</td></tr>\r\n<tr><td>&Sigma; &sigma;</td>\r\n"..., ecode=0x81b5e53 "M", offset_top=4, md=0xbfffca50, ims=0, eptrb=0xbfea21f8,
    flags=2) at /tmp/php-4.3.4/ext/pcre/pcrelib/pcre.c:5059
#11148 0x8082eb0 in match (
    eptr=0x81b38fb "td>\r\n\t<td>[[Rho]]</td>\r\n\t<td>[rO:]</td>\r\n\t<td>[ro]</td>\r\n\t<td>&nbsp;</td>\r\n\t<td>[r]</td>\r\n\t<td>[r]</td>\r\n\t<td>100</td>\r\n\t<td>&#x5e8; Resh</td>\r\n\t<td>&amp;rho;</td></tr>\r\n<tr><td>&Sigma; &sigma;</td>\r\n"..., ecode=0x81b5e7e "?", offset_top=4, md=0xbfffca50, ims=0, eptrb=0xbfea26d8,

And so on... You get the idea

 [2003-11-30 04:12 UTC] sniper@php.net
Not a PHP bug. You have just run into the documented PCRE limitations; check http://www.pcre.org/pcre.txt under the topic "LIMITATIONS".



 [2003-11-30 07:14 UTC] brion at pobox dot com
The given script works for me without crashing on 4.3.2 (Linux/x86, Red Hat 7.3 and Red Hat 9.0, and MacOS X 10.3.1). The original poster has told me that it works for him in 4.3.4 beta and crashes only in 4.3.4 release.

Anyway, it's okay for the regexp to _fail_ if you're pushing the limits, but returning an error code would be nicer than smashing the stack and segfaulting. In web applications, which are PHP's primary use, the data is possibly untrusted, so this can be a security risk.

Might there be a way to get PCRE to handle such conditions gracefully?
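
To make the question concrete, here is a minimal sketch of what "fail gracefully" could look like. It is a toy recursive matcher in Python, not PCRE's code and not anything PHP exposes; the names `match_repeat`, `safe_match`, `MatchLimitExceeded` and the `limit` parameter are all hypothetical. Like the backtrace above, it goes deeper for every character a repeated group consumes, but it counts the depth and reports an error status instead of running off the end of the stack.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher would recurse deeper than allowed."""


def match_repeat(text, pos, accept, tail, depth, limit):
    """Match (one accepted char)* followed by the literal `tail`.

    One recursive call is made per consumed character, so the depth
    grows linearly with the length of the run being matched.
    Returns the end offset of the match, or None if there is no match.
    """
    if depth > limit:
        # Hypothetical guard: give up with an error rather than overflow.
        raise MatchLimitExceeded(depth)
    # Greedy step: try to consume one more character first.
    if pos < len(text) and accept(text[pos]):
        end = match_repeat(text, pos + 1, accept, tail, depth + 1, limit)
        if end is not None:
            return end
    # Backtracking point: try the rest of the pattern at this position.
    if text.startswith(tail, pos):
        return pos + len(tail)
    return None


def safe_match(text, accept, tail, limit=200):
    """Return a (status, end_offset) pair; never lets the limit error escape."""
    try:
        end = match_repeat(text, 0, accept, tail, 0, limit)
    except MatchLimitExceeded:
        return ("limit", None)
    return ("match", end) if end is not None else ("no_match", None)


def is_a(ch):
    """Accept only the letter 'a' (stands in for a character class)."""
    return ch == "a"


print(safe_match("aaab", is_a, "b"))               # ('match', 4)
print(safe_match("aaa", is_a, "b"))                # ('no_match', None)
print(safe_match("a" * 5000 + "b", is_a, "b"))     # ('limit', None)
```

The third call is the interesting one: the input is far too long for the allowed depth, and the caller gets a distinct "limit" status it can handle, rather than a crash. Whether PCRE itself can be made to behave this way is exactly the open question here.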
 