[2007-07-04 19:12 UTC] tony2001@php.net
Description:
------------
I found a similar bug that was closed with status Bogus. Unacceptable! Nothing in the documentation states any limit on the input to preg_replace, and no portable workarounds are documented. Dismissing this as 'just a stack overflow' to keep the bug count down is more than a little unprofessional. A scripting language should either handle the workaround internally or document its input limits, NOT cause segfaults. This is a bug whether the PHP community is willing to accept it or not.

Reproduce code:
---------------
function parse($html, &$title, &$text, &$anchors) {
    $pstring1 = "'[^']*'";
    $pstring2 = '"[^"]*"';
    $pnstring = "[^'\">]";
    $pintag = "(?:$pstring1|$pstring2|$pnstring)*";
    $pattrs = "(?:\\s$pintag){0,1}";
    $pcomment = enclose("<!--", "-", "->");
    $pscript = enclose("<script$pattrs>", "<", "\\/script>");
    $pstyle = enclose("<style$pattrs>", "<", "\\/style>");
    $pexclude = "(?:$pcomment|$pscript|$pstyle)";
    $ptitle = enclose("<title$pattrs>", "<", "\\/title>");
    $panchor = "<a(?:\\s$pintag){0,1}>";
    $phref = "href\\s*=[\\s'\"]*([^\\s'\">]*)";

    $html = preg_replace("/$pexclude/iX", " ", $html);
    if ($title !== false)
        $title = preg_match("/$ptitle/iX", $html, $title) ? $title[1] : '';
    if ($text !== false) {
        $text = preg_replace("/<$pintag>/iX", " ", $html);
        $text = preg_replace("/\\s+| /iX", " ", $text);
    }
    if ($anchors !== false) {
        preg_match_all("/$panchor/iX", $html, $anchors);
        $anchors = $anchors[0];
        reset($anchors);
        while (list($i, $x) = each($anchors))
            $anchors[$i] = preg_match("/$phref/iX", $x, $x) ? $x[1] : '';
        $anchors = array_unique($anchors);
    }
}

function enclose($start, $end1, $end2) {
    return "$start((?:[^$end1]|$end1(?!$end2))*)$end1$end2";
}

Expected result:
----------------
The code should split HTML pages into title, text and links. It works fine until large pages are downloaded. Then it segfaults, with gdb putting the blame on preg_replace.
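
For anyone hitting the same crash, here is a possible stopgap, offered as a sketch and not as a confirmed fix. It assumes that lowering the `pcre.recursion_limit` ini setting makes PCRE give up before the C stack overflows. Whether that holds depends on the PHP and PCRE versions and on the stack size of the process. The helper name `safe_preg_replace` and the limit value are hypothetical and do not come from the report above.

```php
<?php
// Hypothetical helper (not part of the original report): wraps preg_replace
// so that a PCRE failure raises an exception instead of passing NULL along.
function safe_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_replace returns NULL on error; preg_last_error() gives the
        // code, e.g. PREG_RECURSION_LIMIT_ERROR or PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }
    return $result;
}

// Assumption: a lower recursion limit stops the match before the stack is
// exhausted. The value 10000 is illustrative only, not a recommendation.
ini_set('pcre.recursion_limit', '10000');

// Small demonstration input; the pattern strips an HTML comment.
$html  = '<p>hello</p><!-- note --><p>world</p>';
$clean = safe_preg_replace('/<!--.*?-->/s', ' ', $html);
echo $clean, "\n";
```

With a wrapper like this, a page that is too large for the pattern should surface as a catchable exception in `parse()`, provided the limit is actually reached before the stack runs out.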