Bug #60423 Segmentation fault with the UTF-8 check regexp in some cases
Submitted: 2011-12-01 09:04 UTC Modified: 2011-12-02 13:18 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: amal dot samally at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.3.8 OS: Linux
Private report: No CVE-ID: None
 [2011-12-01 09:04 UTC] amal dot samally at gmail dot com
Description:
------------
I'm using the regexp below to test whether a string is valid UTF-8, but in some cases it causes a segmentation fault.

Examples of strings that cause the error:
http://samally.ru/php_pcre_segmentation_fault/test1.txt
http://samally.ru/php_pcre_segmentation_fault/test2.txt

Test script:
---------------
<?php
$string = file_get_contents('http://samally.ru/php_pcre_segmentation_fault/test1.txt');
// $string = file_get_contents('http://samally.ru/php_pcre_segmentation_fault/test2.txt');

// Tests whether a string is a valid UTF-8 encoded string.
// @link http://w3.org/International/questions/qa-forms-utf-8.html
$r = preg_match('~^(?:
   [\x09\x0A\x0D\x20-\x7E]            # ASCII without control characters
 | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
 | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
 | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
 | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
 | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
 | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
 | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
)*$~DSXx', $string);

// 1 = valid, 0 = invalid, false = PCRE error (details in preg_last_error())
var_dump($r, preg_last_error());
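The pattern applies `(?:...)*` to the whole subject. Judging by the backtrace further down (the `rdepth` argument keeps climbing while `eptr` advances one byte at a time), the PCRE build bundled here recurses into `match()` for each repetition of the group, so a long enough input overflows the C stack. As a sketch only, the same w3.org byte table can be checked with a plain loop that uses constant stack space. The function name `is_utf8_w3c()` is hypothetical; it is meant to accept exactly the byte sequences listed in the regex, including the restriction to TAB, LF, CR and printable ASCII.

```php
<?php
// Hypothetical helper: checks the same byte ranges as the w3.org regex,
// but iteratively, so input length does not affect stack depth.
function is_utf8_w3c($s)
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);

        // ASCII without control characters (TAB, LF, CR are allowed).
        if ($b === 0x09 || $b === 0x0A || $b === 0x0D || ($b >= 0x20 && $b <= 0x7E)) {
            $i++;
            continue;
        }

        // From the lead byte, pick the allowed range of the second byte
        // and the number of continuation bytes, line by line from the regex.
        if ($b >= 0xC2 && $b <= 0xDF) {
            $lo = 0x80; $hi = 0xBF; $n = 1;   // non-overlong 2-byte
        } elseif ($b === 0xE0) {
            $lo = 0xA0; $hi = 0xBF; $n = 2;   // excluding overlongs
        } elseif (($b >= 0xE1 && $b <= 0xEC) || $b === 0xEE || $b === 0xEF) {
            $lo = 0x80; $hi = 0xBF; $n = 2;   // straight 3-byte
        } elseif ($b === 0xED) {
            $lo = 0x80; $hi = 0x9F; $n = 2;   // excluding surrogates
        } elseif ($b === 0xF0) {
            $lo = 0x90; $hi = 0xBF; $n = 3;   // planes 1-3
        } elseif ($b >= 0xF1 && $b <= 0xF3) {
            $lo = 0x80; $hi = 0xBF; $n = 3;   // planes 4-15
        } elseif ($b === 0xF4) {
            $lo = 0x80; $hi = 0x8F; $n = 3;   // plane 16
        } else {
            return false;                      // not an allowed lead byte
        }

        // The sequence needs bytes $i+1 .. $i+$n.
        if ($i + $n >= $len) {
            return false;                      // truncated sequence
        }

        // Second byte has the special range; any others must be 0x80-0xBF.
        $c = ord($s[$i + 1]);
        if ($c < $lo || $c > $hi) {
            return false;
        }
        for ($k = 2; $k <= $n; $k++) {
            $c = ord($s[$i + $k]);
            if ($c < 0x80 || $c > 0xBF) {
                return false;
            }
        }
        $i += $n + 1;
    }
    return true;                               // empty string is valid, as with the regex
}
```

Because the loop never recurses, the size of test1.txt or test2.txt only affects running time, not stack usage.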


History
 [2011-12-01 10:10 UTC] laruence@php.net
-Status: Open +Status: Feedback
 [2011-12-01 10:10 UTC] laruence@php.net
See bug #41638; it may be the same issue.
 [2011-12-01 10:57 UTC] amal dot samally at gmail dot com
-Status: Feedback +Status: Open
 [2011-12-01 10:57 UTC] amal dot samally at gmail dot com
I don't think so.
Also, changing pcre.backtrack_limit / pcre.recursion_limit does not help.
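The PHP manual's description of `pcre.recursion_limit` warns that a high value can use up the available process stack and crash PHP. Raising the limit therefore cannot help; it only protects the process when it is low enough that PCRE gives up before the stack runs out. In the backtrace below the crash happens around `rdepth=10464`, well under the default limit of 100000. A sketch under those assumptions: lower the limit and turn the `false` return into a visible error. The value `5000` and the helper name `checked_preg_match()` are illustrative assumptions; a safe value depends on the stack size (`ulimit -s`) and the PCRE build, and newer PHP releases bundle PCRE2, which behaves differently here.

```php
<?php
// Illustrative value (assumption): keep it well below the recursion depth
// at which the stack overflows on this machine.
ini_set('pcre.recursion_limit', '5000');

// Hypothetical helper: like preg_match(), but a PCRE failure (returned as
// false) becomes an exception naming the preg_last_error() code.
function checked_preg_match($pattern, $subject)
{
    $r = preg_match($pattern, $subject);
    if ($r === false) {
        $names = array(
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        );
        $code = preg_last_error();
        $name = isset($names[$code]) ? $names[$code] : 'error code ' . $code;
        throw new RuntimeException('preg_match() failed: ' . $name);
    }
    return $r === 1;
}
```

With a low enough limit, passing the w3.org pattern and `$string` from the test script to `checked_preg_match()` should throw with `PREG_RECURSION_LIMIT_ERROR` instead of segfaulting.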
 [2011-12-01 14:46 UTC] laruence@php.net
-Status: Open +Status: Feedback
 [2011-12-01 14:46 UTC] laruence@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a backtrace to see what is happening behind the scenes. To
find out how to generate a backtrace, please read
http://bugs.php.net/bugs-generating-backtrace.php for *NIX and
http://bugs.php.net/bugs-generating-backtrace-win32.php for Win32

Once you have generated a backtrace, please submit it to this bug
report and change the status back to "Open". Thank you for helping
us make PHP better.


 [2011-12-02 12:29 UTC] amal dot samally at gmail dot com
gdb output:

(gdb) run test.php
Starting program: /usr/local/bin/php test.php
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000498948 in match (
    eptr=0x139a566 "1{font-weight:bold}#gbg6.gbgt-
hvr,#gbg6.gbgt:focus{background-color:transparent;background-
image:none}.gbg4a{font-size:0;line-height:0}.gbg4a .gbts{padding:27px 5px 
0;*padding:25px 5px 0}.gbto .gbg4a "..., ecode=0x13d9525 "^", 
    mstart=0x13990f0 "<!doctype html> <head>      <title>docs.pravo.ru - Поиск в 
Google</title>   <script>window.google={kEI:\"hTnXTp-
POZDqOabAzMYO\",getEI:function(a){var b;while(a&&!
(a.getAttribute&&(b=a.getAttribute"..., markptr=0x0, offset_top=2, 
md=0x7fffffffb340, ims=0, eptrb=Cannot access memory at address 0x7fffff7feff8
)
    at /tmp/php_build/php-5.3.8/ext/pcre/pcrelib/pcre_exec.c:471
471	{
(gdb) bt
#0  0x0000000000498948 in match (
    eptr=0x139a566 "1{font-weight:bold}#gbg6.gbgt-
hvr,#gbg6.gbgt:focus{background-color:transparent;background-
image:none}.gbg4a{font-size:0;line-height:0}.gbg4a .gbts{padding:27px 5px 
0;*padding:25px 5px 0}.gbto .gbg4a "..., ecode=0x13d9525 "^", 
    mstart=0x13990f0 "<!doctype html> <head>      <title>docs.pravo.ru - Поиск в 
Google</title>   <script>window.google={kEI:\"hTnXTp-
POZDqOabAzMYO\",getEI:function(a){var b;while(a&&!
(a.getAttribute&&(b=a.getAttribute"..., markptr=0x0, offset_top=2, 
md=0x7fffffffb340, ims=0, eptrb=Cannot access memory at address 0x7fffff7feff8
)
    at /tmp/php_build/php-5.3.8/ext/pcre/pcrelib/pcre_exec.c:471
#1  0x000000000049b352 in match (
    eptr=0x139a566 "1{font-weight:bold}#gbg6.gbgt-
hvr,#gbg6.gbgt:focus{background-color:transparent;background-
image:none}.gbg4a{font-size:0;line-height:0}.gbg4a .gbts{padding:27px 5px 
0;*padding:25px 5px 0}.gbto .gbg4a "..., ecode=0x13d9748 "V\002#\033U\002,", 
    mstart=0x13990f0 "<!doctype html> <head>      <title>docs.pravo.ru - Поиск в 
Google</title>   <script>window.google={kEI:\"hTnXTp-
POZDqOabAzMYO\",getEI:function(a){var b;while(a&&!
(a.getAttribute&&(b=a.getAttribute"..., markptr=0x0, offset_top=2, 
md=0x7fffffffb340, ims=0, eptrb=0x0, flags=0, rdepth=10464)
    at /tmp/php_build/php-5.3.8/ext/pcre/pcrelib/pcre_exec.c:1654
#2  0x00000000004994e0 in match (
    eptr=0x139a565 "s1{font-weight:bold}#gbg6.gbgt-
hvr,#gbg6.gbgt:focus{background-color:transparent;background-
image:none}.gbg4a{font-size:0;line-height:0}.gbg4a .gbts{padding:27px 5px 
0;*padding:25px 5px 0}.gbto .gbg4a"..., ecode=0x13d9525 "^", 
    mstart=0x13990f0 "<!doctype html> <head>      <title>docs.pravo.ru - Поиск в 
Google</title>   <script>window.google={kEI:\"hTnXTp-
POZDqOabAzMYO\",getEI:function(a){var b;while(a&&!
(a.getAttribute&&(b=a.getAttribute"..., markptr=0x0, offset_top=2, 
md=0x7fffffffb340, ims=0, eptrb=0x0, flags=0, rdepth=10463)
    at /tmp/php_build/php-5.3.8/ext/pcre/pcrelib/pcre_exec.c:885
#3  0x000000000049b352 in match (
    eptr=0x139a565 "s1{font-weight:bold}#gbg6.gbgt-
hvr,#gbg6.gbgt:focus{background-color:transparent;background-
image:none}.gbg4a{font-size:0;line-height:0}.gbg4a .gbts{padding:27px 5px 
0;*padding:25px 5px 0}.gbto .gbg4a"..., ecode=0x13d9748 "V\002#\033U\002,", 
    mstart=0x13990f0 "<!doctype html> <head>      <title>docs.pravo.ru - Поиск в 
Google</title>   <s---Type <return> to continue, or q <return> to quit---
 [2011-12-02 12:29 UTC] amal dot samally at gmail dot com
-Status: Feedback +Status: Open
 [2011-12-02 13:18 UTC] felipe@php.net
-Status: Open +Status: Bogus
 [2011-12-02 13:18 UTC] felipe@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

This is known PCRE behavior. See other bug reports or the PCRE documentation.
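Since the crash comes from how deeply PCRE recurses on a long repeated group, another way around it is to not express validation as one repeated group at all. A minimal sketch, assuming PHP's own UTF-8 check is acceptable: with the `u` modifier, PCRE validates the whole subject before matching and `preg_match()` returns `false` for invalid UTF-8, so the empty pattern `//u` works as a validity test. The w3.org pattern also rejects ASCII control characters other than TAB, LF and CR (and DEL), so a second, non-repeating search covers those. The helper name `is_valid_utf8_text()` is hypothetical. Whether PCRE's built-in check rejects overlongs and surrogates exactly as the w3.org table does depends on the bundled PCRE version, so treat full equivalence as an assumption to verify. `mb_check_encoding($string, 'UTF-8')` from the mbstring extension is another option, though it also accepts control characters.

```php
<?php
// Hypothetical helper: UTF-8 validation without a repeated regex group.
function is_valid_utf8_text($s)
{
    // With /u, PCRE checks the whole subject for valid UTF-8 before
    // matching; for invalid input preg_match() returns false.
    if (preg_match('//u', $s) !== 1) {
        return false;
    }
    // Mirror the w3.org pattern: reject ASCII control characters other
    // than TAB, LF and CR, plus DEL. This is a plain search for a single
    // byte, not a repeated group over the whole string.
    return preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $s) === 0;
}
```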
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 26 02:01:29 2024 UTC