Bug #74219 IntlRuleBasedBreakIterator error
Submitted: 2017-03-07 21:56 UTC Modified: 2017-03-08 05:04 UTC
From: fernando at null-life dot com Assigned:
Status: Wont fix Package: intl (PECL)
PHP Version: 7.1.2 OS: Windows
Private report: No CVE-ID: None


 [2017-03-07 21:56 UTC] fernando at null-life dot com
The test script below crashes PHP when run on Windows.

Test script:

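<?php
// A rule string of 3,355,443 dots; constructing the iterator with it crashes inside ICU.
$v0 = str_repeat(".", intdiv(0xffffff, 5));
$newIntlRuleBasedBreakIterator = new IntlRuleBasedBreakIterator($v0);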

Actual result:
Basic Block:
    1450d7dc cmp dword ptr [eax+4],edi
       Tainted Input operands: 'eax','edi'
    1450d7df jle icuuc57!icu_57::rbbitablebuilder::~rbbitablebuilder+0x8d (1450d83d)
       Tainted Input operands: 'SignFlag','ZeroFlag','OverflowFlag'

Exception Hash (Major/Minor): 0x271f1996.0xe72b7f5a

 Hash Usage : Stack Trace:
Major+Minor : icuuc57!icu_57::RBBITableBuilder::~RBBITableBuilder+0x2c
Major+Minor : icuuc57!icu_57::RBBIRuleBuilder::~RBBIRuleBuilder+0x90
Major+Minor : icuuc57!icu_57::RBBIRuleBuilder::createRuleBasedBreakIterator+0x255
Major+Minor : icuuc57!icu_57::RuleBasedBreakIterator::RuleBasedBreakIterator+0x58
Major+Minor : php_intl!get_module+0x3507b
Minor       : php_intl!get_module+0x35282
Minor       : php7ts!execute_ex+0x48
Minor       : php7ts!zend_execute+0x169
Minor       : php7ts!zend_execute_scripts+0x106
Minor       : php7ts!php_execute_script+0x3df
Minor       : php!do_cli+0x452
Minor       : php!main+0x3ac
Minor       : php!__scrt_common_main_seh+0xf9
Minor       : KERNEL32!BaseThreadInitThunk+0x24
Minor       : ntdll_76ee0000!RtlSubscribeWnfStateChangeNotification+0x439
Minor       : ntdll_76ee0000!RtlSubscribeWnfStateChangeNotification+0x404
Instruction Address: 0x000000001450d7dc
Source File: e:\repo\icu4c-57_1-src\source\common\rbbitblb.cpp
Source Line: 49


 [2017-03-08 04:53 UTC]
-Status: Open +Status: Verified
 [2017-03-08 04:53 UTC]
This happens on Linux as well.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5abaa21 in icu_55::RBBINode::~RBBINode() () from /usr/lib/x86_64-linux-gnu/
(gdb) bt
#0  0x00007ffff5abaa21 in icu_55::RBBINode::~RBBINode() () from /usr/lib/x86_64-linux-gnu/
#1  0x00007ffff5abaa5b in icu_55::RBBINode::~RBBINode() () from /usr/lib/x86_64-linux-gnu/
#2  0x00007ffff5abaa5b in icu_55::RBBINode::~RBBINode() () from /usr/lib/x86_64-linux-gnu/
(The frames continue, recursively calling into the RBBINode destructor.)
 [2017-03-08 04:56 UTC]
Setting a breakpoint there shows that the call comes from the constructor, before we ever get to getBinaryRules().

#0  0x00007ffff5abaa20 in icu_55::RBBINode::~RBBINode() () from /usr/lib/x86_64-linux-gnu/
#1  0x00007ffff5abac0b in icu_55::RBBIRuleBuilder::~RBBIRuleBuilder() () from /usr/lib/x86_64-linux-gnu/
#2  0x00007ffff5abb433 in icu_55::RBBIRuleBuilder::createRuleBasedBreakIterator(icu_55::UnicodeString const&, UParseError*, UErrorCode&) ()
   from /usr/lib/x86_64-linux-gnu/
#3  0x00007ffff5ab79e6 in icu_55::RuleBasedBreakIterator::RuleBasedBreakIterator(icu_55::UnicodeString const&, UParseError&, UErrorCode&) ()
   from /usr/lib/x86_64-linux-gnu/
#4  0x0000000000717cc1 in _php_intlrbbi_constructor_body (execute_data=0x7fffef015100, return_value=0x7fffef015120)
    at /home/sgolemon/dev/php/php-src/ext/intl/breakiterator/rulebasedbreakiterator_methods.cpp:62
#5  0x0000000000717f6b in zim_IntlRuleBasedBreakIterator___construct (execute_data=0x7fffef015100, return_value=0x7fffef015120)
 [2017-03-08 05:04 UTC]
-Status: Verified +Status: Wont fix
 [2017-03-08 05:04 UTC]
Okay, having dug into PHP's use of ICU and ICU's implementation, I'm going to lay the blame for this at ICU's feet.

The rule is bad and it should feel bad. It leads to a stack overflow because ICU doesn't protect against excessive recursion in its rule parser. Ultimately that's a bug in ICU, not PHP.

Could we detect bad rules and guard against them? Certainly not trivially, and I doubt we could successfully build something to protect ICU from bad input even with concerted effort.

As unsatisfying as the answer might be, I have to say: Try not to feed rules like this to ICU. ¯\_(ツ)_/¯
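
For callers who accept rule strings from untrusted sources, one userland way to follow that advice is to reject oversized rule sources before they reach ICU. The following is only a minimal sketch: the constant MAX_BREAK_RULES_BYTES and both function names are hypothetical, the 64 KiB threshold is an arbitrary assumption that has not been tested against ICU, and a size check says nothing about whether smaller rules are valid or safe.

```php
<?php
// Hypothetical limit: not an ICU or PHP constant; tune it for your own rule sets.
const MAX_BREAK_RULES_BYTES = 64 * 1024;

/**
 * Returns true when the rule source is small enough that we are willing
 * to hand it to ICU. This checks size only, not rule validity.
 */
function breakRulesWithinLimit(string $rules, int $maxBytes = MAX_BREAK_RULES_BYTES): bool
{
    return strlen($rules) <= $maxBytes;
}

/**
 * Builds the iterator only for rule sources under the limit.
 * Requires the intl extension; oversized input is rejected before ICU is called.
 */
function createBreakIteratorGuarded(string $rules): IntlRuleBasedBreakIterator
{
    if (!breakRulesWithinLimit($rules)) {
        throw new LengthException('Break rules exceed the configured size limit');
    }
    return new IntlRuleBasedBreakIterator($rules);
}
```

With this in place, the rule string from the test script (about 3.3 million dots) would be rejected with a LengthException instead of being passed to the IntlRuleBasedBreakIterator constructor. It does not fix the underlying ICU behaviour.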
Last updated: Mon Nov 30 02:01:23 2020 UTC