Bug #78454 Multiple consecutive numeric separators in bin/hex numbers cause fatal error
Submitted: 2019-08-24 22:05 UTC
Modified: 2019-08-25 08:58 UTC
Votes: 1
Avg. Score: 2.0 ± 0.0
Reproduced: 0 of 1 (0.0%)
From: mattacosta at gmail dot com
Assigned:
Status: Closed
Package: Scripting Engine problem
PHP Version: 7.4.0beta4
OS:
Private report: No
CVE-ID: None
 [2019-08-24 22:05 UTC] mattacosta at gmail dot com
Description:
------------
Multiple adjacent underscores after a leading 0 in a hex or bin number cause a fatal error.

https://3v4l.org/jUR51

Test script:
---------------
0x0__F;  // or 0b0__1;

Expected result:
----------------
Parse error: syntax error, unexpected '__F' (T_STRING)

Actual result:
--------------
Fatal error: Out of memory (allocated 2097152) (tried to allocate 18446744073709551615 bytes) in /in/jUR51 on line 3

mmap() failed: [22] Invalid argument

mmap() failed: [22] Invalid argument

Process exited with code 255.

History

 [2019-08-24 22:55 UTC] requinix@php.net
-Summary: Multiple numeric separators in bin/hex numbers cause fatal error
+Summary: Multiple consecutive numeric separators in bin/hex numbers cause fatal error
-Status: Open
+Status: Analyzed
 [2019-08-24 22:55 UTC] requinix@php.net
> Fatal error: Possible integer overflow in memory allocation (0 + 32)

1. Multiple consecutive underscores are not supposed to be allowed
2. Parser is matching HNUM as only the "0x0"
3. HNUM code skips *all* leading zeros and underscores, so it runs past the assumed length of the matched "0x0" and len underflows
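
The underflow in step 3 can be sketched in isolation. This is a minimal stand-in, not the scanner code: the function name `skip_leading_unchecked` and its parameters are hypothetical, and it assumes the scanner passes the text after the "0x" prefix together with an unsigned length covering only the matched token.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "skip any leading 0s" loop.
 * digits: text just after the "0x"/"0b" prefix (the input buffer continues
 *         past the matched token, as it does in the scanner).
 * len:    number of characters the token regex actually matched. */
size_t skip_leading_unchecked(const char *digits, size_t len)
{
    assert(digits != NULL);
    /* The loop stops on the first character that is not '0' or '_',
     * without ever consulting len. */
    while (*digits == '0' || *digits == '_') {
        ++digits;
        --len; /* unsigned: wraps around once it passes zero */
    }
    return len;
}
```

For "0x0__F" the regex matches only "0x0", so the call is `skip_leading_unchecked("0__F", 1)`. The loop consumes "0", "_" and "_", and the returned length is `SIZE_MAX - 1`. Adding one byte for a terminator would give 18446744073709551615 on a 64-bit build, which appears consistent with the allocation size in the report.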
 [2019-08-25 05:01 UTC] theodorejb at outlook dot com
Is something wrong with the re2c regex? It seems strange that the BNUM/HNUM code even runs when there are invalid characters in the literal. Do we have to duplicate syntax checks in the BNUM/HNUM code?
 [2019-08-25 05:50 UTC] theodorejb at outlook dot com
It seems that multiple consecutive numeric separators aren't the only problem. If I use any invalid character after 0x0_ or 0b0_ I get the following fatal error:

Fatal error: Possible integer overflow in memory allocation (1 * 18446744073709551615 + 1).

Here's my initial attempt at a fix. I'm not sure if it's the right approach or not, but it does produce a nicer error message!


 Zend/zend_language_scanner.l | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/Zend/zend_language_scanner.l b/Zend/zend_language_scanner.l
index 2e21ee7952..901ae1085a 100644
--- a/Zend/zend_language_scanner.l
+++ b/Zend/zend_language_scanner.l
@@ -1776,8 +1776,16 @@ NEWLINE ("\r"|"\n"|"\r\n")
 
 	/* Skip any leading 0s */
 	while (*bin == '0' || *bin == '_') {
+		if (len < 1) {
+			zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
+			if (PARSER_MODE()) {
+				RETURN_TOKEN(T_ERROR);
+			}
+			RETURN_TOKEN_WITH_VAL(T_LNUMBER);
+		} else {
+			--len;
+		}
 		++bin;
-		--len;
 	}
 
 	contains_underscores = (memchr(bin, '_', len) != NULL);
@@ -1893,8 +1901,16 @@ NEWLINE ("\r"|"\n"|"\r\n")
 
 	/* Skip any leading 0s */
 	while (*hex == '0' || *hex == '_') {
+		if (len < 1) {
+			zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
+			if (PARSER_MODE()) {
+				RETURN_TOKEN(T_ERROR);
+			}
+			RETURN_TOKEN_WITH_VAL(T_LNUMBER);
+		} else {
+			--len;
+		}
 		++hex;
-		--len;
 	}
 
 	contains_underscores = (memchr(hex, '_', len) != NULL);
 [2019-08-25 08:58 UTC] nikic@php.net
> Here's my initial attempt at a fix. I'm not sure if it's the right approach or not, but it does produce a nicer error message!

Checking the length is fine, but it should not throw an error. In this case 0x0 is still a well-formed number and _ is a well-formed label -- the actual error occurs during parsing.
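
One way to check the length without raising an error is to make the loop condition itself stop at the end of the matched token. The sketch below is an illustration under that assumption, with a hypothetical function name; it is not necessarily the change that was merged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded variant of the "skip any leading 0s" loop.
 * It never reads or counts past the len characters the token matched,
 * so the remaining length cannot wrap. */
size_t skip_leading_bounded(const char *digits, size_t len)
{
    assert(digits != NULL);
    while (len > 0 && (*digits == '0' || *digits == '_')) {
        ++digits;
        --len;
    }
    return len; /* 0 means the token was all zeros, e.g. "0x0" */
}
```

With this shape, "0x0" followed by "__F" lexes as the number 0 and then a label, and the parser reports the unexpected label.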

Alternatively the number tokens could be redefined to support arbitrary _ and make cases like __ or trailing _ an explicit error. That will give nicer error messages, but this needs a corresponding change in the regexes to avoid inconsistent behavior.
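
If the tokens were widened to accept arbitrary underscores, the explicit check could look roughly like the following. This is only a sketch of the idea: the function name is hypothetical, and it assumes the digits after the "0x"/"0b" prefix are passed with their full matched length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator: returns true when every '_' sits between two
 * other characters, i.e. none is leading, trailing, or doubled. */
bool separators_well_placed(const char *digits, size_t len)
{
    if (len == 0 || digits[0] == '_' || digits[len - 1] == '_') {
        return false;
    }
    for (size_t i = 1; i < len; ++i) {
        if (digits[i] == '_' && digits[i - 1] == '_') {
            return false; /* consecutive separators such as "0__F" */
        }
    }
    return true;
}
```

A scanner using such a check could emit a dedicated error for "0x0__F" instead of a generic unexpected-label message, provided the regexes are widened to match.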
 [2019-08-25 14:59 UTC] theodorejb at outlook dot com
The following pull request has been associated:

Patch Name: Fix bug #78454
On GitHub:  https://github.com/php/php-src/pull/4618
Patch:      https://github.com/php/php-src/pull/4618.patch
 [2019-08-25 20:47 UTC] cmb@php.net
Automatic comment on behalf of theodorejb@outlook.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=1a78bdab276a9e34aa1ae00a184538e2d0dacdcd
Log: Fix #78454: Consecutive numeric separators cause OOM error
 [2019-08-25 20:47 UTC] cmb@php.net
-Status: Analyzed
+Status: Closed
 