Bug #78454 Multiple consecutive numeric separators in bin/hex numbers cause fatal error
Submitted: 2019-08-24 22:05 UTC
Modified: 2019-08-25 08:58 UTC
Votes: 1
Avg. Score: 2.0 ± 0.0
Reproduced: 0 of 1 (0.0%)
From: mattacosta at gmail dot com
Assigned:
Status: Closed
Package: Scripting Engine problem
PHP Version: 7.4.0beta4
OS:
Private report: No
CVE-ID: None
 [2019-08-24 22:05 UTC] mattacosta at gmail dot com
Description:
------------
Multiple adjacent underscores after a leading 0 in a hex or bin number cause a fatal error.

https://3v4l.org/jUR51

Test script:
---------------
0x0__F;  // or 0b0__1;

Expected result:
----------------
Parse error: syntax error, unexpected '__F' (T_STRING)

Actual result:
--------------
Fatal error: Out of memory (allocated 2097152) (tried to allocate 18446744073709551615 bytes) in /in/jUR51 on line 3

mmap() failed: [22] Invalid argument

mmap() failed: [22] Invalid argument

Process exited with code 255.

 [2019-08-24 22:55 UTC] requinix@php.net
-Summary: Multiple numeric separators in bin/hex numbers cause fatal error
+Summary: Multiple consecutive numeric separators in bin/hex numbers cause fatal error
-Status: Open
+Status: Analyzed
 [2019-08-24 22:55 UTC] requinix@php.net
> Fatal error: Possible integer overflow in memory allocation (0 + 32)

1. Multiple consecutive underscores are not supposed to be allowed
2. The lexer matches only the "0x0" as HNUM
3. The HNUM code skips *all* leading zeros and underscores, so it runs past the end of the matched "0x0" and len underflows
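
The three steps above can be sketched in isolation. This is a minimal illustration, not the scanner's code: `skip_leading_unchecked` is a hypothetical name, and the sketch assumes `len` is an unsigned `size_t`, which the 18446744073709551615 figure in the report suggests but the thread does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the "skip any leading 0s" loop.
 * `digits` points just past the "0x" or "0b" prefix and `len` is the
 * number of bytes the lexer matched after that prefix. The loop trusts
 * the bytes it sees rather than `len`, as described in the analysis.
 */
static size_t skip_leading_unchecked(const char *digits, size_t len)
{
    assert(digits != NULL);
    while (*digits == '0' || *digits == '_') {
        ++digits;
        --len; /* size_t is unsigned: 0 - 1 wraps around to SIZE_MAX */
    }
    return len;
}
```

For the input `0x0__F;` the lexer matched only `0x0`, so the call would be `skip_leading_unchecked("0__F;", 1)`. The loop consumes three bytes while `len` only covers one, and `len` ends up at `SIZE_MAX - 1`. If a later step allocates `len + 1` bytes (an assumption about the surrounding code), that request would be for `SIZE_MAX` bytes, which is consistent with the reported allocation size.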
 [2019-08-25 05:01 UTC] theodorejb at outlook dot com
Is something wrong with the re2c regex? It seems strange that the BNUM/HNUM code even runs when there are invalid characters in the literal. Do we have to duplicate syntax checks in the BNUM/HNUM code?
 [2019-08-25 05:50 UTC] theodorejb at outlook dot com
It seems that multiple consecutive numeric separators aren't the only problem. If I use any invalid character after 0x0_ or 0b0_ I get the following fatal error:

Fatal error: Possible integer overflow in memory allocation (1 * 18446744073709551615 + 1).
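
The value 18446744073709551615 is 2^64 - 1, i.e. `SIZE_MAX` on a 64-bit build, and the message format suggests a guarded computation of the form nmemb * size + offset. A minimal sketch of such a guard follows; `checked_alloc_size` is a hypothetical helper written for illustration and is not the engine's actual allocator code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: computes nmemb * size + offset into *out.
 * Returns false, leaving *out untouched, if the result would not
 * fit in a size_t.
 */
static bool checked_alloc_size(size_t nmemb, size_t size, size_t offset,
                               size_t *out)
{
    if (size != 0 && nmemb > SIZE_MAX / size) {
        return false; /* nmemb * size would overflow */
    }
    size_t product = nmemb * size;
    if (product > SIZE_MAX - offset) {
        return false; /* product + offset would overflow */
    }
    *out = product + offset;
    return true;
}
```

With the values from the message, `checked_alloc_size(1, SIZE_MAX, 1, &out)` fails on the second check, because adding the terminating byte to a length of `SIZE_MAX` cannot be represented.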

Here's my initial attempt at a fix. I'm not sure if it's the right approach or not, but it does produce a nicer error message!


 Zend/zend_language_scanner.l | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/Zend/zend_language_scanner.l b/Zend/zend_language_scanner.l
index 2e21ee7952..901ae1085a 100644
--- a/Zend/zend_language_scanner.l
+++ b/Zend/zend_language_scanner.l
@@ -1776,8 +1776,16 @@ NEWLINE ("\r"|"\n"|"\r\n")
 
 	/* Skip any leading 0s */
 	while (*bin == '0' || *bin == '_') {
+		if (len < 1) {
+			zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
+			if (PARSER_MODE()) {
+				RETURN_TOKEN(T_ERROR);
+			}
+			RETURN_TOKEN_WITH_VAL(T_LNUMBER);
+		} else {
+			--len;
+		}
 		++bin;
-		--len;
 	}
 
 	contains_underscores = (memchr(bin, '_', len) != NULL);
@@ -1893,8 +1901,16 @@ NEWLINE ("\r"|"\n"|"\r\n")
 
 	/* Skip any leading 0s */
 	while (*hex == '0' || *hex == '_') {
+		if (len < 1) {
+			zend_throw_exception(zend_ce_parse_error, "Invalid numeric literal", 0);
+			if (PARSER_MODE()) {
+				RETURN_TOKEN(T_ERROR);
+			}
+			RETURN_TOKEN_WITH_VAL(T_LNUMBER);
+		} else {
+			--len;
+		}
 		++hex;
-		--len;
 	}
 
 	contains_underscores = (memchr(hex, '_', len) != NULL);
 [2019-08-25 08:58 UTC] nikic@php.net
> Here's my initial attempt at a fix. I'm not sure if it's the right approach or not, but it does produce a nicer error message!

Checking the length is fine, but it should not throw an error. In this case 0x0 is still a well-formed number and _ is a well-formed label -- the actual error occurs during parsing.
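
To illustrate why the lexer stops at 0x0, here is a minimal hand-written matcher. It assumes the hex token rule is "0x" HEX+ ("_" HEX+)*, meaning an underscore is consumed only when a hex digit follows it. The exact re2c pattern is not quoted in this thread, so both that rule and the name `hnum_match_len` are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Returns how many bytes at the start of `s` form a hex literal under
 * the assumed rule "0x" HEX+ ("_" HEX+)*, or 0 if nothing matches.
 * `s` must be NUL-terminated.
 */
static size_t hnum_match_len(const char *s)
{
    if (s[0] != '0' || s[1] != 'x') {
        return 0;
    }
    size_t i = 2;
    if (!isxdigit((unsigned char)s[i])) {
        return 0; /* at least one digit must follow the prefix */
    }
    while (isxdigit((unsigned char)s[i])) {
        ++i;
    }
    /* Consume "_" only when a hex digit follows it. */
    while (s[i] == '_' && isxdigit((unsigned char)s[i + 1])) {
        ++i;
        while (isxdigit((unsigned char)s[i])) {
            ++i;
        }
    }
    return i;
}
```

Under this rule `0x0_F` matches in full (5 bytes), while `0x0__F` matches only the first 3 bytes. The remaining `__F` is then scanned as a separate label, which is why the expected outcome is a parse error about an unexpected T_STRING rather than a lexer error.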

Alternatively the number tokens could be redefined to support arbitrary _ and make cases like __ or trailing _ an explicit error. That will give nicer error messages, but this needs a corresponding change in the regexes to avoid inconsistent behavior.
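
A rough sketch of what that alternative could look like: the token pattern accepts any mix of digits and underscores, and a separate check rejects misplaced separators so a dedicated error can be reported. The function name and the decision to also reject a leading underscore are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check for a literal body (the part after "0x" or "0b")
 * that a permissive pattern allowed to contain any mix of digits and "_".
 * Returns true only if the body is non-empty, does not start or end
 * with "_", and contains no two adjacent "_" characters.
 */
static bool separators_well_placed(const char *body, size_t len)
{
    if (len == 0 || body[0] == '_' || body[len - 1] == '_') {
        return false; /* empty, leading "_", or trailing "_" */
    }
    for (size_t i = 1; i < len; ++i) {
        if (body[i] == '_' && body[i - 1] == '_') {
            return false; /* consecutive separators */
        }
    }
    return true;
}
```

With this split, `0x0__F` would be matched as a single token and rejected with a message about the separators, instead of being split into `0x0` followed by the label `__F`.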
 [2019-08-25 14:59 UTC] theodorejb at outlook dot com
The following pull request has been associated:

Patch Name: Fix bug #78454
On GitHub:  https://github.com/php/php-src/pull/4618
Patch:      https://github.com/php/php-src/pull/4618.patch
 [2019-08-25 20:47 UTC] cmb@php.net
Automatic comment on behalf of theodorejb@outlook.com
Revision: http://git.php.net/?p=php-src.git;a=commit;h=1a78bdab276a9e34aa1ae00a184538e2d0dacdcd
Log: Fix #78454: Consecutive numeric separators cause OOM error
 [2019-08-25 20:47 UTC] cmb@php.net
-Status: Analyzed
+Status: Closed
 
Last updated: Wed Jan 22 19:01:31 2025 UTC