Bug #69430 token_get_all has new irrecoverable errors
Submitted: 2015-04-12 10:28 UTC
Modified: 2015-07-09 21:06 UTC
From: lauri dot kentta+php-bugs at gmail dot com
Assigned: nikic
Status: Closed
Package: Scripting Engine problem
PHP Version: master-Git-2015-04-12 (Git)
Private report: No
CVE-ID: None
 [2015-04-12 10:28 UTC] lauri dot kentta+php-bugs at gmail dot com
Description:
------------
Previously, token_get_all would handle even invalid input. This is no longer the case, because recent commits added unrecoverable errors to the lexer (lex_scan) for invalid Unicode codepoint escapes and invalid octal literals.

This means that token_get_all can't be used for custom highlighting anymore, because invalid scripts won't produce any output. Interestingly, highlight_string still works, but I couldn't see why.
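
For context, the kind of custom highlighter affected here might look roughly like the sketch below. It is only an illustration: the function name sketch_highlight, the T_CHAR class name, and the span-per-token markup are assumptions, not code from the report. It relies on token_get_all returning a token list for any input, including invalid scripts.

```php
<?php
// Hypothetical helper (not from the report): wraps every token in a <span>
// whose class is the token name, so a stylesheet can colour it.
function sketch_highlight($source)
{
    $html = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens have the form [token id, text, line number].
            $class = token_name($token[0]);
            $text = $token[1];
        } else {
            // Single-character tokens such as ";" are returned as plain strings.
            $class = 'T_CHAR'; // made-up class name for this sketch
            $text = $token;
        }
        $html .= '<span class="' . $class . '">' . htmlspecialchars($text) . '</span>';
    }
    return $html;
}

echo sketch_highlight('<?php echo 1;'), "\n";
```

If token_get_all throws instead of returning tokens, a function like this produces no output at all for a script containing a single invalid literal.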

The commits that probably cause the problem:

commit bae46f307c2d0cdef9b8f5426adcc46920776700
Author: Andrea Faulds <ajf@ajf.me>
Date:   Fri Dec 19 00:40:59 2014 +0000

    Unicode Codepoint Escape Syntax

commit 5f29b980514867f1a09969ca6a1c1f5fb00c3027
Author: Andrea Faulds <ajf@ajf.me>
Date:   Fri Jan 9 06:32:36 2015 +0000

    Error on invalid octal (fixes PHPSadness #31)

commit a8bf1c5d8f5755b53492e58040cfe88150eb57b6
Author: Nikita Popov <nikic@php.net>
Date:   Sat Mar 21 20:10:19 2015 +0100

    Throw ParseException from lexer


Test script:
---------------
<?php
var_dump(token_get_all('<?php 09 "\u{bar}"'));

Expected result:
----------------
Some meaningful tokens such as:
T_OPEN_TAG <?php
T_LNUMBER 09
T_CONSTANT_ENCAPSED_STRING "\u{bar}"

or
T_OPEN_TAG <?php
T_LNUMBER_INVALID 09
T_CONSTANT_ENCAPSED_STRING_INVALID "\u{bar}"

or even
T_OPEN_TAG <?php
T_ERROR 09
T_ERROR "\u{bar}"


Actual result:
--------------
ParseException: Invalid numeric literal
ParseException: Invalid UTF-8 codepoint escape sequence

 [2015-04-12 10:36 UTC] nikic@php.net
-Assigned To: +Assigned To: nikic
 [2015-04-12 10:36 UTC] nikic@php.net
Good point. I changed these fatals to ParseExceptions to make it possible to handle them gracefully at all, but you're right that throwing from token_get_all() will be a problem as well for some usages.

I'd probably go with returning T_ERROR tokens in this case, as we already do it internally. It would still be good to have some way to access the exceptions that are thrown.
 [2015-07-09 17:12 UTC] nikic@php.net
Automatic comment on behalf of nikic
Revision: http://git.php.net/?p=php-src.git;a=commit;h=d91aad5966f01259f0e1a431a754d917807761b5
Log: Fix bug #69430
 [2015-07-09 17:12 UTC] nikic@php.net
-Status: Assigned +Status: Closed
 [2015-07-09 21:06 UTC] nikic@php.net
Followup commit https://github.com/php/php-src/commit/a49ce7bb91bec02d6f26b3118404371df23242fe drops the T_ERROR token from the output again and switches back to providing the exact same result as PHP 5.

The TOKEN_PARSE mode can be used if you want to have an exception.
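
A minimal sketch of the two modes, using the test script's input. It assumes PHP 7.0 or later, where the exception class called ParseException in this report was released under the name ParseError. The variable names are illustrative.

```php
<?php
$source = '<?php 09 "\u{bar}"';

// Default mode: invalid literals do not abort lexing, so tokens are returned.
$tokens = token_get_all($source);
foreach ($tokens as $token) {
    if (is_array($token)) {
        echo token_name($token[0]), ' ', $token[1], "\n";
    }
}

// TOKEN_PARSE mode: the same input raises an exception instead.
$parseError = null;
try {
    token_get_all($source, TOKEN_PARSE);
} catch (ParseError $e) {
    $parseError = $e->getMessage();
    echo 'ParseError: ', $parseError, "\n";
}
```

A highlighter would use the default mode, while a tool that needs to reject invalid code can pass TOKEN_PARSE and catch the exception.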