Bug #69430 token_get_all has new irrecoverable errors
Submitted: 2015-04-12 10:28 UTC
Modified: 2015-07-09 21:06 UTC
From: lauri dot kentta+php-bugs at gmail dot com
Assigned: nikic
Status: Closed
Package: Scripting Engine problem
PHP Version: master-Git-2015-04-12 (Git)
Private report: No
CVE-ID: None
 [2015-04-12 10:28 UTC] lauri dot kentta+php-bugs at gmail dot com
Description:
------------
Previously, token_get_all() handled even invalid input. This is no longer the case, because unrecoverable errors were added to the lexer (lex_scan) for invalid Unicode escapes and invalid octal literals.

This means that token_get_all() can't be used for custom highlighting anymore, because invalid scripts won't produce any output. Interestingly, highlight_string() still works, but I couldn't see why.

The commits that probably cause the problem:

commit bae46f307c2d0cdef9b8f5426adcc46920776700
Author: Andrea Faulds <ajf@ajf.me>
Date:   Fri Dec 19 00:40:59 2014 +0000

    Unicode Codepoint Escape Syntax

commit 5f29b980514867f1a09969ca6a1c1f5fb00c3027
Author: Andrea Faulds <ajf@ajf.me>
Date:   Fri Jan 9 06:32:36 2015 +0000

    Error on invalid octal (fixes PHPSadness #31)

commit a8bf1c5d8f5755b53492e58040cfe88150eb57b6
Author: Nikita Popov <nikic@php.net>
Date:   Sat Mar 21 20:10:19 2015 +0100

    Throw ParseException from lexer


Test script:
---------------
<?php
var_dump(token_get_all('<?php 09 "\u{bar}"'));

Expected result:
----------------
Some meaningful tokens such as:
T_OPEN_TAG <?php
T_LNUMBER 09
T_CONSTANT_ENCAPSED_STRING "\u{bar}"

or
T_OPEN_TAG <?php
T_LNUMBER_INVALID 09
T_CONSTANT_ENCAPSED_STRING_INVALID "\u{bar}"

or even
T_OPEN_TAG <?php
T_ERROR 09
T_ERROR "\u{bar}"


Actual result:
--------------
ParseException: Invalid numeric literal
ParseException: Invalid UTF-8 codepoint escape sequence

 [2015-04-12 10:36 UTC] nikic@php.net
-Assigned To: +Assigned To: nikic
 [2015-04-12 10:36 UTC] nikic@php.net
Good point. I changed these fatals to ParseExceptions to make it possible to handle them gracefully at all, but you're right that throwing from token_get_all() will be a problem as well for some usages.

I'd probably go with returning T_ERROR tokens in this case, as we already do it internally. It would be good to still be able to access the exceptions that are thrown in some way.
 [2015-07-09 17:12 UTC] nikic@php.net
Automatic comment on behalf of nikic
Revision: http://git.php.net/?p=php-src.git;a=commit;h=d91aad5966f01259f0e1a431a754d917807761b5
Log: Fix bug #69430
 [2015-07-09 17:12 UTC] nikic@php.net
-Status: Assigned +Status: Closed
 [2015-07-09 21:06 UTC] nikic@php.net
Followup commit https://github.com/php/php-src/commit/a49ce7bb91bec02d6f26b3118404371df23242fe drops the T_ERROR token from the output again and switches back to providing the exact same result as PHP 5.

The TOKEN_PARSE mode can be used if you want to have an exception.
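
A minimal sketch of the two modes described above, using the input from the test script in this report. It assumes a released PHP 7 or later, where the exception class called ParseException in this thread is named ParseError. The variable names are illustrative only, and the exact token names printed for the invalid literals may vary between PHP versions.

```php
<?php
// Input from the test script in this report: an invalid octal literal
// followed by an invalid Unicode codepoint escape.
$code = '<?php 09 "\u{bar}"';

// Default mode: lexer errors are not thrown, so tokens are still returned.
$tokens = token_get_all($code);
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens have the form [token id, source text, line number].
        printf("%s %s\n", token_name($token[0]), $token[1]);
    } else {
        // Single-character tokens are returned as plain strings.
        echo $token, "\n";
    }
}

// TOKEN_PARSE mode: the same input raises an exception instead.
// Assumption: the class is ParseError, as in released PHP 7+.
$caught = null;
try {
    token_get_all($code, TOKEN_PARSE);
} catch (ParseError $e) {
    $caught = $e;
    echo get_class($e), ': ', $e->getMessage(), "\n";
}
```

A custom highlighter would typically use the default mode so that invalid scripts still produce output, and reserve TOKEN_PARSE for cases where a syntax error should stop processing.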
 