Bug #66649 token_get_all() does not tag new bracket array syntax as T_ARRAY
Submitted: 2014-02-05 20:09 UTC Modified: 2014-02-05 20:31 UTC
From: bishop@php.net Assigned:
Status: Not a bug Package: Scripting Engine problem
PHP Version: 5.4.24 OS: n/a
Private report: No CVE-ID: None

 [2014-02-05 20:09 UTC] bishop@php.net
Description:
------------
token_get_all() tags the start of an array written with the array () syntax as T_ARRAY.

I expected token_get_all() to do the same for arrays written with the short bracket [] syntax introduced in PHP 5.4.

Instead, token_get_all() emits no T_ARRAY token for bracket-style array notation. This means that anyone relying on T_ARRAY to find arrays will miss every array written in the new syntax.

Test script:
---------------
<?php
array_map(
    function ($string) {
        array_map(
            function ($token) {
                if (is_array($token)) {
                    echo token_name($token[0]) . ', ';
                }
            },
            token_get_all($string)
        );
        echo PHP_EOL;
    },
    array ('<?php $a = array (1);', '<?php $a = [1]')
);

Expected result:
----------------
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_ARRAY, T_WHITESPACE, T_LNUMBER, 
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_ARRAY, T_WHITESPACE, T_LNUMBER, 

Actual result:
--------------
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_ARRAY, T_WHITESPACE, T_LNUMBER, 
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_LNUMBER, 

History

 [2014-02-05 20:31 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2014-02-05 20:31 UTC] nikic@php.net
T_ARRAY represents the keyword "array", not the semantic concept of an array. The '[' is returned as a plain single-character string, like any other single-character token.
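A small sketch (not part of the original thread) that makes this visible by labeling every token, including the plain-string ones that the test script skips with is_array(). The helper name tokenLabels() is hypothetical; token_get_all() and token_name() are the real tokenizer functions.

```php
<?php
// Label every token: multi-character tokens come back as arrays whose
// first element is a token ID (named via token_name()), while
// single-character tokens such as '[' come back as plain strings.
function tokenLabels($code)
{
    $labels = [];
    foreach (token_get_all($code) as $token) {
        $labels[] = is_array($token) ? token_name($token[0]) : "'" . $token . "'";
    }
    return $labels;
}

echo implode(' ', tokenLabels('<?php $a = [1];')), PHP_EOL;
// T_OPEN_TAG T_VARIABLE T_WHITESPACE '=' T_WHITESPACE '[' T_LNUMBER ']' ';'
```

The '[' and ']' are present in the stream; they simply carry no token ID, so a filter on is_array() drops them.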
 [2014-02-05 20:51 UTC] bishop@php.net
Uhm, that's my point. I expected T_ARRAY to no longer mean just the literal keyword "array", but the semantic concept of "array syntax". The documentation even links T_ARRAY to "array syntax".

If "[" is just a character literal, indistinguishable from its other uses, then how does one translate "[1, 2, 3]" back to "array (1, 2, 3)"? That is exactly what this post needs: http://stackoverflow.com/questions/21500628/porting-php-5-4-to-5-3
 [2014-02-05 20:58 UTC] nikic@php.net
token_get_all exposes PHP's tokenizer/lexer, a component whose only job is to convert a character stream into a token stream. It does little more than slap a label on certain character sequences. It cannot distinguish the '[' of an array declaration from the '[' of an array access.
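Because the lexer only labels characters, a tool that wants to separate the two uses of '[' has to inspect the surrounding tokens itself. The sketch below is a heuristic, not PHP's parser logic: classifyBrackets() is a hypothetical helper that assumes a '[' following a variable, identifier, string literal, ']', ')' or '}' is an array access, and treats every other '[' as the start of an array literal. It can misjudge edge cases (for example, PHP 7.1 destructuring such as [$a, $b] = $pair is labeled a literal), which is why a real parser is the reliable option.

```php
<?php
// Heuristic only: decide what each '[' means from the previous
// significant (non-whitespace, non-comment) token.
function classifyBrackets($code)
{
    // Assumption: these token types are followed by '[' only in array access.
    $accessAfterIds = [T_VARIABLE, T_STRING, T_CONSTANT_ENCAPSED_STRING];
    $accessAfterChars = [']', ')', '}'];
    $skipIds = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];

    $result = [];
    $prev = null; // previous significant token (array or string)
    foreach (token_get_all($code) as $token) {
        if (is_array($token) && in_array($token[0], $skipIds, true)) {
            continue;
        }
        if ($token === '[') {
            $isAccess = (is_array($prev) && in_array($prev[0], $accessAfterIds, true))
                || in_array($prev, $accessAfterChars, true);
            $result[] = $isAccess ? 'access' : 'literal';
        }
        $prev = $token;
    }
    return $result;
}

echo implode(', ', classifyBrackets('<?php $a = [1, [2]]; echo $a[1][0];')), PHP_EOL;
// literal, literal, access, access
```

A converter from [1, 2, 3] to array (1, 2, 3) could build on such a classification, but it inherits every blind spot of the heuristic.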

If you need this kind of semantic information you should use a PHP parser.
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Mon Jul 07 12:01:35 2025 UTC