Bug #66649 token_get_all() does not tag new bracket array syntax as T_ARRAY
Submitted: 2014-02-05 20:09 UTC Modified: 2014-02-05 20:31 UTC
From: bishop@php.net Assigned:
Status: Not a bug Package: Scripting Engine problem
PHP Version: 5.4.24 OS: n/a
Private report: No CVE-ID: None
 [2014-02-05 20:09 UTC] bishop@php.net
Description:
------------
token_get_all() identifies the start of an array using array () syntax with T_ARRAY.

I expected token_get_all() to do the same for arrays using the new bracket [] syntax.

Instead, token_get_all() ignores bracket style array notation altogether.  This means that anyone relying on T_ARRAY parsing will not be able to see the new array syntax notation.

Test script:
---------------
<?php
array_map(
    function ($string) {
        array_map(
            function ($token) {
                if (is_array($token)) {
                    echo token_name($token[0]) . ', ';
                }
            },
            token_get_all($string)
        );
        echo PHP_EOL;
    },
    array ('<?php $a = array (1);', '<?php $a = [1]')
);

Expected result:
----------------
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_ARRAY, T_WHITESPACE, T_LNUMBER, 
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_ARRAY, T_WHITESPACE, T_LNUMBER, 

Actual result:
--------------
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_ARRAY, T_WHITESPACE, T_LNUMBER, 
T_OPEN_TAG, T_VARIABLE, T_WHITESPACE, T_WHITESPACE, T_LNUMBER, 

 [2014-02-05 20:31 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2014-02-05 20:31 UTC] nikic@php.net
T_ARRAY represents the keyword "array", not the semantic concept of an array. The '[' will be provided like any other single-character token.
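To illustrate the point above, here is a minimal sketch that labels every token, including the single-character ones the test script skips. The helper name describe_tokens() is made up for this example; token_get_all() and token_name() are the only tokenizer functions used.

```php
<?php
// Label every token in a source string. Named tokens arrive as arrays
// (id, text, line); single-character tokens such as '[' arrive as plain
// strings and have no T_* name at all.
// describe_tokens() is a hypothetical helper, not part of PHP.
function describe_tokens($source)
{
    $labels = array();
    foreach (token_get_all($source) as $token) {
        $labels[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $labels;
}

echo implode(' ', describe_tokens('<?php $a = [1];')), PHP_EOL;
// prints: T_OPEN_TAG T_VARIABLE T_WHITESPACE = T_WHITESPACE [ T_LNUMBER ] ;
```

The '[' and ']' are present in the token stream; they are just not wrapped in an array, so a loop that only inspects is_array($token) entries never sees them.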
 [2014-02-05 20:51 UTC] bishop@php.net
Uhm, that's my point. I expected that T_ARRAY no longer means the literal keyword "array", but the semantic concept of "array syntax".  The documentation even describes T_ARRAY as "array syntax".

If "[" is just a character literal, indistinguishable in its use of array syntax, then how does one translate "[1, 2, 3]" back to "array (1, 2, 3)" like in this post: http://stackoverflow.com/questions/21500628/porting-php-5-4-to-5-3
 [2014-02-05 20:58 UTC] nikic@php.net
token_get_all exposes PHP's tokenizer/lexer, which is a component whose only job is to convert a character stream into a token stream. It does little more than slap a label on certain character sequences. It does not have the ability to distinguish the '[' of an array declaration from the '[' of an array access.

If you need this kind of semantic information you should use a PHP parser.
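For simple cases, a rough guess is possible from the token stream alone by looking at the previous non-whitespace token. The sketch below is only a heuristic and not a substitute for a parser: classify_brackets() and its rule set are assumptions made for this example, and it will misjudge cases such as a '[' directly after the ')' of an if condition, after a '}', or inside interpolated strings.

```php
<?php
// Heuristic only: guess whether each '[' starts an array literal or an
// array access, based on the previous non-whitespace, non-comment token.
// classify_brackets() is a hypothetical helper, not part of PHP.
function classify_brackets($source)
{
    // Assumption: after these tokens, '[' usually indexes into a value.
    $accessAfterIds = array(T_VARIABLE, T_STRING, T_CONSTANT_ENCAPSED_STRING);
    $accessAfterChars = array(']', ')', '}');
    $ignoredIds = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);

    $kinds = array();
    $previous = null; // last significant token seen so far
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if (!in_array($token[0], $ignoredIds, true)) {
                $previous = $token;
            }
            continue;
        }
        if ($token === '[') {
            $isAccess = is_array($previous)
                ? in_array($previous[0], $accessAfterIds, true)
                : in_array($previous, $accessAfterChars, true);
            $kinds[] = $isAccess ? 'access' : 'literal';
        }
        $previous = $token;
    }
    return $kinds;
}

print_r(classify_brackets('<?php $a = [1, 2]; $b = $a[0]; $c = foo()[1];'));
// The three brackets are classified as: literal, access, access
```

The need to track context that the lexer itself does not keep is exactly why a real parser is the more reliable tool for rewriting [1, 2, 3] back to array (1, 2, 3).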