Bug #46817 tokenizer misses last single-line comment (PHP 5.3+, with re2c lexer)
Submitted: 2008-12-09 22:35 UTC Modified: 2010-11-22 13:39 UTC
Votes:5
Avg. Score:3.4 ± 1.4
Reproduced:4 of 4 (100.0%)
Same Version:4 (100.0%)
Same OS:2 (50.0%)
From: master dot jexus at gmail dot com Assigned: shire (profile)
Status: Closed Package: Scripting Engine problem
PHP Version: 5.3.0alpha3 OS: *
Private report: No CVE-ID: None
 [2008-12-09 22:35 UTC] master dot jexus at gmail dot com
Description:
------------
When the tokenizer is used to lex a given text, the output misses the
last token if that token is a single-line comment.

This only seems to occur when there is no newline after the comment
lexeme.

Note the last entries in the arrays.

Reproduce code:
---------------
<?php
print_r(token_get_all(file_get_contents(__FILE__)));

// test
$var = 5;
// test

Expected result:
----------------
Array
(
    [0] => Array
        (
            [0] => 367
            [1] => <?php
 
            [2] => 1
        )
 
    [1] => Array
        (
            [0] => 307
            [1] => print_r
            [2] => 2
        )
 
    [2] => (
    [3] => Array
        (
            [0] => 307
            [1] => token_get_all
            [2] => 2
        )
 
    [4] => (
    [5] => Array
        (
            [0] => 307
            [1] => file_get_contents
            [2] => 2
        )
 
    [6] => (
    [7] => Array
        (
            [0] => 364
            [1] => __FILE__
            [2] => 2
        )
 
    [8] => )
    [9] => )
    [10] => )
    [11] => ;
    [12] => Array
        (
            [0] => 370
            [1] => 
 
 
            [2] => 2
        )
 
    [13] => Array
        (
            [0] => 365
            [1] => // test
 
            [2] => 4
        )
 
    [14] => Array
        (
            [0] => 309
            [1] => $var
            [2] => 5
        )
 
    [15] => Array
        (
            [0] => 370
            [1] =>  
            [2] => 5
        )
 
    [16] => =
    [17] => Array
        (
            [0] => 370
            [1] =>  
            [2] => 5
        )
 
    [18] => Array
        (
            [0] => 305
            [1] => 5
            [2] => 5
        )
 
    [19] => ;
    [20] => Array
        (
            [0] => 370
            [1] => 
 
            [2] => 5
        )
 
    [21] => Array
        (
            [0] => 365
            [1] => // test
            [2] => 6
        )
 
)

Actual result:
--------------
Array
(
    [0] => Array
        (
            [0] => 368
            [1] => <?php
 
            [2] => 1
        )
 
    [1] => Array
        (
            [0] => 307
            [1] => print_r
            [2] => 2
        )
 
    [2] => (
    [3] => Array
        (
            [0] => 307
            [1] => token_get_all
            [2] => 2
        )
 
    [4] => (
    [5] => Array
        (
            [0] => 307
            [1] => file_get_contents
            [2] => 2
        )
 
    [6] => (
    [7] => Array
        (
            [0] => 365
            [1] => __FILE__
            [2] => 2
        )
 
    [8] => )
    [9] => )
    [10] => )
    [11] => ;
    [12] => Array
        (
            [0] => 371
            [1] => 
 
 
            [2] => 2
        )
 
    [13] => Array
        (
            [0] => 366
            [1] => // test
 
            [2] => 4
        )
 
    [14] => Array
        (
            [0] => 309
            [1] => $var
            [2] => 5
        )
 
    [15] => Array
        (
            [0] => 371
            [1] =>  
            [2] => 5
        )
 
    [16] => =
    [17] => Array
        (
            [0] => 371
            [1] =>  
            [2] => 5
        )
 
    [18] => Array
        (
            [0] => 305
            [1] => 5
            [2] => 5
        )
 
    [19] => ;
    [20] => Array
        (
            [0] => 371
            [1] => 
 
            [2] => 5
        )
 
)
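
The script above only triggers the problem if the file itself ends without a newline, which many editors add silently. A minimal sketch of the same check on an in-memory string follows. It relies only on token_get_all(), token_name() and the T_COMMENT constant from the tokenizer extension; the variable names are illustrative and not taken from the report.

```php
<?php
// Same shape as the report's script, written as a string so that the input
// is guaranteed to end right after the last comment, with no newline.
$source = "<?php\n// test\n\$var = 5;\n// test";

$tokens = token_get_all($source);
$last   = end($tokens);

// Array tokens are [id, text, line]; single-character tokens are plain strings.
if (is_array($last)) {
    printf("%s %s (line %d)\n", token_name($last[0]), var_export($last[1], true), $last[2]);
} else {
    printf("%s\n", var_export($last, true));
}

// False on a build affected by this bug, where the trailing comment is missing.
$lastIsComment = is_array($last) && $last[0] === T_COMMENT;
var_dump($lastIsComment);
```

Comparing against T_COMMENT by name rather than by number matters here, since the numeric token ids differ between the expected and actual listings above.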

 [2008-12-10 10:25 UTC] nlopess@php.net
This is a problem in the new lexer. It is reproducible when the comment is followed directly by EOF (with no \n after the comment).
Again, this is triggered by the difference in EOF handling between flex and re2c.
A simple hack would be to detect the ST_ONE_LINE_COMMENT state on EOF and return the correct value, but I would prefer a more general solution.
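
The EOF case described above can be illustrated with a toy scanner. This is only a sketch in userland PHP, not the engine's re2c rule and not necessarily the fix that was committed; scan_line_comment is a hypothetical name. It also leaves the line terminator out of the comment text, which is a simplification. The point is that running out of input while inside a // comment should still yield the comment rather than nothing.

```php
<?php
// Hypothetical toy scanner: returns the single-line comment starting at $pos,
// or null if no "//" comment starts there.
function scan_line_comment($input, $pos)
{
    if (substr($input, $pos, 2) !== '//') {
        return null;
    }
    $len = strlen($input);
    $end = $pos + 2;
    // Consume up to, but not including, the line terminator.
    while ($end < $len && $input[$end] !== "\n" && $input[$end] !== "\r") {
        $end++;
    }
    // Reaching $len here means EOF ended the comment; it is still a complete token.
    return substr($input, $pos, $end - $pos);
}

var_dump(scan_line_comment("// test\n\$var = 5;", 0)); // comment followed by a newline
var_dump(scan_line_comment("// test", 0));             // comment ended by EOF
```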
 [2009-03-06 07:41 UTC] lucas@php.net
I'm seeing what could be a related, if not the same, problem when trying to detect trailing Windows CR+LF in T_WHITESPACE:

Reproduce code:
---------------
<?php
// this comment and trailing blank contain windows CR+LF^M
^M

Expected result:
----------------
array(3) {
  [0]=>
  array(3) {
    [0]=>
    int(367)
    [1]=>
    string(6) "<?php
"
    [2]=>
    int(1)
  }
  [1]=>
  array(3) {
    [0]=>
    int(365)
    [1]=>
    string(57) "// this comment and trailing blank contain windows CR+LF^M"
    [2]=>
    int(2)
  }
  [2]=>
  array(3) {
    [0]=>
    int(370)
    [1]=>
    string(3) "

"
    [2]=>
    int(2)
  }
}

Actual result:
--------------
array(2) {
  [0]=>
  array(3) {
    [0]=>
    int(368)
    [1]=>
    string(6) "<?php
"
    [2]=>
    int(1)
  }
  [1]=>
  array(3) {
    [0]=>
    int(366)
    [1]=>
    string(57) "// this comment and trailing blank contain windows CR+LF^M"
    [2]=>
    int(2)
  }
}
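
Carriage returns are easy to lose when pasting, so a hedged way to check this case is to spell the line endings out as escapes and compare byte counts. The sketch below assumes only that token_get_all() is meant to be lossless, meaning the token texts concatenated in order give back the input; the sample text is made up and shorter than the one in the comment above.

```php
<?php
// Explicit escapes keep the CR+LF bytes intact regardless of editor settings.
$source = "<?php\n// comment with Windows line ending\r\n\r\n";

$rebuilt = '';
foreach (token_get_all($source) as $token) {
    // Array tokens carry their text at index 1; simple tokens are plain strings.
    $rebuilt .= is_array($token) ? $token[1] : $token;
}

// Any dropped trailing token shows up as a length difference.
printf("source: %d bytes, tokens: %d bytes\n", strlen($source), strlen($rebuilt));
var_dump($rebuilt === $source);
```

If the two byte counts differ, the missing bytes at the end of the rebuilt text show which trailing token was dropped.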
 [2009-03-11 22:18 UTC] shire@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 [2010-11-22 13:39 UTC] felipe@php.net
-Block user comment: N +Block user comment: Y
 