Bug #35286 tokenizer ext drops final comment
Submitted: 2005-11-19 01:53 UTC
Modified: 2005-11-21 22:32 UTC
From: cellog@php.net
Assigned: helly
Status: Closed
Package: Scripting Engine problem
PHP Version: 5CVS-2005-11-19 (cvs)
OS: *
Private report: No
CVE-ID: None

 [2005-11-19 01:53 UTC] cellog@php.net
Description:
------------
The tokenizer extension drops the final comment when tokenizing a script that does not contain a closing ?> tag.

Reproduce code:
---------------
var_dump(token_get_all("<?php print 'foo'; # you'll see it
 print 'bar'; # but not this one"));

Expected result:
----------------
array(14) {
  [0]=>
  array(2) {
    [0]=>
    int(366)
    [1]=>
    string(6) "<?php "
  }
  [1]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [2]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [3]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'foo'"
  }
  [4]=>
  string(1) ";"
  [5]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [6]=>
  array(2) {
    [0]=>
    int(364)
    [1]=>
    string(16) "# you'll see it
"
  }
  [7]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [8]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [9]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [10]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'bar'"
  }
  [11]=>
  string(1) ";"
  [12]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [13]=>
  array(2) {
    [0]=>
    int(364)
    [1]=>
    string(18) "# but not this one"
  }
}


Actual result:
--------------
array(13) {
  [0]=>
  array(2) {
    [0]=>
    int(366)
    [1]=>
    string(6) "<?php "
  }
  [1]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [2]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [3]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'foo'"
  }
  [4]=>
  string(1) ";"
  [5]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [6]=>
  array(2) {
    [0]=>
    int(364)
    [1]=>
    string(16) "# you'll see it
"
  }
  [7]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [8]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [9]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [10]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'bar'"
  }
  [11]=>
  string(1) ";"
  [12]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
}
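
The numeric token IDs in the dumps above (366, 369, 364, and so on) belong to the PHP build that produced them and can differ in other versions. As a sketch, the same reproduce string can be dumped with token_name() so the result reads the same regardless of the numeric values. The helper name describe_tokens is hypothetical and not part of the report.

```php
<?php
// Hypothetical helper (not from the report): render each token as
// "NAME text" so the dump does not depend on numeric token IDs.
function describe_tokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens carry the token ID at [0] and the matched text at [1].
            $lines[] = token_name($token[0]) . ' ' . json_encode($token[1]);
        } else {
            // Single-character tokens such as ";" come back as plain strings.
            $lines[] = 'CHAR ' . json_encode($token);
        }
    }
    return $lines;
}

// Same input as the reproduce code: no closing tag and no trailing newline.
$source = "<?php print 'foo'; # you'll see it\n print 'bar'; # but not this one";
$described = describe_tokens($source);
echo implode("\n", $described), "\n";
```

On a build that includes the fix, the last line printed should be a T_COMMENT entry for the second comment; on an affected build the list would end with T_WHITESPACE instead.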


 [2005-11-19 02:04 UTC] tony2001@php.net
Doesn't drop anything here:

The code:

<?php

$arr = token_get_all("<?php 
print 'foo'; 
# you'll see it
print 'bar'; 
# but not this one
");

foreach ($arr as $token) {
	if (is_array($token)) var_dump($token[1]);
}
?>

The output:

string(6) "<?php "
string(1) "
"
string(5) "print"
string(1) " "
string(5) "'foo'"
string(2) "
"
string(16) "# you'll see it
"
string(5) "print"
string(1) " "
string(5) "'bar'"
string(2) "
"
string(19) "# but not this one
"

 [2005-11-19 02:10 UTC] cellog@php.net
The original reproduce script has no trailing newline, and the trailing newline does change the behavior.
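
One way to check the trailing-newline observation is to tokenize the same source with and without the final newline and count the comment tokens in each result. This is only a sketch; the function name count_comment_tokens is an assumption made for illustration.

```php
<?php
// Hypothetical check (not from the report): count the T_COMMENT tokens
// that token_get_all() returns for a piece of source code.
function count_comment_tokens(string $source): int
{
    $count = 0;
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_COMMENT) {
            $count++;
        }
    }
    return $count;
}

$base = "<?php print 'foo'; # you'll see it\n print 'bar'; # but not this one";

// Without a trailing newline: the case from the original report.
$withoutNewline = count_comment_tokens($base);
// With a trailing newline: the case tested in the reply above.
$withNewline = count_comment_tokens($base . "\n");

var_dump($withoutNewline, $withNewline);
```

On a build that includes the fix, both counts should be 2. On an affected build, the first count would be expected to come out as 1, since the final comment is dropped.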
 [2005-11-19 04:11 UTC] cellog@php.net
This patch fixes the issue with no negative side effects

Index: zend_language_scanner.l
===================================================================
RCS file: /repository/ZendEngine2/zend_language_scanner.l,v
retrieving revision 1.131.2.3
diff -u -r1.131.2.3 zend_language_scanner.l
--- zend_language_scanner.l     15 Nov 2005 13:29:28 -0000      1.131.2.3
+++ zend_language_scanner.l     19 Nov 2005 03:11:22 -0000
@@ -1465,6 +1465,7 @@
        yymore();
 }

+<ST_ONE_LINE_COMMENT><<EOF>>   |
 <ST_ONE_LINE_COMMENT>{NEWLINE} {
        zendlval->value.str.val = yytext; /* no copying - intentional */
        zendlval->value.str.len = yyleng;

 [2005-11-19 06:53 UTC] cellog@php.net
Better patch: this one does not increment CG(zend_lineno).

Index: zend_language_scanner.l
===================================================================
RCS file: /repository/ZendEngine2/zend_language_scanner.l,v
retrieving revision 1.131.2.3
diff -u -r1.131.2.3 zend_language_scanner.l
--- zend_language_scanner.l     15 Nov 2005 13:29:28 -0000      1.131.2.3
+++ zend_language_scanner.l     19 Nov 2005 05:52:01 -0000
@@ -1465,6 +1465,12 @@
        yymore();
 }

+<ST_ONE_LINE_COMMENT><<EOF>> {
+       zendlval->value.str.val = yytext; /* no copying - intentional */
+       zendlval->value.str.len = yyleng;
+       zendlval->type = IS_STRING;
+       return T_COMMENT;
+}
 <ST_ONE_LINE_COMMENT>{NEWLINE} {
        zendlval->value.str.val = yytext; /* no copying - intentional */
        zendlval->value.str.len = yyleng;
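
The idea behind this second patch, returning the comment at end of input without counting an extra line, can be modelled outside the engine. The following is a toy scanner written in PHP, not the Zend lexer. The function scan_hash_comment and its return shape are invented for illustration, and it only treats "\n" as a newline.

```php
<?php
// Toy model (not the Zend scanner): read a one-line "#" comment starting
// at $offset and report its text plus how many lines were consumed.
// Simplification: only "\n" ends a line, and "?>" is not handled.
function scan_hash_comment(string $input, int $offset): array
{
    $end = strpos($input, "\n", $offset);
    if ($end === false) {
        // End of input reached before a newline: still emit the comment,
        // but count no line, mirroring the separate <<EOF>> rule.
        return ['text' => substr($input, $offset), 'lines' => 0];
    }
    // Newline found: the comment text includes it and one line is counted.
    return ['text' => substr($input, $offset, $end - $offset + 1), 'lines' => 1];
}

$input = "# you'll see it\n# but not this one";
$first = scan_hash_comment($input, 0);
$second = scan_hash_comment($input, strlen($first['text']));
var_dump($first, $second);
```

Keeping the end-of-input case in its own branch is what lets the toy skip the line count there, which is the difference between the two patches as described in the comments.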

 [2005-11-19 10:43 UTC] helly@php.net
Fixed in HEAD.
 [2005-11-21 22:32 UTC] iliaa@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 