Bug #35286 tokenizer ext drops final comment
Submitted: 2005-11-19 01:53 UTC
Modified: 2005-11-21 22:32 UTC
From: cellog@php.net
Assigned: helly
Status: Closed
Package: Scripting Engine problem
PHP Version: 5CVS-2005-11-19 (cvs)
OS: *
Private report: No
CVE-ID: None
 [2005-11-19 01:53 UTC] cellog@php.net
Description:
------------
The tokenizer extension is ignoring a final comment when tokenizing a script that does not contain a closing ?>

Reproduce code:
---------------
var_dump(token_get_all("<?php print 'foo'; # you'll see it
 print 'bar'; # but not this one"));

Expected result:
----------------
array(14) {
  [0]=>
  array(2) {
    [0]=>
    int(366)
    [1]=>
    string(6) "<?php "
  }
  [1]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [2]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [3]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'foo'"
  }
  [4]=>
  string(1) ";"
  [5]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [6]=>
  array(2) {
    [0]=>
    int(364)
    [1]=>
    string(16) "# you'll see it
"
  }
  [7]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [8]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [9]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [10]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'bar'"
  }
  [11]=>
  string(1) ";"
  [12]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [13]=>
  array(2) {
    [0]=>
    int(364)
    [1]=>
    string(18) "# but not this one"
  }
}


Actual result:
--------------
array(13) {
  [0]=>
  array(2) {
    [0]=>
    int(366)
    [1]=>
    string(6) "<?php "
  }
  [1]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [2]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [3]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'foo'"
  }
  [4]=>
  string(1) ";"
  [5]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [6]=>
  array(2) {
    [0]=>
    int(364)
    [1]=>
    string(16) "# you'll see it
"
  }
  [7]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [8]=>
  array(2) {
    [0]=>
    int(266)
    [1]=>
    string(5) "print"
  }
  [9]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
  [10]=>
  array(2) {
    [0]=>
    int(315)
    [1]=>
    string(5) "'bar'"
  }
  [11]=>
  string(1) ";"
  [12]=>
  array(2) {
    [0]=>
    int(369)
    [1]=>
    string(1) " "
  }
}
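
The numeric token IDs in the dumps above (366, 266, 369, 315, 364) depend on the PHP build, so they are hard to compare across versions. A minimal sketch that prints symbolic names instead, using token_name(); the helper name describe_tokens() is hypothetical and not part of the report. Judging by the token text, 364 is presumably T_COMMENT on the reporter's build, but that mapping is an assumption.

```php
<?php
// Hypothetical helper (not from the report): label each token with its
// symbolic name so the listing does not depend on build-specific IDs.
function describe_tokens($source)
{
    $lines = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // $token[0] is the numeric ID, $token[1] is the matched text.
            $lines[] = token_name($token[0]) . ' ' . var_export($token[1], true);
        } else {
            // Single-character tokens such as ";" come back as plain strings.
            $lines[] = 'CHAR ' . var_export($token, true);
        }
    }
    return $lines;
}

// Same input as the reproduce code: no closing tag, no trailing newline.
$source = "<?php print 'foo'; # you'll see it\n print 'bar'; # but not this one";
echo implode("\n", describe_tokens($source)), "\n";
```

On an affected build the listing should end with a whitespace entry; on a fixed build the last entry should be the final comment.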


 [2005-11-19 02:04 UTC] tony2001@php.net
Doesn't drop anything here:

The code:

<?php

$arr = token_get_all("<?php 
print 'foo'; 
# you'll see it
print 'bar'; 
# but not this one
");

foreach ($arr as $token) {
	if (is_array($token)) var_dump($token[1]);
}
?>

The output:

string(6) "<?php "
string(1) "
"
string(5) "print"
string(1) " "
string(5) "'foo'"
string(2) "
"
string(16) "# you'll see it
"
string(5) "print"
string(1) " "
string(5) "'bar'"
string(2) "
"
string(19) "# but not this one
"

 [2005-11-19 02:10 UTC] cellog@php.net
The original reproduce script has no trailing newline after the final comment. Adding that newline changes the behavior, which is why the script above does not show the bug.
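One way to check for a dropped token without reading the whole dump is a round trip: concatenate the text of every token and compare it with the input. This is only a sketch; rebuild_source() is a hypothetical helper, and it assumes that the tokens of a simple script like this one normally add up to the original source.

```php
<?php
// Hypothetical helper (not from the report): glue the token texts back
// together so the result can be compared with the original source.
function rebuild_source($source)
{
    $rebuilt = '';
    foreach (token_get_all($source) as $token) {
        // Array tokens carry their text in index 1; others are the text itself.
        $rebuilt .= is_array($token) ? $token[1] : $token;
    }
    return $rebuilt;
}

$without = "<?php print 'bar'; # but not this one"; // no trailing newline
$with    = $without . "\n";                          // trailing newline

// A false result means the tokenizer lost part of the input.
var_dump(rebuild_source($without) === $without);
var_dump(rebuild_source($with) === $with);
```

On an affected build only the first comparison should fail, since the comment is dropped only when the input ends without a newline.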
 [2005-11-19 04:11 UTC] cellog@php.net
This patch fixes the issue with no negative side effects:

Index: zend_language_scanner.l
===================================================================
RCS file: /repository/ZendEngine2/zend_language_scanner.l,v
retrieving revision 1.131.2.3
diff -u -r1.131.2.3 zend_language_scanner.l
--- zend_language_scanner.l     15 Nov 2005 13:29:28 -0000      1.131.2.3
+++ zend_language_scanner.l     19 Nov 2005 03:11:22 -0000
@@ -1465,6 +1465,7 @@
        yymore();
 }

+<ST_ONE_LINE_COMMENT><<EOF>>   |
 <ST_ONE_LINE_COMMENT>{NEWLINE} {
        zendlval->value.str.val = yytext; /* no copying - intentional */
        zendlval->value.str.len = yyleng;

 [2005-11-19 06:53 UTC] cellog@php.net
Better patch: this one does not increment CG(zend_lineno).

Index: zend_language_scanner.l
===================================================================
RCS file: /repository/ZendEngine2/zend_language_scanner.l,v
retrieving revision 1.131.2.3
diff -u -r1.131.2.3 zend_language_scanner.l
--- zend_language_scanner.l     15 Nov 2005 13:29:28 -0000      1.131.2.3
+++ zend_language_scanner.l     19 Nov 2005 05:52:01 -0000
@@ -1465,6 +1465,12 @@
        yymore();
 }

+<ST_ONE_LINE_COMMENT><<EOF>> {
+       zendlval->value.str.val = yytext; /* no copying - intentional */
+       zendlval->value.str.len = yyleng;
+       zendlval->type = IS_STRING;
+       return T_COMMENT;
+}
 <ST_ONE_LINE_COMMENT>{NEWLINE} {
        zendlval->value.str.val = yytext; /* no copying - intentional */
        zendlval->value.str.len = yyleng;

 [2005-11-19 10:43 UTC] helly@php.net
Fixed in HEAD.
 [2005-11-21 22:32 UTC] iliaa@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 