Bug #48446 Tokenizer reports two T_INLINE_HTML at tags starting with s
Submitted: 2009-06-01 16:48 UTC Modified: 2009-06-01 17:32 UTC
From: shawn at shawnbiddle dot com Assigned:
Status: Closed Package: Scripting Engine problem
PHP Version: 5.2.9 OS: Linux
Private report: No CVE-ID: None

 

 [2009-06-01 16:48 UTC] shawn at shawnbiddle dot com
Description:
------------
When token_get_all() is run on a script that mixes PHP and HTML, it splits a T_INLINE_HTML token whenever it encounters an HTML tag beginning with "s". My example uses <span>, but as I said, any tag starting with "s" triggers it.

Reproduce code:
---------------
<?php
   print_r(token_get_all('<?php echo "Hello World!"; ?><h6>Hello</h6><span class="test">Hello!</span><?php echo "Goodbye, World!"; ?>'));
 ?>

Expected result:
----------------
Array
(
  [0] => Array
      (
          [0] => 367
          [1] => <?php
          [2] => 1
      )

  [1] => Array
      (
          [0] => 316
          [1] => echo
          [2] => 1
      )

  [2] => Array
      (
          [0] => 370
          [1] =>
          [2] => 1
      )

  [3] => Array
      (
          [0] => 315
          [1] => "Hello World!"
          [2] => 1
      )

  [4] => ;
  [5] => Array
      (
          [0] => 370
          [1] =>
          [2] => 1
      )

  [6] => Array
      (
          [0] => 369
          [1] => ?>
          [2] => 1
      )

  [7] => Array
      (
          [0] => 311
          [1] => <h6>Hello</h6><span class="test">Hello!</span>
          [2] => 1
      )

  [8] => Array
      (
          [0] => 367
          [1] => <?php
          [2] => 1
      )

  [9] => Array
      (
          [0] => 316
          [1] => echo
          [2] => 1
      )

  [10] => Array
      (
          [0] => 370
          [1] =>
          [2] => 1
      )

  [11] => Array
      (
          [0] => 315
          [1] => "Goodbye, World!"
          [2] => 1
      )

  [12] => ;
  [13] => Array
      (
          [0] => 370
          [1] =>
          [2] => 1
      )

  [14] => Array
      (
          [0] => 369
          [1] => ?>
          [2] => 1
      )
)


Actual result:
--------------
Array
(
    [0] => Array
        (
            [0] => 367
            [1] => <?php
            [2] => 1
        )

    [1] => Array
        (
            [0] => 316
            [1] => echo
            [2] => 1
        )

    [2] => Array
        (
            [0] => 370
            [1] =>
            [2] => 1
        )

    [3] => Array
        (
            [0] => 315
            [1] => "Hello World!"
            [2] => 1
        )

    [4] => ;
    [5] => Array
        (
            [0] => 370
            [1] =>
            [2] => 1
        )

    [6] => Array
        (
            [0] => 369
            [1] => ?>
            [2] => 1
        )

    [7] => Array
        (
            [0] => 311
            [1] => <h6>Hello</h6>
            [2] => 1
        )

    [8] => Array
        (
            [0] => 311
            [1] => <s
            [2] => 1
        )

    [9] => Array
        (
            [0] => 311
            [1] => pan class="test">Hello!</span>
            [2] => 1
        )

    [10] => Array
        (
            [0] => 367
            [1] => <?php
            [2] => 1
        )

    [11] => Array
        (
            [0] => 316
            [1] => echo
            [2] => 1
        )

    [12] => Array
        (
            [0] => 370
            [1] =>
            [2] => 1
        )

    [13] => Array
        (
            [0] => 315
            [1] => "Goodbye, World!"
            [2] => 1
        )

    [14] => ;
    [15] => Array
        (
            [0] => 370
            [1] =>
            [2] => 1
        )

    [16] => Array
        (
            [0] => 369
            [1] => ?>
            [2] => 1
        )

)
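The numeric token IDs in the dumps above (367, 311, etc.) are specific to the PHP build, which makes the output hard to compare across versions. A small sketch, assuming only the standard token_get_all() and token_name() functions, that prints each token by name instead; dump_tokens() is a hypothetical helper, not part of PHP. Run on 5.2.9 with the reproduce string, the split should show up as three consecutive T_INLINE_HTML lines, matching entries [7] to [9] of the actual result.

```php
<?php
// Hypothetical helper: print each token as "NAME: 'text'" so the dump
// does not depend on version-specific numeric token IDs.
function dump_tokens($source)
{
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // $token is array(id, text, line); token_name() maps the id to e.g. "T_INLINE_HTML".
            echo token_name($token[0]), ': ', var_export($token[1], true), "\n";
        } else {
            // Single-character tokens such as ';' are returned as plain strings.
            echo 'CHAR: ', var_export($token, true), "\n";
        }
    }
}

dump_tokens('<?php echo "Hello World!"; ?><h6>Hello</h6><span class="test">Hello!</span><?php echo "Goodbye, World!"; ?>');
```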


History
 [2009-06-01 17:16 UTC] mattwil@php.net
Yeah, that's just how the tokenizer/scanner has always worked. It stops at "<s" to prevent a long-form PHP opening tag (e.g. <script language="php">) from being consumed as inline HTML. The regular expressions in the scanner can't "look ahead" to make sure that what follows is NOT a PHP opening tag, and it would be more complicated (if it's even possible; it's been a while since I looked) to do extra checking in the code after scanning additional input...

The good news, however, is that a new scanner is used for PHP 5.3, and as of a few weeks ago (5.3.0 RC2), it now works as you'd expect. All continuous HTML is kept as one token. :-)
 [2009-06-01 17:32 UTC] shawn at shawnbiddle dot com
Gotcha, guess I'll have to hackity-hack-hack around it until a stable 5.3 is released.
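One way to work around this on 5.2 is to post-process the token stream and glue adjacent T_INLINE_HTML tokens back together. The following is a minimal sketch under that assumption; merge_inline_html() is a hypothetical helper, not a PHP function. It avoids return type declarations so it also runs on 5.2, and on 5.3+ (where contiguous HTML is already one token) it should simply return the stream unchanged.

```php
<?php
// Hypothetical workaround: merge runs of adjacent T_INLINE_HTML tokens
// produced by the 5.2 scanner (e.g. '<h6>Hello</h6>', '<s', 'pan ...').
function merge_inline_html(array $tokens)
{
    $merged = array();
    foreach ($tokens as $token) {
        $last = count($merged) - 1;
        $isHtml = is_array($token) && $token[0] === T_INLINE_HTML;
        $prevIsHtml = $last >= 0 && is_array($merged[$last])
            && $merged[$last][0] === T_INLINE_HTML;
        if ($isHtml && $prevIsHtml) {
            // Append the text to the previous inline HTML token; keep its line number.
            $merged[$last][1] .= $token[1];
        } else {
            // Any other token (including single-character strings like ';') is kept as-is.
            $merged[] = $token;
        }
    }
    return $merged;
}

// Usage with the reproduce string from the report.
$tokens = merge_inline_html(token_get_all('<?php echo "Hello World!"; ?><h6>Hello</h6><span class="test">Hello!</span><?php echo "Goodbye, World!"; ?>'));
```

Working on the finished token array keeps the fix independent of the scanner, so it can be dropped once the 5.3 scanner is available.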
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Sat Aug 02 03:00:02 2025 UTC