Bug #73817 Incorrect entries in get_html_translation_table
Submitted: 2016-12-26 21:55 UTC Modified: 2018-07-15 21:21 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: philip at van-diemen-de-jel dot nl Assigned: cmb (profile)
Status: Closed Package: Strings related
PHP Version: 5.6.29 OS: Linux
Private report: No CVE-ID: None
 [2016-12-26 21:55 UTC] philip at van-diemen-de-jel dot nl
Description:
------------
PHP-version: 5.6.29-0+deb8u1 (on Linux plesk4.duocast.net 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64)

Some entries returned by get_html_translation_table are missing the closing semicolon.
E.g. [8402] => &nvlt  (should be &nvlt;)
[8421] => &bne
etc.

If you run the test script below (quick rather than sophisticated), the invalid entries are shown in red.

Test script:
---------------
$vec = get_html_translation_table( HTML_ENTITIES, ENT_QUOTES | ENT_HTML5 );
foreach ($vec as $key => $value) {
    $dec = Utf2code( $key );
    // Entities without a closing semicolon are flagged in red.
    if (substr($value, -1) != ';')  echo "<br>[$dec] => &amp;<span class=red>".substr($value,1)."</span> = $value;";
    else echo "<br>[$dec] => &amp;".substr($value,1)." = $value";
}

// Returns the code point of the first non-ASCII UTF-8 character in $latin
// (or the byte value of a single-byte string); -1 if none is found.
function Utf2code( $latin ) {
    if (strlen($latin) == 1)  return ord( $latin );
    $i = 0;
    while ($i < strlen($latin)) {
        $num = ord( $latin[$i] );
        if ($num >= 128) {
            // The number of leading 1 bits in the lead byte is the sequence length.
            $mask = 128;  $len = 0;
            while ($mask & $num)  { $len++;  $mask >>= 1; }
            $utf = ($mask - 1) & $num;
            $j = 1;
            while ($j < $len) {
                $kar = substr( $latin, $i + $j, 1 );
                if ((192 & ord( $kar )) != 128)  break;    // not a continuation byte
                $utf = ($utf * 64) + (63 & ord( $kar ));
                $j++;
            }
            if (($j == $len) and ($len <= 6)) {    // Assume UTF
                return $utf;
            }
        }
        $i++;
    }
    return -1;
}
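
Note that Utf2code reports only the first non-ASCII code point of a key, so any leading ASCII character in a key is not shown in the bracketed number. The following is a minimal sketch of a decoder that lists every code point of a key; the helper name utf8CodePoints is hypothetical and not part of PHP or of the original script, and it assumes the keys are UTF-8 sequences of at most four bytes.

```php
<?php
// Hypothetical helper: decode a UTF-8 string into a list of code points.
// Returns null if the input is truncated or not well-formed UTF-8.
function utf8CodePoints($s) {
    $points = array();
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        // Determine the number of continuation bytes from the lead byte.
        if ($b < 0x80)                { $n = 0; $cp = $b; }
        elseif (($b & 0xE0) === 0xC0) { $n = 1; $cp = $b & 0x1F; }
        elseif (($b & 0xF0) === 0xE0) { $n = 2; $cp = $b & 0x0F; }
        elseif (($b & 0xF8) === 0xF0) { $n = 3; $cp = $b & 0x07; }
        else                          { return null; }
        if ($i + $n >= $len) return null;          // sequence is cut off
        for ($k = 1; $k <= $n; $k++) {
            $c = ord($s[$i + $k]);
            if (($c & 0xC0) !== 0x80) return null; // not a continuation byte
            $cp = ($cp << 6) | ($c & 0x3F);
        }
        $points[] = $cp;
        $i += $n + 1;
    }
    return $points;
}

// Example: '<' followed by U+20D2, written as raw UTF-8 bytes.
echo implode(' ', utf8CodePoints("<\xE2\x83\x92")), "\n";
```

Calling it on each key of the translation table in place of Utf2code would show all code points of a key rather than just one.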


Expected result:
----------------
A dump of the html (entity) translation table, with incomplete entries flagged in red.
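
The flagged entries can also be collected without any HTML output, which may be handier on the command line. This is only a sketch: the helper name findUnterminatedEntities is hypothetical (not part of PHP), and it simply treats any entity string that does not end in ';' as incomplete.

```php
<?php
// Hypothetical helper: keep only the entries whose entity string
// does not end in ';'. Keys of the input array are preserved.
function findUnterminatedEntities(array $table) {
    return array_filter($table, function ($entity) {
        return substr($entity, -1) !== ';';
    });
}

$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5);
foreach (findUnterminatedEntities($table) as $char => $entity) {
    // bin2hex() prints the raw bytes of the key, so no UTF-8 decoder is needed.
    echo bin2hex($char), ' => ', $entity, "\n";
}
```

On an affected build this should print one line per incomplete entry; a build where the table is correct would be expected to print nothing.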


History

 [2016-12-27 12:13 UTC] cmb@php.net
-Package: unicodestring +Package: Strings related
 [2016-12-27 12:13 UTC] cmb@php.net
Actual result: <http://www.twentebestand.nl/pub/Test.php>.
 [2016-12-27 13:54 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2018-07-15 21:21 UTC] cmb@php.net
-Assigned To: +Assigned To: cmb
 [2018-07-15 21:43 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=0f8c1ee76d5d2aa90bf667215e41eed60ca6cbcd
Log: Fix #73817: Incorrect entries in get_html_translation_table
 [2018-07-15 21:43 UTC] cmb@php.net
-Status: Verified +Status: Closed
 