Bug #73817 Incorrect entries in get_html_translation_table
Submitted: 2016-12-26 21:55 UTC Modified: 2018-07-15 21:21 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: philip at van-diemen-de-jel dot nl Assigned: cmb (profile)
Status: Closed Package: Strings related
PHP Version: 5.6.29 OS: Linux
Private report: No CVE-ID: None
 [2016-12-26 21:55 UTC] philip at van-diemen-de-jel dot nl
Description:
------------
PHP-version: 5.6.29-0+deb8u1 (on Linux plesk4.duocast.net 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64)

Some entries returned by get_html_translation_table() lack the closing semicolon.
E.g. [8402] => &nvlt  (should be &nvlt;)
[8421] => &bne  (should be &bne;)
etc.
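
Both examples are entities that stand for two code points (&nvlt; is U+003C U+20D2, &bne; is U+003D U+20E5), and the bracketed number is the second one (8402 = 0x20D2, 8421 = 0x20E5). The following is only a sketch for listing the affected entries together with all code points of their key. `utf8_code_points` and `entities_missing_semicolon` are hypothetical helper names, not part of PHP, and the decoder assumes well-formed UTF-8 input.

```php
<?php
// Hypothetical helper: decode a UTF-8 string into a list of code points.
// Assumes well-formed input; continuation bytes are not validated.
function utf8_code_points($s) {
    $s = (string) $s;
    $points = array();
    $n = strlen($s);
    for ($i = 0; $i < $n; $i += $len) {
        $b = ord($s[$i]);
        // The lead byte determines the sequence length and the first payload bits.
        if ($b < 0x80)     { $len = 1; $cp = $b; }
        elseif ($b < 0xE0) { $len = 2; $cp = $b & 0x1F; }
        elseif ($b < 0xF0) { $len = 3; $cp = $b & 0x0F; }
        else               { $len = 4; $cp = $b & 0x07; }
        // Each continuation byte contributes six more bits.
        for ($j = 1; $j < $len && $i + $j < $n; $j++) {
            $cp = ($cp << 6) | (ord($s[$i + $j]) & 0x3F);
        }
        $points[] = $cp;
    }
    return $points;
}

// Hypothetical helper: map each entity lacking the trailing ';'
// to the code points of the character sequence it stands for.
function entities_missing_semicolon(array $table) {
    $bad = array();
    foreach ($table as $chars => $entity) {
        if (substr($entity, -1) !== ';') {
            $bad[$entity] = utf8_code_points($chars);
        }
    }
    return $bad;
}

$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5);
foreach (entities_missing_semicolon($table) as $entity => $points) {
    $hex = array_map(function ($p) { return sprintf('U+%04X', $p); }, $points);
    echo $entity, ' <= ', implode(' ', $hex), "\n";
}
```

On a build that contains the fix, the loop should print nothing.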

If you run the test script below (not the most sophisticated, just a quick one), the invalid entries are shown in red.

Test script:
---------------
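<?php
// Dump the HTML5 entity table; entries without a trailing ';' are shown in red.
$vec = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5);
foreach ($vec as $key => $value) {
    $dec = Utf2code($key);
    if (substr($value, -1, 1) != ';') {
        // The ';' appended after $value lets the browser still render the entity.
        echo "<br>[$dec] => &amp;<span class=red>" . substr($value, 1) . "</span> = $value;";
    } else {
        echo "<br>[$dec] => &amp;" . substr($value, 1) . " = $value";
    }
}

// Returns the byte value of a single-byte string, otherwise the code point of
// the first multi-byte UTF-8 sequence in $latin, or -1 if none can be decoded.
function Utf2code($latin) {
    $magic = 128;
    $i = $cnt = 0;
    while ($i < strlen($latin)) {
        $kar = substr($latin, $i, 1);
        $num = ord($kar);
        if (strlen($latin) == 1) return ($num);
        if ($num >= $magic) {
            $cnt++;
            $mask = $magic;  $len = $utf = 0;
            // The number of leading 1 bits in the lead byte is the sequence length.
            while ($mask & $num) { $len++;  $mask = $mask / 2; }
            $utf = ($mask - 1) & $num;
            $j = 1;
            // Fold in the continuation bytes (10xxxxxx), six bits each.
            while ($j < $len) {
                $kar = substr($latin, $i + $j, 1);
                if ((192 & ord($kar)) != 128) break;
                $utf = ($utf * 64) + (63 & ord($kar));
                $j++;
            }
            if (($j == $len) and ($len <= 6)) {    // Assume UTF
                return ($utf);
            }
        }
        $i++;
    }
    return (-1);
}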


Expected result:
----------------
A dump of the html (entity) translation table, with incomplete entries flagged in red.


History

 [2016-12-27 12:13 UTC] cmb@php.net
-Package: unicodestring +Package: Strings related
 [2016-12-27 12:13 UTC] cmb@php.net
Actual result: <http://www.twentebestand.nl/pub/Test.php>.
 [2016-12-27 13:54 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2018-07-15 21:21 UTC] cmb@php.net
-Assigned To: +Assigned To: cmb
 [2018-07-15 21:43 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=0f8c1ee76d5d2aa90bf667215e41eed60ca6cbcd
Log: Fix #73817: Incorrect entries in get_html_translation_table
 [2018-07-15 21:43 UTC] cmb@php.net
-Status: Verified +Status: Closed
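
For code that has to run on builds predating the fix, the table could be normalised by the caller. This is a minimal sketch, assuming the missing terminator is the only defect (which is all the report describes); `terminate_entities` is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical workaround: append ';' to every entity that lacks it.
// Assumes the missing terminator is the only defect in the table.
function terminate_entities(array $table) {
    foreach ($table as $chars => $entity) {
        if (substr($entity, -1) !== ';') {
            $table[$chars] = $entity . ';';
        }
    }
    return $table;
}

$table = terminate_entities(
    get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5)
);

// strtr() with an array tries the longest keys first, so multi-character
// keys take precedence over their single-character prefixes.
echo strtr("a < b & c", $table), "\n";
```

On a fixed build the helper returns the table unchanged, so it is safe to leave in place.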
 