Request #34577 get_html_translation_table is missing a lot of xHTML entities
Submitted: 2005-09-21 11:35 UTC Modified: 2021-09-19 04:22 UTC
Votes:16
Avg. Score:4.6 ± 0.8
Reproduced:14 of 14 (100.0%)
Same Version:3 (21.4%)
Same OS:1 (7.1%)
From: mark_meredith at shaw dot ca Assigned: cmb
Status: No Feedback Package: Strings related
PHP Version: 5.0.5 OS: Mac OS X 10.4.2
Private report: No CVE-ID: None
 [2005-09-21 11:35 UTC] mark_meredith at shaw dot ca
Description:
------------
Some special characters, symbols, and Latin-1 characters are
missing from get_html_translation_table(HTML_ENTITIES, ENT_QUOTES).

For instance, the XHTML special characters lsquo and rsquo
(left and right single quotation marks) are missing, as are
many other entities from the W3C-provided xhtml-symbol.ent,
xhtml-special.ent, and xhtml-lat1.ent files.

You can download these files from http://www.w3.org/TR/xhtml1/dtds.html

The implementation also seems odd, given that the entities are
split across three files. There should be three separate
functions for converting these characters to their entity
equivalents, plus one function that converts all entities.

Reproduce code:
---------------
print('<pre>' . print_r(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES), true) . '</pre>');

/* NOTE: then compare to the contents of the entity files located at http://www.w3.org/TR/xhtml1/dtds.html */
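A more targeted variant of the check above looks up only the two characters named in this report instead of dumping the whole table. This is a minimal sketch, not part of the original report: it assumes PHP 7.0+ (for the "\u{...}" string escape and the ?? operator) and uses the ENT_XHTML flag (added in PHP 5.4.0) with an explicit 'UTF-8' charset, so it checks current behaviour rather than reproducing the PHP 5.0.5 output.

```php
<?php
// Look up the curly single quotes from xhtml-special.ent (lsquo, rsquo)
// in the XHTML entity table for UTF-8 input.
// Assumes PHP 7.0+ ("\u{...}" escapes, ??) and PHP 5.4.0+ (ENT_XHTML).
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_XHTML, 'UTF-8');

// Characters to check, mapped to the entity names the W3C files define.
$wanted = [
    "\u{2018}" => '&lsquo;', // LEFT SINGLE QUOTATION MARK
    "\u{2019}" => '&rsquo;', // RIGHT SINGLE QUOTATION MARK
];

foreach ($wanted as $char => $entity) {
    // Report a placeholder instead of raising a notice if the key is absent.
    $found = $table[$char] ?? '(missing)';
    echo 'expected ', $entity, ', table has ', $found, PHP_EOL;
}
```

Checking specific keys makes it easy to see at a glance whether an entity is present, which is harder to spot in the full print_r() dump.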

Expected result:
----------------
The table should contain every known entity, grouped by
entity type.

Actual result:
--------------
Many entities are missing, such as &rsquo; from
xhtml-special.ent.

History
 [2006-03-17 23:09 UTC] morgan at forsalebyowner dot com
I ran into this problem while trying to generate an XML document using the DOMDocument object. DOMDocument->saveXML() gives an error when certain characters are present in the document, especially the right single quotation mark (U+2019) and several others. An attempt to convert these characters to HTML entities with htmlentities() failed because they are not in the HTML translation table.
 [2009-08-20 00:38 UTC] chiestand at salk dot edu
This is still a problem in PHP 5.2.5 on both my Linux and OS X 10.5 platforms.

In my case the problematic character was "ff", which crashed DOMDocument::saveXML() with the error "xmlEscapeEntities:char out of range".
 [2016-12-30 23:07 UTC] cmb@php.net
-Package: Feature/Change Request +Package: Strings related
 [2021-09-09 13:30 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2021-09-09 13:30 UTC] cmb@php.net
This seems to be fixed as of PHP 5.4.0[1], or are some entities
still missing?

> The DomDocument->saveXML gives an error when certain characters
> within the document, specially the right-single-quotation (U+2019)
> and several other characters.

That shouldn't be an issue if the document encoding is UTF-8.

[1] <https://3v4l.org/sPH1k>
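To illustrate the UTF-8 point, here is a minimal sketch (not taken from the thread) that builds a document containing U+2019 and serializes it. It assumes the ext/dom extension is enabled and PHP 7.0+ for the "\u{...}" escape; the element name and text are arbitrary placeholders.

```php
<?php
// Declare the document encoding as UTF-8 so that non-ASCII text
// can be serialized without any entity lookup.
// Assumes ext/dom is available and PHP 7.0+ for "\u{...}".
$doc = new DOMDocument('1.0', 'UTF-8');

// The text node holds a valid UTF-8 string containing U+2019.
$p = $doc->createElement('p');
$p->appendChild($doc->createTextNode("It\u{2019}s fine"));
$doc->appendChild($p);

// Serialize the whole document, including the XML declaration.
$xml = $doc->saveXML();
echo $xml;
```

Because the document is declared as UTF-8 and the input string is valid UTF-8, the right single quotation mark is written out as-is; htmlentities() and the translation table are not involved.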
 [2021-09-19 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 12:01:31 2024 UTC