Bug #25927 get_html_translation_table calls the ' &#39; instead of &#039;
Submitted: 2003-10-20 17:53 UTC Modified: 2010-10-12 04:52 UTC
Votes:3
Avg. Score:4.7 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:2 (100.0%)
Same OS:1 (50.0%)
From: acm at tweakers dot net Assigned: cataphract (profile)
Status: Closed Package: Unknown/Other Function
PHP Version: 4.3.3 OS: Linux
Private report: No CVE-ID: None

 [2003-10-20 17:53 UTC] acm at tweakers dot net
Description:
------------
When you call get_html_translation_table with the ENT_QUOTES parameter, it'll return &#39; for '

The code for ' should, of course, be &#039;

This was not broken in 4.3.1, so is newly introduced in either 4.3.2 or 4.3.3

One wonders how this could occur, since both htmlspecialchars/htmlentities and html_entity_decode work correctly.

Reproduce code:
---------------
<?php
print_r(get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES));
?>

Expected result:
----------------
Array
(
    [&] => &amp;
    ["] => &quot;
    ['] => &#039;
    [<] => &lt;
    [>] => &gt;
)


Actual result:
--------------
Array
(
    [&] => &amp;
    ["] => &quot;
    ['] => &#39;
    [<] => &lt;
    [>] => &gt;
)


 [2003-10-20 18:35 UTC] kees at tweakers dot net
We've fixed it by commenting out line 421 of ext/standard/html.c:

420-    { '\'', "&#039;",       6,      ENT_HTML_QUOTE_SINGLE },
421:/*  { '\'', "&#39;",        5,      ENT_HTML_QUOTE_SINGLE }, */
 [2003-10-20 18:55 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Both &#39; and &#039; are valid.
 [2003-10-20 19:04 UTC] acm at tweakers dot net
Well, it's cute that both are valid, but that's not the point...
get_html_translation_table is supposed to return "how php's functions translate it", not "any way which is valid".

And in that way, it _fails_ to do so.
Since the function html_entity_decode is only available as of php-4.3.0, anyone who has a similar function (based on the php-example on the documentation page!) finds it broken because of this.

In that sense it is, imho, a bug.
Quoting your own documentation:
"get_html_translation_table --  Returns the translation table used by htmlspecialchars() and htmlentities()"
 [2003-10-20 21:51 UTC] moriyoshi@php.net
Not quite. When you have to write your own html_entity_decode(), you should cope with any forms of the numeric entity including hexadecimal style. It's not as simple as the snippet in the manual page.
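
To illustrate the point about numeric entities, here is a minimal sketch of a decoder that accepts both the decimal and the hexadecimal form. The function name decode_numeric_entities is hypothetical, and the sketch assumes a much newer PHP than the versions discussed in this report (PHP 7.2 or later with the mbstring extension, for mb_chr). It is not how html_entity_decode itself is implemented.

```php
<?php
// Hypothetical helper: decodes &#NNN; and &#xHHH; references to UTF-8.
// Named entities such as &amp; are deliberately left alone.
function decode_numeric_entities(string $text): string
{
    return preg_replace_callback(
        '/&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));/',
        function (array $m): string {
            // Group 1 holds hex digits, group 2 holds decimal digits.
            $codepoint = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            $char = mb_chr($codepoint, 'UTF-8');
            // Leave the reference untouched if the code point is invalid.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}
```

With this approach &#39;, &#039; and &#x27; are all treated as the same character, so the decoder does not depend on which spelling the encoder happened to choose.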

 [2003-10-21 05:14 UTC] acm at tweakers dot net
Well, maybe so.

But I was referring to a function that tries to undo the changes of htmlspecialchars/htmlentities.

If htmlspecialchars changes ' to &#039; and you want to depend on get_html_translation_table to undo all changes, you expect it to return ' = &#039; instead of ' = &#39;, since that's the change htmlspecialchars/htmlentities made as well.
It didn't change it to &#39;

If you really wanted to create a perfect entity-decoder, you'd indeed have to cope with all those &*; entities, including all the &#[0-9]{2,3};-like entities.

But for the simple "undo the htmlspecialchars"-like function that is not necessary.

And again, get_html_translation_table returns "how the htmlspecialchars/entities functions do it", not "all possible translations" or "just a valid version, maybe not what our own functions do", doesn't it? :)

To explain what I mean:
if you do 
echo html_entity_decode(htmlspecialchars("'", ENT_QUOTES));
you get ' back.

If you do:
function my_entity_decoder($string)
{
    // Flip the table so that each entity maps back to its character.
    $trans = array_flip(get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES));
    return strtr($string, $trans);
}

echo my_entity_decoder(htmlspecialchars("'", ENT_QUOTES));
Where you trust the get_html_translation_table-function to return enough information to output ' again...

But if it all doesn't matter to you guys, why do the two differ at all?
Why does htmlspecialchars change it to &#039; while get_html_translation_table claims it changes it to &#39;?
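
The expectation described in this comment can be checked mechanically. The following is a minimal sketch, not part of the original report: the helper name find_table_mismatches is hypothetical, and the type declarations assume PHP 7 or later. It compares each entry of the translation table with what htmlspecialchars actually emits for the same character.

```php
<?php
// Hypothetical helper: returns the characters whose entry in the translation
// table differs from the output of htmlspecialchars() for the same flags.
function find_table_mismatches(int $flags = ENT_QUOTES): array
{
    $mismatches = [];
    $table = get_html_translation_table(HTML_SPECIALCHARS, $flags);
    foreach ($table as $char => $entity) {
        $encoded = htmlspecialchars((string) $char, $flags);
        if ($encoded !== $entity) {
            $mismatches[$char] = ['table' => $entity, 'encoder' => $encoded];
        }
    }
    return $mismatches;
}

print_r(find_table_mismatches());
```

On a version affected by this report one would expect the single quote to show up in the result. On a version that includes the fix the array should be empty.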
 [2003-11-20 14:00 UTC] mike-php at emerge2 dot com
Does the same in Windows PHP 4.3.4.
 [2010-10-11 07:15 UTC] cataphract@php.net
-Status: Bogus +Status: Re-Opened -Assigned To: +Assigned To: cataphract
 [2010-10-12 04:51 UTC] cataphract@php.net
Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&revision=304340
Log: - Added a 3rd parameter to get_html_translation_table. It now takes a charset
  hint, like htmlentities et al.
- Fixed bug #49407 (get_html_translation_table doesn't handle UTF-8).
- Fixed bug #25927 (get_html_translation_table calls the ' &#39; instead of
  &#039;).
- Fixed tests for get_html_translation_table and unified the Windows and
  non-Windows versions of the tests.
 [2010-10-12 04:52 UTC] cataphract@php.net
-Status: Re-Opened +Status: Closed
 [2010-10-12 04:52 UTC] cataphract@php.net
Fixed for PHP 5.3 and trunk.
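
For readers on a version that includes this change, here is a short sketch of the third parameter mentioned in the commit log. The choice of 'UTF-8' and the sample character are illustrative assumptions, and the "\u{...}" escape requires PHP 7 or later.

```php
<?php
// Request the full entity table with an explicit charset hint
// (the third parameter added by the commit above).
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

// With UTF-8 the keys are multibyte strings, e.g. U+00E9 (e with acute accent).
var_dump($table["\u{00E9}"]);

// The single quote entry should now agree with htmlspecialchars().
var_dump($table["'"] === htmlspecialchars("'", ENT_QUOTES));
```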
 
Last updated: Thu Nov 21 14:01:29 2024 UTC