Bug #46467 htmlentities() does not use default_charset when mbstring.so is loaded
Submitted: 2008-11-02 22:46 UTC Modified: 2008-11-09 22:45 UTC
From: hostmaster at uuism dot net Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.2.6 OS: Fedora Core 4
Private report: No CVE-ID: None
 [2008-11-02 22:46 UTC] hostmaster at uuism dot net
Description:
------------
When I run test scripts ext/standard/tests/strings/htmlentitiesXX.phpt where XX is 10, 11, and 13, the tests fail when mbstring.so is loaded and pass when mbstring.so is not loaded.

If I change the test script's INI value for mbstring.internal_encoding to match default_charset, the tests pass.  They also pass if $charset is passed explicitly to htmlentities().

I am using php.ini-recommended with extension_dir = "/usr/local/src/php-5.2.6/modules".  I placed the php.ini in the directory /usr/local/src/php-5.2.6/sapi/cli so that sapi/cli/php would use it.

All the php.ini values for mbstring are commented out:

# grep mbstring /usr/local/src/php-5.2.6/sapi/cli/php.ini
;extension=php_mbstring.dll
[mbstring]
;mbstring.language = Japanese
;mbstring.internal_encoding = EUC-JP
;mbstring.http_input = auto
;mbstring.http_output = SJIS
; mbstring.internal_encoding setting. Input chars are
;mbstring.encoding_translation = Off
;mbstring.detect_order = auto
;mbstring.substitute_character = none;
; overload(replace) single byte functions by mbstring functions.
;mbstring.func_overload = 0
;mbstring.strict_encoding = Off
; With mbstring support this will automatically be converted into the encoding
; given by corresponding encode setting. When empty mbstring.internal_encoding

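The grep above only covers the main php.ini, and additional .ini files may also be scanned, so it can help to print the values that are actually in effect at runtime. The following is a minimal sketch, not part of the original report; the helper name effective_settings is hypothetical, and the list of directives is just the ones discussed here.

```php
<?php
// Hypothetical helper: map each directive name to its effective value.
// ini_get() returns false when the directive is not registered at all
// (for example, mbstring.* directives when mbstring is not loaded).
function effective_settings(array $names) {
    $out = array();
    foreach ($names as $name) {
        $out[$name] = ini_get($name);
    }
    return $out;
}

printf("mbstring loaded: %s\n", extension_loaded('mbstring') ? 'yes' : 'no');
printf("main php.ini:    %s\n", var_export(php_ini_loaded_file(), true));
printf("scanned .ini:    %s\n", var_export(php_ini_scanned_files(), true));

$names = array('default_charset', 'mbstring.internal_encoding', 'mbstring.func_overload');
foreach (effective_settings($names) as $name => $value) {
    printf("%-28s %s\n", $name, $value === false ? '(not registered)' : var_export($value, true));
}
```

Running this once with and once without mbstring.so loaded should show whether the two runs differ in anything other than the extension itself.
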
Here is the run without mbstring.so loaded:

# TEST_PHP_EXECUTABLE=sapi/cli/php sapi/cli/php ./run-tests.php ext/standard/tests/strings/htmlentities*phpt

=====================================================================
PHP         : sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.2.6
ZEND_VERSION: 2.2.0
PHP_OS      : Linux - Linux host.uuserver.net 2.6.20.1 #16 SMP Thu Nov 8 14:19:44 EST 2007 i686
INI actual  : /usr/local/src/php-5.2.6/sapi/cli/php.ini
More .INIs  : /etc/php.d/mysql.ini,/etc/php.d/mysqli.ini
CWD         : /usr/local/src/php-5.2.6
Extra dirs  :
=====================================================================
Running selected tests.
[snip]
PASS htmlentities() test 10 (default_charset / cp1252) [ext/standard/tests/strings/htmlentities10.phpt]
PASS htmlentities() test 11 (default_charset / ISO-8859-15) [ext/standard/tests/strings/htmlentities11.phpt]
PASS htmlentities() test 12 (default_charset / ISO-8859-1) [ext/standard/tests/strings/htmlentities12.phpt]
PASS htmlentities() test 13 (default_charset / EUC-JP) [ext/standard/tests/strings/htmlentities13.phpt]
[snip]
=====================================================================
Number of tests :   20                14
Tests skipped   :    6 ( 30.0%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Tests passed    :   14 ( 70.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    1 seconds
=====================================================================

Here is the run WITH mbstring.so loaded:

# TEST_PHP_EXECUTABLE=sapi/cli/php sapi/cli/php ./run-tests.php ext/standard/tests/strings/htmlentities*phpt

=====================================================================
PHP         : sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.2.6
ZEND_VERSION: 2.2.0
PHP_OS      : Linux - Linux host.uuserver.net 2.6.20.1 #16 SMP Thu Nov 8 14:19:44 EST 2007 i686
INI actual  : /usr/local/src/php-5.2.6/sapi/cli/php.ini
More .INIs  : /etc/php.d/mbstring.ini,/etc/php.d/mysql.ini,/etc/php.d/mysqli.ini
CWD         : /usr/local/src/php-5.2.6
Extra dirs  :
=====================================================================
Running selected tests.
[snip]
FAIL htmlentities() test 10 (default_charset / cp1252) [ext/standard/tests/strings/htmlentities10.phpt]
FAIL htmlentities() test 11 (default_charset / ISO-8859-15) [ext/standard/tests/strings/htmlentities11.phpt]
PASS htmlentities() test 12 (default_charset / ISO-8859-1) [ext/standard/tests/strings/htmlentities12.phpt]
FAIL htmlentities() test 13 (default_charset / EUC-JP) [ext/standard/tests/strings/htmlentities13.phpt]
[snip]
=====================================================================
Number of tests :   20                20
Tests skipped   :    0 (  0.0%) --------
Tests warned    :    3 ( 15.0%) ( 15.0%)
Tests failed    :    3 ( 15.0%) ( 15.0%)
Tests passed    :   14 ( 70.0%) ( 70.0%)
---------------------------------------------------------------------
Time taken      :    2 seconds
=====================================================================

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
[snip]
htmlentities() test 10 (default_charset / cp1252) [ext/standard/tests/strings/htmlentities10.phpt]
htmlentities() test 11 (default_charset / ISO-8859-15) [ext/standard/tests/strings/htmlentities11.phpt]
htmlentities() test 13 (default_charset / EUC-JP) [ext/standard/tests/strings/htmlentities13.phpt]
[snip]
=====================================================================

Thanks for looking into this problem.

Jim

Reproduce code:
---------------
<?php
        ini_set('mbstring.internal_encoding','cp1252');
        ini_set('default_charset','cp1252');
        print ini_get('default_charset')."\n";
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, ''));
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, 'cp1252'));


        ini_set('mbstring.internal_encoding','pass');
        ini_set('default_charset','cp1252');
        print ini_get('default_charset')."\n";
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, ''));
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, 'cp1252'));
?>


Expected result:
----------------
cp1252
string(28) "&sbquo;&dagger;&trade;&Yuml;"
string(28) "&sbquo;&dagger;&trade;&Yuml;"
cp1252
string(28) "&sbquo;&dagger;&trade;&Yuml;"
string(28) "&sbquo;&dagger;&trade;&Yuml;"


Actual result:
--------------
cp1252
string(28) "&sbquo;&dagger;&trade;&Yuml;"
string(28) "&sbquo;&dagger;&trade;&Yuml;"
cp1252
string(4) ""
string(28) "&sbquo;&dagger;&trade;&Yuml;"

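Since the report notes that the call behaves as expected when the charset is passed explicitly, one possible workaround is to resolve the charset in user code instead of relying on the '' lookup. This is only a sketch under that assumption: the wrapper name htmlentities_default is hypothetical, and the ISO-8859-1 fallback is a choice made for this example.

```php
<?php
// Hypothetical wrapper: always hand htmlentities() an explicit charset,
// taken from default_charset, so that the result does not depend on
// mbstring.internal_encoding.
function htmlentities_default($string, $flags = ENT_QUOTES) {
    $charset = ini_get('default_charset');
    if ($charset === false || $charset === '') {
        // Fallback chosen for this sketch only.
        $charset = 'ISO-8859-1';
    }
    return htmlentities($string, $flags, $charset);
}

ini_set('default_charset', 'cp1252');
var_dump(htmlentities_default("\x82\x86\x99\x9f"));
```
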

 [2008-11-03 08:34 UTC] jani@php.net
Yes, they fail for everyone and especially with braindead libc's.
 [2008-11-04 01:01 UTC] hostmaster at uuism dot net
jani,

if they fail for everyone, should they have a --XFAIL-- section?

frankly, based on pure logic, i would not expect this statement to use the default_charset

htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, '');

since '' has no meaning as a value for charset, htmlentities should use the same default as when default_charset='', which is ISO-8859-1

on the other hand, this statement should use the default_charset

htmlentities("\x82\x86\x99\x9f", ENT_QUOTES);

since the third parameter is optional.

jim
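The call forms contrasted in the comment above can be compared side by side. This is a minimal sketch, not part of the original thread; the helper name describe_call is hypothetical, and no output is shown because the first two results depend on the PHP version, on whether mbstring is loaded, and on the INI settings in effect.

```php
<?php
// Input bytes taken from the report: four characters in cp1252.
$input = "\x82\x86\x99\x9f";

// Hypothetical helper: label a result and show its length and value.
function describe_call($label, $result) {
    return sprintf("%-20s length=%d value=%s", $label, strlen($result), var_export($result, true));
}

ini_set('default_charset', 'cp1252');

// Third argument given as an empty string.
echo describe_call("charset = ''", htmlentities($input, ENT_QUOTES, '')), "\n";
// Third argument omitted entirely.
echo describe_call('charset omitted', htmlentities($input, ENT_QUOTES)), "\n";
// Third argument named explicitly.
echo describe_call("charset = 'cp1252'", htmlentities($input, ENT_QUOTES, 'cp1252')), "\n";
```
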
 [2008-11-04 12:37 UTC] moriyoshi@php.net
Treating the input as encoded in default_charset can be an over-assumption, because the encoding of the output sent to the browser may differ from that of the PHP script.

 [2008-11-05 05:52 UTC] hostmaster at uuism dot net
moriyoshi,

sorry, i don't understand what you are saying.

i just don't see why charset='' should be assumed to be the same as charset=default_charset when executing htmlentities

thanks

jim
 [2008-11-09 16:41 UTC] moriyoshi@php.net
The above comment was intended as a reply to the following:
> on the other hand, this statement should use the default_charset

 [2008-11-09 22:45 UTC] hostmaster at uuism dot net
moriyoshi,

Okay.  I understand. 

Isn't it also an over-assumption to assume that the third argument charset equals default_charset in this statement?

htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, '');

Jim
 