Bug #46467 htmlentities() does not use default_charset when mbstring.so is loaded
Submitted: 2008-11-02 22:46 UTC Modified: 2008-11-09 22:45 UTC
From: hostmaster at uuism dot net Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.2.6 OS: Fedora Core 4
Private report: No CVE-ID: None
 [2008-11-02 22:46 UTC] hostmaster at uuism dot net
Description:
------------
When I run test scripts ext/standard/tests/strings/htmlentitiesXX.phpt where XX is 10, 11, and 13, the tests fail when mbstring.so is loaded and pass when mbstring.so is not loaded.

If I change the test script's INI value for mbstring.internal_encoding to match default_charset, the tests pass.  They also pass if $charset is passed explicitly to htmlentities().
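
A minimal sketch of the second workaround, assuming the input really is cp1252. The helper name escape_cp1252() is hypothetical and only for illustration; it is not part of PHP. It names the charset on every call, so the result should not depend on default_charset or mbstring.internal_encoding:

```php
<?php
// Hypothetical helper (not part of PHP): always passes the charset
// explicitly instead of relying on '' or on an INI default.
function escape_cp1252($text)
{
    // 'cp1252' is one of the charset names htmlentities() accepts.
    return htmlentities($text, ENT_QUOTES, 'cp1252');
}

// Same input bytes as in the reproduce code of this report.
var_dump(escape_cp1252("\x82\x86\x99\x9f"));
```

In the results shown in this report, the call with an explicit 'cp1252' returns the 28-byte entity string in both configurations, which is why this form sidesteps the failure.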

I am using php.ini-recommended with extension_dir = "/usr/local/src/php-5.2.6/modules".  I placed the php.ini in the directory /usr/local/src/php-5.2.6/sapi/cli so that sapi/cli/php would use it.

All the php.ini values for mbstring are commented out:

# grep mbstring /usr/local/src/php-5.2.6/sapi/cli/php.ini
;extension=php_mbstring.dll
[mbstring]
;mbstring.language = Japanese
;mbstring.internal_encoding = EUC-JP
;mbstring.http_input = auto
;mbstring.http_output = SJIS
; mbstring.internal_encoding setting. Input chars are
;mbstring.encoding_translation = Off
;mbstring.detect_order = auto
;mbstring.substitute_character = none;
; overload(replace) single byte functions by mbstring functions.
;mbstring.func_overload = 0
;mbstring.strict_encoding = Off
; With mbstring support this will automatically be converted into the encoding
; given by corresponding encode setting. When empty mbstring.internal_encoding

Here is the run without mbstring.so loaded:

# TEST_PHP_EXECUTABLE=sapi/cli/php sapi/cli/php ./run-tests.php ext/standard/tests/strings/htmlentities*phpt

=====================================================================
PHP         : sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.2.6
ZEND_VERSION: 2.2.0
PHP_OS      : Linux - Linux host.uuserver.net 2.6.20.1 #16 SMP Thu Nov 8 14:19:44 EST 2007 i686
INI actual  : /usr/local/src/php-5.2.6/sapi/cli/php.ini
More .INIs  : /etc/php.d/mysql.ini,/etc/php.d/mysqli.ini
CWD         : /usr/local/src/php-5.2.6
Extra dirs  :
=====================================================================
Running selected tests.
[snip]
PASS htmlentities() test 10 (default_charset / cp1252) [ext/standard/tests/strings/htmlentities10.phpt]
PASS htmlentities() test 11 (default_charset / ISO-8859-15) [ext/standard/tests/strings/htmlentities11.phpt]
PASS htmlentities() test 12 (default_charset / ISO-8859-1) [ext/standard/tests/strings/htmlentities12.phpt]
PASS htmlentities() test 13 (default_charset / EUC-JP) [ext/standard/tests/strings/htmlentities13.phpt]
[snip]
=====================================================================
Number of tests :   20                14
Tests skipped   :    6 ( 30.0%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Tests passed    :   14 ( 70.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    1 seconds
=====================================================================

Here is the run WITH mbstring.so loaded:

# TEST_PHP_EXECUTABLE=sapi/cli/php sapi/cli/php ./run-tests.php ext/standard/tests/strings/htmlentities*phpt

=====================================================================
PHP         : sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.2.6
ZEND_VERSION: 2.2.0
PHP_OS      : Linux - Linux host.uuserver.net 2.6.20.1 #16 SMP Thu Nov 8 14:19:44 EST 2007 i686
INI actual  : /usr/local/src/php-5.2.6/sapi/cli/php.ini
More .INIs  : /etc/php.d/mbstring.ini,/etc/php.d/mysql.ini,/etc/php.d/mysqli.ini
CWD         : /usr/local/src/php-5.2.6
Extra dirs  :
=====================================================================
Running selected tests.
[snip]
FAIL htmlentities() test 10 (default_charset / cp1252) [ext/standard/tests/strings/htmlentities10.phpt]
FAIL htmlentities() test 11 (default_charset / ISO-8859-15) [ext/standard/tests/strings/htmlentities11.phpt]
PASS htmlentities() test 12 (default_charset / ISO-8859-1) [ext/standard/tests/strings/htmlentities12.phpt]
FAIL htmlentities() test 13 (default_charset / EUC-JP) [ext/standard/tests/strings/htmlentities13.phpt]
[snip]
=====================================================================
Number of tests :   20                20
Tests skipped   :    0 (  0.0%) --------
Tests warned    :    3 ( 15.0%) ( 15.0%)
Tests failed    :    3 ( 15.0%) ( 15.0%)
Tests passed    :   14 ( 70.0%) ( 70.0%)
---------------------------------------------------------------------
Time taken      :    2 seconds
=====================================================================

=====================================================================
FAILED TEST SUMMARY
---------------------------------------------------------------------
[snip]
htmlentities() test 10 (default_charset / cp1252) [ext/standard/tests/strings/htmlentities10.phpt]
htmlentities() test 11 (default_charset / ISO-8859-15) [ext/standard/tests/strings/htmlentities11.phpt]
htmlentities() test 13 (default_charset / EUC-JP) [ext/standard/tests/strings/htmlentities13.phpt]
[snip]
=====================================================================

Thanks for looking into this problem.

Jim

Reproduce code:
---------------
<?php
        ini_set('mbstring.internal_encoding','cp1252');
        ini_set('default_charset','cp1252');
        print ini_get('default_charset')."\n";
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, ''));
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, 'cp1252'));


        ini_set('mbstring.internal_encoding','pass');
        ini_set('default_charset','cp1252');
        print ini_get('default_charset')."\n";
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, ''));
        var_dump(htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, 'cp1252'));
?>


Expected result:
----------------
cp1252
string(28) "&sbquo;&dagger;&trade;&Yuml;"
string(28) "&sbquo;&dagger;&trade;&Yuml;"
cp1252
string(28) "&sbquo;&dagger;&trade;&Yuml;"
string(28) "&sbquo;&dagger;&trade;&Yuml;"


Actual result:
--------------
cp1252
string(28) "&sbquo;&dagger;&trade;&Yuml;"
string(28) "&sbquo;&dagger;&trade;&Yuml;"
cp1252
string(4) ""
string(28) "&sbquo;&dagger;&trade;&Yuml;"


History

 [2008-11-03 08:34 UTC] jani@php.net
Yes, they fail for everyone and especially with braindead libc's.
 [2008-11-04 01:01 UTC] hostmaster at uuism dot net
jani,

if they fail for everyone, should they have a --XFAIL-- section?

frankly, based on pure logic, i would not expect this statement to use the default_charset

htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, '');

since '' has no meaning as a value for charset, htmlentities should use the same default as when default_charset='', which is ISO-8859-1

on the other hand, this statement should use the default_charset

htmlentities("\x82\x86\x99\x9f", ENT_QUOTES);

since the third parameter is optional.

jim
 [2008-11-04 12:37 UTC] moriyoshi@php.net
Treating the input as encoded in default_charset may be an over-assumption, because the encoding of the output sent to the browser may differ from that of the PHP script.

 [2008-11-05 05:52 UTC] hostmaster at uuism dot net
moriyoshi,

sorry, i don't understand what you are saying.

i just don't see why charset='' should be assumed to be the same as charset=default_charset when executing htmlentities

thanks

jim
 [2008-11-09 16:41 UTC] moriyoshi@php.net
The above comment was intended as a reply to the following:
> on the other hand, this statement should use the default_charset

 [2008-11-09 22:45 UTC] hostmaster at uuism dot net
moriyoshi,

Okay.  I understand. 

Isn't it also an over-assumption to assume that the third argument charset equals default_charset in this statement?

htmlentities("\x82\x86\x99\x9f", ENT_QUOTES, '');

Jim
 