Bug #61980 htmlentities and htmlspecialchars do not work with chinese!
Submitted: 2012-05-09 10:42 UTC Modified: 2012-05-09 14:13 UTC
From: yyb8 at vip dot qq dot com Assigned:
Status: Not a bug Package: Unknown/Other Function
PHP Version: 5.4.3 OS: windows 2003
Private report: No CVE-ID: None

 [2012-05-09 10:42 UTC] yyb8 at vip dot qq dot com
Description:
------------
htmlentities and htmlspecialchars do not work with Chinese!

$str='<a href="test.html">测试页面</a>';
 
echo htmlentities($str);

Prints nothing.

Is this a bug?

Test script:
---------------
<?php
$str = '<a href="test.html">测试页面</a>';
echo htmlentities($str);



 [2012-05-09 13:39 UTC] info at phpjunkyard dot com
Having the same problem with Windows-1250 chars:

<?php
echo PHP_VERSION . '<br />' . PHP_EOL;
echo 'Testing: ' . htmlentities('ščćžđ ŠČĆŽĐ') . '<br />' . PHP_EOL;
echo 'Testing: ' . htmlspecialchars('ščćžđ ŠČĆŽĐ') . '<br />' . PHP_EOL;
echo 'Testing: ' . substr('ščćžđ ŠČĆŽĐ', 0) . '<br />' . PHP_EOL;
?>

substr works fine, htmlentities and htmlspecialchars don't return anything.

Affected: PHP 5.4.X
Works fine in 5.3.X and previous.

Sincerely,

Klemen
 [2012-05-09 13:58 UTC] rasmus@php.net
Previously htmlspecialchars/htmlentities assumed iso-8859-1; as of PHP 5.4 they 
default to UTF-8. You can switch the default back to 8859-1 if you like by 
setting default_charset in your ini file, but you risk not encoding things 
correctly depending on the actual encoding you are feeding the function. Since 
these are security-related functions having 8859-1 as the default wasn't a good 
idea because in that encoding everything is valid. If your encoding doesn't map 
the characters that are special in HTML to the exact same place as iso-8859-1 
then you have a security problem.

So you have two ways to solve it. Change the default in your ini file back to 
5.3 behaviour, or be explicit and pass the encoding as the 3rd param to the 
functions.
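
A minimal sketch of the second option (passing the encoding explicitly), assuming the input really is a single-byte legacy string. The bytes below are "šč" as encoded in Windows-1250, written as escapes so the result does not depend on how the script file is saved; the variable names are illustrative only. The flags are passed explicitly so the behaviour does not depend on the PHP version's default flags.

```php
<?php
// "šč" in Windows-1250 is the byte pair 0x9A 0xE8 (assumed input for this sketch).
$legacy = "<b>\x9A\xE8</b>";

// Treated as UTF-8, these bytes are an invalid sequence, so with these flags
// the function returns an empty string instead of escaped output.
$asUtf8 = htmlspecialchars($legacy, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// With an explicit single-byte charset as the 3rd parameter, the markup
// characters are escaped and the high bytes pass through unchanged.
$asLatin1 = htmlspecialchars($legacy, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

var_dump($asUtf8 === '');   // empty result for the invalid UTF-8 input
var_dump($asLatin1 !== ''); // escaped output when the charset is explicit
```

Note that this only keeps the bytes intact; the page must still be served with the charset that matches them.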
 [2012-05-09 13:58 UTC] rasmus@php.net
-Status: Open +Status: Not a bug
 [2012-05-09 14:13 UTC] yyb8 at vip dot qq dot com
Thank you for the clarification; I should have been more careful when reading the migration guide:
http://php.net/manual/en/migration54.other.php

This works:

<?php echo htmlentities('ščćžđ ŠČĆŽĐ', ENT_COMPAT | ENT_HTML401, 'ISO-8859-1'); ?>

I do expect a lot of broken websites/scripts though when non-utf8 websites start migrating to PHP 5.4.x...

Regards,
Klemen
 [2012-05-09 14:20 UTC] info at phpjunkyard dot com
Actually, Windows-1250 is not supported, so to get it to work correctly I used:

<?php echo htmlentities( iconv('Windows-1250', 'UTF-8', 'ščćžđ ŠČĆŽĐ') ); ?>
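
The same conversion can be sketched with the input bytes spelled out and a check on the iconv() result, since iconv() returns false when it cannot convert the input. This assumes the iconv extension is available and that it recognises the 'Windows-1250' name used above; the variable names are illustrative only.

```php
<?php
// "ščćžđ" encoded as Windows-1250 bytes (assumed input for this sketch).
$win1250 = "\x9A\xE8\xE6\x9E\xF0";

// Convert to UTF-8 first, because htmlentities() has no Windows-1250 support.
$utf8 = iconv('Windows-1250', 'UTF-8', $win1250);
if ($utf8 === false) {
    // Conversion failed: the input was not valid for the source charset.
    exit("Could not convert input to UTF-8\n");
}

// Escape as UTF-8, stating the encoding explicitly rather than relying on the default.
$escaped = htmlentities($utf8, ENT_COMPAT | ENT_HTML401, 'UTF-8');
echo $escaped, PHP_EOL;
```

The output page then needs to be served as UTF-8 as well, since characters without a named entity stay in the string as UTF-8 bytes.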
 