Bug #65072 strlen result wrong after passing through html_entity_decode
Submitted: 2013-06-20 16:47 UTC Modified: 2013-06-20 16:55 UTC
From: kstirn at gmail dot com Assigned:
Status: Not a bug Package: Unknown/Other Function
PHP Version: 5.4.16 OS: Windows 7
Private report: No CVE-ID: None

 [2013-06-20 16:47 UTC] kstirn at gmail dot com
Description:
------------
If you run the string "X&nbsp;X" through html_entity_decode and then check its length, strlen returns 4 instead of 3.

PHP 5.3.x gives correct result.

PHP 5.4.x and 5.5.x give incorrect result.

Possibly the same problem occurs with other HTML entities, such as &raquo;.

Test script:
---------------
<?php
echo 'strlen(a) = ' .strlen("X X") . "<br>\n";
echo 'strlen(b) = ' .strlen(html_entity_decode('X&nbsp;X')) . "<br>\n";
?>

Expected result:
----------------
strlen(a) = 3
strlen(b) = 3

Actual result:
--------------
strlen(a) = 3
strlen(b) = 4

 [2013-06-20 16:55 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2013-06-20 16:55 UTC] nikic@php.net
html_entity_decode('&nbsp;') returns U+00A0 NO-BREAK SPACE, which is encoded in UTF-8 as "\xc2\xa0". strlen() returns the length in bytes (not code points), so strlen("\xc2\xa0") is 2, not 1.

If you want to have the output in a different encoding (e.g. ISO-8859-1) you can specify it via the last argument to the function.
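
The explanation above can be illustrated with a minimal sketch. It assumes the mbstring extension is available for mb_strlen(); the variable names and the explicit ENT_QUOTES flag are choices made for this illustration only and are not part of the original report. The encoding is passed explicitly so the result does not depend on the PHP version's default charset.

```php
<?php
// Decode to UTF-8: &nbsp; becomes U+00A0, stored as the two bytes C2 A0.
$utf8 = html_entity_decode('X&nbsp;X', ENT_QUOTES, 'UTF-8');

// Decode to ISO-8859-1: &nbsp; becomes the single byte A0.
$latin1 = html_entity_decode('X&nbsp;X', ENT_QUOTES, 'ISO-8859-1');

// strlen() counts bytes, so the two results differ.
$utf8Bytes   = strlen($utf8);
$latin1Bytes = strlen($latin1);

// mb_strlen() counts characters in the given encoding (needs mbstring).
$utf8Chars = mb_strlen($utf8, 'UTF-8');

// Hex dumps make the byte-level difference visible.
$utf8Hex   = bin2hex($utf8);
$latin1Hex = bin2hex($latin1);

echo "UTF-8 bytes:      $utf8Bytes ($utf8Hex)\n";
echo "UTF-8 characters: $utf8Chars\n";
echo "ISO-8859-1 bytes: $latin1Bytes ($latin1Hex)\n";
```

If the goal is to count characters rather than bytes, mb_strlen() with the matching encoding is probably the more direct fix than switching the output encoding.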
 