Bug #51385 htmlentities after substr with UTF-8
Submitted: 2010-03-25 05:18 UTC Modified: 2010-03-25 07:47 UTC
From: baudav at gmail dot com Assigned:
Status: Not a bug Package: *Unicode Issues
PHP Version: 5.3.2 vc9-nts OS: W2k3 IIS6
Private report: No CVE-ID: None
 [2010-03-25 05:18 UTC] baudav at gmail dot com
Description:
------------
substr() does not truncate UTF-8 strings correctly and produces an invalid UTF-8 string.

The test script must be saved as UTF-8.

Test script:
---------------
<?php
$str = 'câble TOSLink mâle/mâle (1.5 à 25m)';
$etc = '...';

echo htmlentities(substr($str, 0, 33) . $etc, ENT_QUOTES, 'UTF-8');

?>

Expected result:
----------------
câble TOSLink mâle/mâle (1.5 ...

Actual result:
--------------
No output; only this PHP warning is logged:

PHP Warning:  htmlentities(): Invalid multibyte sequence in argument in C:\DATA\WWW\test.php on line 5

Changing substr($str, 0, 33) to substr($str, 0, 32) makes it work.
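Counting bytes shows why 33 fails and 32 works. This is a minimal sketch, assuming the file is saved as UTF-8 and PHP 7 or later (for the type declarations): "â" and "à" are two bytes each in UTF-8, so "à" occupies bytes 32 and 33 (zero-based) and a 33-byte cut keeps only its lead byte 0xC3. The helper name is_valid_utf8() is hypothetical; it relies on preg_match() with the /u modifier, which fails on invalid UTF-8 input. On PHP 5.4 and later, htmlentities() silently returns an empty string for such input unless ENT_IGNORE or ENT_SUBSTITUTE is passed.

```php
<?php
// The reporter's string; this file must be saved as UTF-8.
$str = 'câble TOSLink mâle/mâle (1.5 à 25m)';

// Hypothetical helper: preg_match() with /u returns false on invalid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// strpos() and strlen() count bytes, not characters.
$pos = strpos($str, 'à');      // byte offset of "à"
$len = strlen('à');            // "à" is 0xC3 0xA0 in UTF-8

$cut33 = substr($str, 0, 33);  // ends with the lone lead byte 0xC3
$cut32 = substr($str, 0, 32);  // ends with the space before "à"

printf("\"à\" starts at byte %d and is %d bytes long\n", $pos, $len);
printf("last byte of the 33-byte cut: 0x%s\n", bin2hex(substr($cut33, -1)));
var_dump(is_valid_utf8($cut33));
var_dump(is_valid_utf8($cut32));

// Invalid input: htmlentities() returns '' (no ENT_SUBSTITUTE given).
var_dump(htmlentities($cut33 . '...', ENT_QUOTES, 'UTF-8'));
echo htmlentities($cut32 . '...', ENT_QUOTES, 'UTF-8'), "\n";
```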

 [2010-03-25 05:24 UTC] aharvey@php.net
-Status: Open
+Status: Bogus
 [2010-03-25 05:24 UTC] aharvey@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Like most PHP functions, substr() is not multibyte-aware. You may prefer to use 
mb_substr() instead.
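The byte-versus-character distinction can be seen side by side. This sketch assumes the mbstring extension is loaded and the file is saved as UTF-8. It passes 'UTF-8' explicitly as the encoding argument so the result does not depend on mb_internal_encoding(). Note that mb_substr() counts characters, so a length of 33 keeps more text than the 33-byte substr() cut.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$str = 'câble TOSLink mâle/mâle (1.5 à 25m)';
$etc = '...';

// strlen() counts bytes; mb_strlen() with 'UTF-8' counts characters.
echo strlen($str), ' bytes, ', mb_strlen($str, 'UTF-8'), " characters\n";

// Explicit encoding instead of relying on the configured internal encoding.
// The length is in characters, so no multibyte sequence is ever split.
$short = mb_substr($str, 0, 33, 'UTF-8');

echo htmlentities($short . $etc, ENT_QUOTES, 'UTF-8'), "\n";
```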
 [2010-03-25 05:27 UTC] baudav at gmail dot com
-Operating System: Windows 2003
+Operating System: W2k3 IIS6
-PHP Version: 5.3.2
+PHP Version: 5.3.2 vc9-nts
 [2010-03-25 05:27 UTC] baudav at gmail dot com
Windows 2003 with IIS6 (FastCGI); PHP 5.3.1 or 5.3.2 vc9-nts.
 [2010-03-25 07:47 UTC] baudav at gmail dot com
Oh! Excuse my incomplete report! I tested with both substr and mb_substr; the result is the same with mbstring.
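One possible reason mb_substr() behaved the same (an assumption, since the report does not show the mb_substr() call): without the encoding argument, mb_substr() uses mb_internal_encoding(), and if that was not UTF-8 on this server, the cut is effectively byte-based again. If the goal is the expected result above (at most 33 bytes without splitting a character), mbstring's mb_strcut() is documented to cut by bytes at character boundaries. The sketch below does the same without mbstring by moving the cut back past UTF-8 continuation bytes (0x80 to 0xBF). The function name utf8_truncate_bytes() is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: truncate $s to at most $maxBytes bytes without
// cutting a UTF-8 sequence in half. Assumes $s is valid UTF-8.
function utf8_truncate_bytes(string $s, int $maxBytes): string
{
    if (strlen($s) <= $maxBytes) {
        return $s;
    }
    $cut = $maxBytes;
    // Byte $cut is the first byte dropped. While it is a continuation
    // byte (10xxxxxx), its character started earlier, so move the cut
    // back to that character's lead byte.
    while ($cut > 0 && (ord($s[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($s, 0, $cut);
}

$str = 'câble TOSLink mâle/mâle (1.5 à 25m)';
$short = utf8_truncate_bytes($str, 33);  // drops the partial "à"

echo htmlentities($short . '...', ENT_QUOTES, 'UTF-8'), "\n";
```

Cutting by bytes keeps the output within a fixed storage or display budget; cutting by characters with mb_substr() is simpler when the limit is meant in characters.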
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sun Oct 06 14:01:27 2024 UTC