Bug #46335 DOMText::splitText doesn't handle multibyte characters
Submitted: 2008-10-17 17:10 UTC Modified: 2008-10-20 12:47 UTC
From: sites at hubmed dot org Assigned: rrichards (profile)
Status: Closed Package: DOM XML related
PHP Version: 5.2.6 OS: OS X
Private report: No CVE-ID: None
 [2008-10-17 17:10 UTC] sites at hubmed dot org
Description:
------------
Calling the DOMText method splitText() on a text node that contains multibyte characters splits the node at the wrong position: the offset is applied as a byte offset rather than a character offset.

Reproduce code:
---------------
<?php
$text = 'This is an “example” of using DOM splitText';
$start = 30;   // character offset of "DOM"
$length = 3;

$dom = new DOMDocument('1.0', 'UTF-8');
$node = $dom->createTextNode($text);
$dom->appendChild($node);

print "Text: $node->textContent\n";

print 'Expected (mb_substr): ' . mb_substr($text, $start, $length, 'UTF-8') . "\n";

// Split off everything from $start, then cut the new node down to $length characters.
$matched = $node->splitText($start);
$matched->splitText($length);
print "Actual (splitText): $matched->textContent\n";

Expected result:
----------------
Text: This is an “example” of using DOM splitText
Expected (mb_substr): DOM
Actual (splitText): DOM


Actual result:
--------------
Text: This is an “example” of using DOM splitText
Expected (mb_substr): DOM
Actual (splitText): ing
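The actual result is consistent with splitText() counting bytes: each curly quote takes 3 bytes in UTF-8, so byte 30 falls on the "i" of "using". A minimal sketch of that arithmetic, assuming the script is saved as UTF-8. It uses substr() and PCRE's /u modifier instead of mb_substr() so that it does not depend on the mbstring extension; the variable names are illustrative only.

```php
<?php
// Assumes this file is saved as UTF-8, so “ and ” are 3 bytes each.
$text   = 'This is an “example” of using DOM splitText';
$start  = 30;  // intended as a character offset
$length = 3;

// substr() counts bytes, which is how the buggy splitText() behaved.
$byteSlice = substr($text, $start, $length);

// Split the string into UTF-8 characters, then slice by character.
$chars     = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
$charSlice = implode('', array_slice($chars, $start, $length));

// Byte position where character $start begins: byte length of everything before it.
$byteOffset = strlen(implode('', array_slice($chars, 0, $start)));

printf("bytes: %d, characters: %d\n", strlen($text), count($chars));
printf("byte slice: %s\n", $byteSlice);
printf("character slice: %s\n", $charSlice);
printf("character %d starts at byte %d\n", $start, $byteOffset);
```

On the report's string the byte slice is "ing" and the character slice is "DOM"; the string is 47 bytes but 43 characters, and character 30 starts at byte 34.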


History

 [2008-10-17 18:18 UTC] tularis@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

In PHP < 6, this is expected behaviour. PHP < 6 just isn't Unicode-aware by default. It should work in PHP 6, however. If not, let us know.
 [2008-10-17 19:22 UTC] rrichards@php.net
Assigning to self. This is a valid bug.
Use $node->substringData($start, $length) as a workaround for now;
all of the DOMCharacterData methods are UTF-8 safe.
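A sketch of the substringData() workaround applied to the report's string. DOMCharacterData::substringData() takes an offset and a count, and here it reads the characters without splitting or modifying the node. The wrapping <p> element is an assumption added for illustration; the original report appends the text node directly to the document.

```php
<?php
// Assumes this file is saved as UTF-8.
$text   = 'This is an “example” of using DOM splitText';
$start  = 30;  // character offset of "DOM"
$length = 3;

$dom = new DOMDocument('1.0', 'UTF-8');
// Hypothetical wrapper element, used only to give the text node a parent.
$root = $dom->appendChild($dom->createElement('p'));
$node = $root->appendChild($dom->createTextNode($text));

// Read $length characters starting at character $start; the node is left intact.
$matched = $node->substringData($start, $length);

print "substringData: $matched\n";
```

Because nothing is split, the original text node still holds the full string afterwards, which suits code that only needs to inspect the matched range.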
 [2008-10-20 12:47 UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 05:01:29 2024 UTC