Bug #46335 DOMText::splitText doesn't handle multibyte characters
Submitted: 2008-10-17 17:10 UTC Modified: 2008-10-20 12:47 UTC
From: sites at hubmed dot org Assigned: rrichards (profile)
Status: Closed Package: DOM XML related
PHP Version: 5.2.6 OS: OS X
Private report: No CVE-ID: None
 [2008-10-17 17:10 UTC] sites at hubmed dot org
Calling DOMText::splitText() on a text node that contains multibyte characters splits the node at the wrong position: the offset appears to be counted in bytes rather than characters.

Reproduce code:
$text = 'This is an ‘example’ of using DOM splitText';
$start = 30;
$length = 3;

$dom = new DOMDocument('1.0', 'UTF-8');
$node = $dom->createTextNode($text);

print "Text: $node->textContent\n";

print 'Expected (mb_substr): ' . mb_substr($text, $start, $length, 'UTF-8') . "\n";

$matched = $node->splitText($start);
print "Actual (splitText): $matched->textContent\n";

Expected result:
Text: This is an ‘example’ of using DOM splitText
Expected (mb_substr): DOM
Actual (splitText): DOM

Actual result:
Text: This is an ‘example’ of using DOM splitText
Expected (mb_substr): DOM
Actual (splitText): ing


 [2008-10-17 18:18 UTC]
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation and the instructions
on how to report a bug.

In PHP < 6, this is expected behaviour. PHP < 6 just isn't Unicode-aware by default. It should work in PHP 6, however. If not, let us know.
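
A minimal sketch of the byte-versus-character distinction this comment refers to. It assumes the two non-ASCII characters in the sample string are three-byte UTF-8 curly quotes (written here as byte escapes) and that the mbstring extension is loaded. substr() counts bytes, so the same offset lands four bytes early, which matches the "ing" seen in the report:

```php
<?php
// Assumption: the multibyte characters are U+2018 and U+2019, three bytes each in UTF-8.
$text = "This is an \xE2\x80\x98example\xE2\x80\x99 of using DOM splitText";
$start = 30;
$length = 3;

// Byte-oriented: offset 30 is counted in bytes.
$byBytes = substr($text, $start, $length);

// Character-oriented: offset 30 is counted in UTF-8 characters.
$byChars = mb_substr($text, $start, $length, 'UTF-8');

print "substr:    $byBytes\n";   // ing
print "mb_substr: $byChars\n";   // DOM
```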
 [2008-10-17 19:22 UTC]
Assigning to self. This is a valid bug.
Use $node->substringData($start, $length) as a workaround for now.
All of the DOMCharacterData methods are UTF-8 safe.
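
The workaround might look like the sketch below, which reuses the values from the report. The sample string is an assumption: the multibyte characters are written as UTF-8 byte escapes for curly quotes. Only the DOM extension is needed.

```php
<?php
// Assumption: the multibyte characters are U+2018 and U+2019 (curly quotes).
$text = "This is an \xE2\x80\x98example\xE2\x80\x99 of using DOM splitText";
$start = 30;
$length = 3;

$dom = new DOMDocument('1.0', 'UTF-8');
$node = $dom->createTextNode($text);

// substringData() takes a character offset and count, and it leaves
// the node itself unchanged (unlike splitText(), which splits it in two).
$matched = $node->substringData($start, $length);

print "substringData: $matched\n";   // DOM
```

Unlike splitText(), this returns a plain string rather than a new DOMText node, so it only helps when the extracted text is what is needed.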
 [2008-10-20 12:47 UTC]
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot.

Thank you for the report, and for helping us make PHP better.
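
On a PHP build that includes the fix, splitText() should treat its offset as a character count. A sketch under the same assumption about the sample string (curly quotes written as UTF-8 byte escapes); note that splitText() returns a new node holding everything after the offset, not just the next three characters:

```php
<?php
// Assumption: the multibyte characters are U+2018 and U+2019 (curly quotes).
$text = "This is an \xE2\x80\x98example\xE2\x80\x99 of using DOM splitText";

$dom = new DOMDocument('1.0', 'UTF-8');
$node = $dom->createTextNode($text);

// Split at character offset 30; $node keeps the head, $tail gets the rest.
$tail = $node->splitText(30);

print "Head: $node->textContent\n";
print "Tail: $tail->textContent\n";   // DOM splitText
```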
