Bug #46335 DOMText::splitText doesn't handle multibyte characters
Submitted: 2008-10-17 17:10 UTC
Modified: 2008-10-20 12:47 UTC
From: sites at hubmed dot org
Assigned: rrichards
Status: Closed
Package: DOM XML related
PHP Version: 5.2.6
OS: OS X
Private report: No
CVE-ID: None

 [2008-10-17 17:10 UTC] sites at hubmed dot org
Description:
------------
Using the DOMText function splitText() on a text node containing multibyte characters results in the node being split at the wrong position.

Reproduce code:
---------------
$text = 'This is an ‘example’ of using DOM splitText';
$start = 30;
$length = 3;

$dom = new DOMDocument('1.0', 'UTF-8');
$node = $dom->createTextNode($text);
$dom->appendChild($node);

print "Text: $node->textContent\n";

print 'Expected (mb_substr): ' . mb_substr($text, $start, $length, 'UTF-8') . "\n";

$matched = $node->splitText($start);
$matched->splitText($length);
print "Actual (splitText): $matched->textContent\n";

Expected result:
----------------
Text: This is an ‘example’ of using DOM splitText
Expected (mb_substr): DOM
Actual (splitText): DOM


Actual result:
--------------
Text: This is an ‘example’ of using DOM splitText
Expected (mb_substr): DOM
Actual (splitText): ing
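
The "ing" in the actual result is consistent with the offset being counted in bytes rather than characters. The sketch below illustrates that arithmetic with plain string functions. It assumes the two non-ASCII characters in the sample text are 3-byte UTF-8 characters (curly quotes are used here for illustration) and that the mbstring extension is loaded; it does not call splitText() itself.

```php
<?php
// Assumption: the sample text contains two 3-byte UTF-8 characters
// (U+2018 and U+2019), written as byte escapes so the file encoding
// does not matter.
$text = "This is an \xE2\x80\x98example\xE2\x80\x99 of using DOM splitText";
$start = 30;
$length = 3;

// Character-based offset: what the reporter expected.
$byCharacter = mb_substr($text, $start, $length, 'UTF-8');

// Byte-based offset: lands 4 bytes (2 characters x 2 extra bytes) too early.
$byByte = substr($text, $start, $length);

// Difference between byte length and character length.
$extraBytes = strlen($text) - mb_strlen($text, 'UTF-8');

print "By character: $byCharacter\n"; // DOM
print "By byte: $byByte\n";           // ing
print "Extra bytes: $extraBytes\n";   // 4
```

The four extra bytes shift a byte offset of 30 back to character 26, which is the "i" of "using".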


 [2008-10-17 18:18 UTC] tularis@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

In PHP < 6 this is expected behaviour: PHP < 6 is not Unicode-aware by default. It should work in PHP 6, however. If not, let us know.
 [2008-10-17 19:22 UTC] rrichards@php.net
Assigning to self. This is a valid bug.
Use $node->substringData($start, $length) as a workaround for now.
All of the DOMCharacterData methods are UTF-8 safe.
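
A minimal sketch of that workaround, reusing the variable names from the reproduce code. It assumes the two non-ASCII characters in the sample text are curly quotes (written as byte escapes here). Note that substringData() returns a string rather than a new node, so it reads the text without splitting the tree.

```php
<?php
// Assumption: the sample text contains U+2018 and U+2019 (3 bytes each in UTF-8).
$text = "This is an \xE2\x80\x98example\xE2\x80\x99 of using DOM splitText";
$start = 30;
$length = 3;

$dom = new DOMDocument('1.0', 'UTF-8');
$node = $dom->createTextNode($text);

// substringData() counts the offset and length in characters, not bytes,
// and leaves the text node itself unchanged.
$matched = $node->substringData($start, $length);

print "Workaround (substringData): $matched\n"; // DOM
```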
 [2008-10-20 12:47 UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 