Bug #24934 create_text_node() does not handle special characters
Submitted: 2003-08-04 08:47 UTC Modified: 2003-08-11 05:40 UTC
From: tony at marston-home dot demon dot co dot uk Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 4.3.2 OS: Windows XP
Private report: No CVE-ID: None
 [2003-08-04 08:47 UTC] tony at marston-home dot demon dot co dot uk
I have a field with the value "Cote d'Ivoire" (where the letter 'o' is actually 'o circumflex') which is not being dealt with correctly by $doc->create_text_node().

If I pass the text through htmlentities() beforehand, what appears in the XML output is "C&amp;ocirc;te d'Ivoire" instead of "Côte d'Ivoire".

If I do not use htmlentities() on the value the output is "C&#x134960d'Ivoire" (which is totally corrupt) instead of "Côte d'Ivoire" (which is what I expect).

A similar fault exists with all the other special characters I have tried, such as 'c cedilla' etc.

Expected result:
If my input is "Co(circumflex)te d'Ivoire" I expect the output to be "Côte d'Ivoire"

Actual result:
Instead of "Côte d'Ivoire" I am getting "C&#x134960d'Ivoire"


 [2003-08-08 17:45 UTC]
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.

Can you provide a small script? I can't reproduce the corruption you are getting.
 [2003-08-11 04:45 UTC] tony at marston-home dot demon dot co dot uk
Here is a test script which demonstrates this bug:

<?php
// test script for bug #24934

// set up $fieldarray with test values
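// (byte escapes keep the values in ISO-8859-1 whatever encoding this file is saved in)
$fieldarray['field1'] = "C\xF4te d'Ivoire"; // o circumflex
$fieldarray['field2'] = "Cura\xE7ao";       // c cedilla
$fieldarray['field3'] = "Lis\xE9rgida";     // e acute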

// create a new XML document
$doc = domxml_new_doc('1.0');
// create root node
$root = $doc->create_element('root');
$root = $doc->append_child($root);
// create record node
$occ = $doc->create_element('record');
$occ = $root->append_child($occ);
// insert each field as a child node
foreach ($fieldarray as $fieldname => $fieldvalue) {
	$child = $doc->create_element($fieldname);
	$child = $occ->append_child($child);
	$value = $doc->create_text_node($fieldvalue);
	$value = $child->append_child($value);
} // foreach

// get completed xml document
$xml_string = $doc->dump_mem(true);

// dump to a disk file with '.xml' extension
$fname = basename($_SERVER['PHP_SELF']) .'.xml' ;
$fp = fopen($fname, 'w');
$result = fwrite($fp, $xml_string);
fclose($fp);



If you try to load the XML file into your browser you will see that it contains corrupt characters. For example, where it should have '&#xF4;' for ô (letter o with circumflex) it has '&#x134960' instead. Basically, the $doc->create_text_node() method is not translating special characters into the right hex code according to the HTML specification.
 [2003-08-11 05:40 UTC]
UTF-8 is the internal encoding for libxml when storing the document. You need to convert non UTF-8 encoded strings into UTF-8 when setting content. Also, for output to a browser you should be using html_dump_mem.

$value = $doc->create_text_node(mb_convert_encoding($fieldvalue,"UTF-8", "ISO-8859-1"));
and use either:
$xml_string = $doc->dump_mem(true, "ISO-8859-1");
$xml_string = $doc->html_dump_mem();
to output the document correctly
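
The value '&#x134960' from the report is consistent with the ISO-8859-1 byte 0xF4 being read as the lead byte of a four-byte UTF-8 sequence, so that the next three bytes ("te ") are consumed as continuation bytes. The sketch below reproduces that arithmetic and then applies the conversion suggested above. It assumes PHP 7 or later with the mbstring extension; naive_decode4() is a hypothetical helper written only for illustration and is not how libxml is implemented.

```php
<?php
// "Côte d'Ivoire" as ISO-8859-1 bytes: o circumflex is the single byte 0xF4.
$latin1 = "C\xF4te d'Ivoire";

// Hypothetical helper: decode four bytes as one UTF-8 code point
// WITHOUT checking that bytes 2-4 are valid continuation bytes.
function naive_decode4(string $bytes): int
{
    return ((ord($bytes[0]) & 0x07) << 18)
         | ((ord($bytes[1]) & 0x3F) << 12)
         | ((ord($bytes[2]) & 0x3F) << 6)
         |  (ord($bytes[3]) & 0x3F);
}

// Bytes 0xF4 't' 'e' ' ' misread as a single four-byte character.
$bogus = dechex(naive_decode4(substr($latin1, 1, 4)));

// The raw ISO-8859-1 string is not valid UTF-8 ...
$validBefore = mb_check_encoding($latin1, 'UTF-8');

// ... but it is after converting, as suggested in the comment above.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$validAfter = mb_check_encoding($utf8, 'UTF-8');

// In UTF-8 the o circumflex (U+00F4) occupies two bytes.
$encodedChar = bin2hex(substr($utf8, 1, 2));

echo "misread code point: 0x$bogus\n";
echo "valid UTF-8 before: " . var_export($validBefore, true) . "\n";
echo "valid UTF-8 after:  " . var_export($validAfter, true) . "\n";
echo "o circumflex bytes: $encodedChar\n";
```

The misread code point comes out as 0x134960, and the three swallowed bytes account for the missing "te " in "C&#x134960d'Ivoire". After conversion the character is stored as the two bytes c3 b4, which is what create_text_node() expects.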