Bug #24934 create_text_node() does not handle special characters
Submitted: 2003-08-04 08:47 UTC Modified: 2003-08-11 05:40 UTC
From: tony at marston-home dot demon dot co dot uk Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 4.3.2 OS: Windows XP
Private report: No CVE-ID: None

 [2003-08-04 08:47 UTC] tony at marston-home dot demon dot co dot uk
Description:
------------
I have a field with the value "Cote d'Ivoire" (where the letter 'o' is actually 'o circumflex') which is not being dealt with correctly by $doc->create_text_node().

If I pass the text through htmlentities() beforehand, what appears in the XML output is the literal entity text "C&ocirc;te d'Ivoire" instead of "Côte d'Ivoire".
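The htmlentities() route fails for a different reason than the corruption described next. htmlentities() turns 'ô' into the entity text &ocirc;, and a text node stores whatever it is given as plain characters, so the '&' gets escaped again when the document is written out. In the sketch below, htmlspecialchars() only stands in for the serializer's escaping of '&'. That is an assumption made for illustration, not what DOM XML calls internally.

```php
<?php
// "Côte d'Ivoire" as ISO-8859-1 bytes: 0xF4 is 'o circumflex' in Latin-1.
$latin1 = "C\xF4te d'Ivoire";

// Step 1: htmlentities() replaces the accented byte with the entity name &ocirc;.
$entities = htmlentities($latin1, ENT_NOQUOTES, 'ISO-8859-1');
echo $entities, "\n"; // C&ocirc;te d'Ivoire

// Step 2: a text node holds '&' as an ordinary character, so escaping it for
// XML output (simulated here with htmlspecialchars()) produces &amp;ocirc;.
$serialized = htmlspecialchars($entities, ENT_NOQUOTES, 'UTF-8');
echo $serialized, "\n"; // C&amp;ocirc;te d'Ivoire
```

A viewer that renders this XML shows the entity text itself rather than the accented letter.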

If I do not use htmlentities() on the value the output is "C&#x134960d'Ivoire" (which is totally corrupt) instead of "Côte d'Ivoire" (which is what I expect).

A similar fault occurs with all the other special characters I have tried, such as 'c cedilla'.

Expected result:
----------------
If my input is "Co(circumflex)te d'Ivoire" I expect the output to be "Côte d'Ivoire"

Actual result:
--------------
Instead of "Côte d'Ivoire" I am getting "C&#x134960d'Ivoire"
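The number in the corrupt reference can be reproduced by hand. In ISO-8859-1, 'ô' is the single byte 0xF4, but in UTF-8 a byte of the form 11110xxx announces a four-byte sequence. The sketch below assumes the parser combined 0xF4 with the next three bytes ('t', 'e' and the space) without checking that they were continuation bytes. This is an assumption that happens to match the report, not a reading of the libxml source, and naive_utf8_4byte() is a hypothetical helper. It yields 0x134960, which also explains why "te " is missing from the output. That value lies above U+10FFFF, so it is not a valid Unicode code point at all.

```php
<?php
// Hypothetical helper: decode a 4-byte UTF-8 sequence starting at $pos WITHOUT
// checking that the next three bytes really are continuation bytes (10xxxxxx).
function naive_utf8_4byte(string $bytes, int $pos): int
{
    $cp = ord($bytes[$pos]) & 0x07;          // lead byte 11110xxx: keep the low 3 bits
    for ($i = 1; $i <= 3; $i++) {
        // append the low 6 bits of each following byte
        $cp = ($cp << 6) | (ord($bytes[$pos + $i]) & 0x3F);
    }
    return $cp;
}

// "Côte d'Ivoire" as ISO-8859-1 bytes; 0xF4 sits at offset 1.
$latin1 = "C\xF4te d'Ivoire";
$cp = naive_utf8_4byte($latin1, 1);

// Bytes 2-4 ('t', 'e', ' ') were swallowed, so output resumes at offset 5.
printf("C&#x%X;%s\n", $cp, substr($latin1, 5)); // C&#x134960;d'Ivoire
```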

 [2003-08-08 17:45 UTC] rrichards@php.net
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.


Can you provide a small script? I can't reproduce the corruption you are getting.
 [2003-08-11 04:45 UTC] tony at marston-home dot demon dot co dot uk
Here is a test script which demonstrates this bug:

<?php
//*****************************************************************************
// test script for bug #24934
//*****************************************************************************

// set up $fieldarray with test values
$fieldarray['field1'] = "C\xF4te d'Ivoire"; // ISO-8859-1 byte for 'o circumflex'
$fieldarray['field2'] = "Cura\xE7ao";       // ISO-8859-1 byte for 'c cedilla'
$fieldarray['field3'] = "Lis?rgida";

// create a new XML document
$doc = domxml_new_doc('1.0');
	
// create root node
$root = $doc->create_element('root');
$root = $doc->append_child($root);
	
// create record node
$occ = $doc->create_element('record');
$occ = $root->append_child($occ);
	
// insert each field as a child node
foreach ($fieldarray as $fieldname => $fieldvalue) {
	$child = $doc->create_element($fieldname);
	$child = $occ->append_child($child);
	$value = $doc->create_text_node($fieldvalue);
	$value = $child->append_child($value);
} // foreach
	
unset($fieldarray);

// get completed xml document
$xml_string = $doc->dump_mem(true);
unset($doc);

// dump to a disk file with '.xml' extension
$fname = basename($_SERVER['PHP_SELF']) .'.xml' ;
$fp = fopen($fname, 'w');
$result = fwrite($fp, $xml_string);
fclose($fp);

exit;

?>

If you try to load the XML file into your browser, you will see that it contains corrupt characters. For example, where it should have '&#xF4;' for ô (letter o with circumflex), it has '&#x134960' instead. Basically, the $doc->create_text_node() method is not translating special characters into the right hexadecimal character references according to the HTML specification.
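Before blaming create_text_node(), it can help to check which encoding the field values actually arrive in. This is a minimal sketch that assumes PHP's bundled PCRE. preg_match() with the /u modifier returns false when the subject is not valid UTF-8, so an empty pattern works as a validity test. The helper is_valid_utf8() is hypothetical. Latin-1 values like the script's 'field1' fail the check.

```php
<?php
// Hypothetical helper: preg_match() with the /u modifier returns false when the
// subject is not valid UTF-8, so an empty pattern works as an encoding check.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$fieldarray = [
    'field1' => "C\xF4te d'Ivoire",     // ISO-8859-1 bytes, as in the test script
    'field2' => "C\xC3\xB4te d'Ivoire", // the same text encoded as UTF-8
];

foreach ($fieldarray as $fieldname => $fieldvalue) {
    echo $fieldname, ': ', is_valid_utf8($fieldvalue) ? 'UTF-8' : 'not UTF-8', "\n";
}
```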
 [2003-08-11 05:40 UTC] rrichards@php.net
UTF-8 is the internal encoding for libxml when storing the document. You need to convert non UTF-8 encoded strings into UTF-8 when setting content. Also, for output to a browser you should be using html_dump_mem.

For example:
$value = $doc->create_text_node(mb_convert_encoding($fieldvalue,"UTF-8", "ISO-8859-1"));
and use either:
$xml_string = $doc->dump_mem(true, "ISO-8859-1");
or
$xml_string = $doc->html_dump_mem();
to output the document correctly
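The conversion recommended here maps each ISO-8859-1 byte to its UTF-8 form. Bytes below 0x80 are unchanged, and every other byte becomes a two-byte sequence, because each Latin-1 byte value equals its Unicode code point. The sketch below spells that mapping out in plain PHP so it runs without the mbstring extension. latin1_to_utf8() is a hypothetical name; in practice mb_convert_encoding($s, "UTF-8", "ISO-8859-1") as shown above does the same job.

```php
<?php
// Hypothetical helper mirroring mb_convert_encoding($s, "UTF-8", "ISO-8859-1"):
// every ISO-8859-1 byte is the Unicode code point with the same value.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                   // ASCII passes through unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))     // lead byte 110xxxxx
                  . chr(0x80 | ($b & 0x3F));  // continuation byte 10xxxxxx
        }
    }
    return $out;
}

$utf8 = latin1_to_utf8("C\xF4te d'Ivoire");
echo bin2hex(substr($utf8, 1, 2)), "\n"; // c3b4, the UTF-8 encoding of U+00F4
```

The converted string is what create_text_node() should receive. Per the reply, passing "ISO-8859-1" as the encoding argument to dump_mem() then takes care of converting back on output.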
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 18:01:28 2024 UTC