Bug #60651 XSD schema validation problem with unicode regular expression
Submitted: 2012-01-04 10:31 UTC Modified: 2020-03-10 08:32 UTC
Votes:10
Avg. Score:4.4 ± 0.7
Reproduced:9 of 9 (100.0%)
Same Version:3 (33.3%)
Same OS:6 (66.7%)
From: webmaster at panservice dot it Assigned: cmb (profile)
Status: Not a bug Package: *XML functions
PHP Version: Irrelevant OS: Linux
Private report: No CVE-ID: None
 [2012-01-04 10:31 UTC] webmaster at panservice dot it
Description:
------------
When I validate an XML file against an XSD schema containing a Unicode regular 
expression, DOMDocument::schemaValidate() returns a validation error.
The XSD schema is well-formed per the W3C specification, and validation passes 
with other validation tools.
The problem doesn't occur if the XSD pattern is written without square 
brackets, like this:

<xsd:pattern value="\P{Ll}+"/>

PHP Version: 5.2.14
LibXml Version: 2.7.7

PS: The previous pattern [\P{Ll}]+ works correctly with preg_match function.
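
The preg_match comparison above can be checked with a short script. This is only a sketch: the helper name matchesNoLowerCase is hypothetical, and it assumes PCRE's UTF-8 mode (the `u` modifier) plus explicit anchors, because an XSD pattern must match the whole value while a PCRE pattern does not.

```php
<?php
// Hypothetical helper: mirrors the XSD facet [\P{Ll}]+ using PCRE.
// ^ and \z anchor the match to the whole string, as an XSD pattern would.
function matchesNoLowerCase($value)
{
    return preg_match('/^[\P{Ll}]+\z/u', $value) === 1;
}

var_dump(matchesNoLowerCase('FOO')); // bool(true)
var_dump(matchesNoLowerCase('Foo')); // bool(false)
```

This only shows how PCRE treats the bracketed form; it says nothing about the regular expression engine that libxml uses for schema validation.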

Test script:
---------------
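PHP Validation Code:

<?php
// Print every error libxml has collected, then reset the error buffer.
function libxml_display_errors()
{
    $errors = libxml_get_errors();

    print_r($errors);

    libxml_clear_errors();
}

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->load('test.xml');

// Validate the loaded document against the schema file.
if (!$dom->schemaValidate('test.xsd')) {
    echo "XML Error\n";
    libxml_display_errors();
} else {
    echo "XML ok\n";
}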


-------------------------------
XSD Schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
	<xsd:simpleType name="noLowerCase">
		<xsd:restriction base="xsd:string">
			<xsd:pattern value="[\P{Ll}]+"/>
		</xsd:restriction>
	</xsd:simpleType>
	<xsd:complexType name="DatiUtenteType">
		<xsd:sequence>
			<xsd:element name="Cognome" type="noLowerCase"/>
			<xsd:element name="Nome" type="noLowerCase"/>
		</xsd:sequence>
	</xsd:complexType>
	<xsd:complexType name="DataExchangeFisso">
		<xsd:sequence>
			<xsd:element name="DatiUtente" type="DatiUtenteType"/>
		</xsd:sequence>
	</xsd:complexType>
	<xsd:element name="ListOfDataExchange">
		<xsd:complexType>
			<xsd:sequence>
				<xsd:element name="DataExchangeFisso" type="DataExchangeFisso" minOccurs="0" maxOccurs="unbounded"/>
			</xsd:sequence>
		</xsd:complexType>
	</xsd:element>
</xsd:schema>


-------------------------------
XML File:

<?xml version="1.0" encoding="UTF-8"?>
<ListOfDataExchange>
  <DataExchangeFisso>
    <DatiUtente>
      <Cognome>FOO</Cognome>
      <Nome>BAR</Nome>
    </DatiUtente>
  </DataExchangeFisso>
</ListOfDataExchange>



Expected result:
----------------
XML ok

Actual result:
--------------
XML Error
Array
(
    [0] => LibXMLError Object
        (
            [level] => 2
            [code] => 1839
            [column] => 0
            [message] => Element 'Cognome': [facet 'pattern'] The value 'FOO' is 
not accepted by the pattern '[\P{Ll}]+'.

            [file] => /var/www/html/test.xml
            [line] => 5
        )

    [1] => LibXMLError Object
        (
            [level] => 2
            [code] => 1824
            [column] => 0
            [message] => Element 'Cognome': 'FOO' is not a valid value of the 
atomic type 'noLowerCase'.

            [file] => /var/www/html/test.xml
            [line] => 5
        )

    [2] => LibXMLError Object
        (
            [level] => 2
            [code] => 1839
            [column] => 0
            [message] => Element 'Nome': [facet 'pattern'] The value 'BAR' is 
not accepted by the pattern '[\P{Ll}]+'.

            [file] => /var/www/html/test.xml
            [line] => 6
        )

    [3] => LibXMLError Object
        (
            [level] => 2
            [code] => 1824
            [column] => 0
            [message] => Element 'Nome': 'BAR' is not a valid value of the 
atomic type 'noLowerCase'.

            [file] => /var/www/html/test.xml
            [line] => 6
        )

)


 [2020-03-10 08:32 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2020-03-10 08:32 UTC] cmb@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

\P{Ll} inside a character class is not recognized as a Unicode
character property escape sequence.
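
The two forms of the pattern can be compared in one self-contained script. This is a minimal sketch, not the reporter's setup: it assumes a cut-down schema with a single Cognome element instead of the full schema from the report, the helper validateAgainstPattern is hypothetical, and the outcome for the bracketed form depends on the libxml2 version PHP is linked against.

```php
<?php
// Hypothetical helper: validates <Cognome>$value</Cognome> against a
// one-element schema whose pattern facet is $pattern.
// Returns the libxml error messages (an empty array when validation passes).
function validateAgainstPattern($pattern, $value)
{
    $xsd = '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
         . '<xsd:element name="Cognome"><xsd:simpleType>'
         . '<xsd:restriction base="xsd:string">'
         . '<xsd:pattern value="' . htmlspecialchars($pattern) . '"/>'
         . '</xsd:restriction></xsd:simpleType></xsd:element>'
         . '</xsd:schema>';

    // Collect errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadXML('<Cognome>' . htmlspecialchars($value) . '</Cognome>');
    $dom->schemaValidateSource($xsd);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    // Restore the error buffer and the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Bracketed form from the report versus the unbracketed workaround.
print_r(validateAgainstPattern('[\P{Ll}]+', 'FOO'));
print_r(validateAgainstPattern('\P{Ll}+', 'FOO'));
```

If the bracketed call reports a pattern facet error while the unbracketed call returns an empty array, the installed libxml2 shows the behaviour described in this report, and writing the facet as \P{Ll}+ is the workaround.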
 