PHP :: Sec Bug #49687 :: utf8_decode xml_utf8

utf8_decode xml_utf8_decode vuln

Submitted:

2009-09-27 11:20 UTC

Modified:

2010-12-01 16:53 UTC

Votes:	7
Avg. Score:	4.4 ± 0.9
Reproduced:	6 of 6 (100.0%)
Same Version:	3 (50.0%)
Same OS:	2 (33.3%)

From:

sird at rckc dot at

Assigned:

cataphract (profile)

Status:

Closed

Package:

*Unicode Issues

PHP Version:

5.2.11

OS:

Private report:

CVE-ID:

2010-3870

View Developer Edit

[2009-09-27 11:20 UTC] sird at rckc dot at

Description:
------------
Taken from: http://bugs.php.net/bug.php?id=48230
> 
> Description:
> ------------
> xml_utf8_decode function incorrectly decode.
> 
> Reproduce code:
> ---------------
> <?php
> $ill=chr(0xf0).chr(0xc0).chr(0xc0).chr(0xa7);
> $ill=addslashes($ill);
> echo utf8_decode("$ill");
> echo htmlspecialchars ($ill,ENT_QUOTES,"utf-8" );
> ?>
> 
> Expected result:
> ----------------
> it will output a "'" incorrectly.
> 
> Actual result:
> --------------
> it will output a "'" incorrectly.


This is actually a PHP security vulnerability.

Timeline:
* Reported by root@80sec.com: May 11
* Discovered by webmaster@lapstore.de: June 19
* Discovered by Giorgio Maone / Eduardo Vela: July 14
* Reported and Fixed on PHPIDS: July 14
* Microsoft notified of a XSS Filter bypass: July 14
* Fixed XSS Filter bypass on NoScript 1.9.6:  July 20
* Vulnerability disclosed on BlackHat USA 2009: July 29
* Added signature to Acunetix WVS: August 14

References:
* http://www.blackhat.com/presentations/bh-usa-09/VELANAVA/BHUSA09-VelaNava-FavoriteXSS-SLIDES.pdf
* http://www.acunetix.com/blog/web-security-articles/security-risks-associated-with-utf8_decode/
* http://us2.php.net/manual/en/function.utf8-decode.php#83935
* http://bugs.php.net/bug.php?id=48230
* http://noscript.net/changelog

Read the references for further details.

Reproduce code:
---------------
<?php
echo utf8_decode(addslashes("\xf0\xc0\xc0\xa7 or 1=1-- -"));
// more code in references, check the slides and the acunetix blog
?>

Expected result:
----------------
? or 1=1-- -

Actual result:
--------------
' or 1=1--

Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports

[2009-09-28 19:38 UTC] sjoerd@php.net

Is this a bug in PHP or in scripts which do utf8_decode(addslashes()) instead of addslashes(utf8_decode())? What do you propose to solve this bug?

[2009-09-29 01:58 UTC] sird at rckc dot at

it is a PHP bug, the function is not decoding correctly, check the ppt and the acunetix blog for details.

there are several bugs in the code, one of them is that a variable holding the value of the char is overflowed (trying to put 21 bits in a 16 bits int), also the code is not checking if it is a valid unicode char (reading unicode specification should explain it).

the example root@80sec gave you was an overlong utf representation of a single quote. that is forbidden by unicode, and should transform the char to ?.

also, the code is not checking if the chars are valid UTF, so stuff like: <img alt="\x90" title=" src=x:x onerror=alert(1)//"> are going to be transformed to <img alt="? title=" src=x:x onerror=alert(1)//">


this is a very serious vulnerability and there are several bugs in the same function (there's even unreachable code).

you can check the implementation of utf by Mozilla or Webkit, they do it right. dont use java as a reference since they are also flawed.

due to the fact that PHP is for web applications and utf is widely used, and it allows an attacker to do all type of attacks (from sql injection to xss) its imperative to fix that function.

Greetings!!

[2009-09-29 04:56 UTC] rasmus@php.net

> there are several bugs in the code, one of them is that a variable
holding the value of the char is overflowed (trying to put 21 bits in a
16 bits int)

That was fixed in 5.2.11

[2009-09-29 05:29 UTC] sird at rckc dot at

the rest is still dangerous.. eating chars without the 10xx xxxx is against the spec, and overlong UTF.

[2009-10-16 01:36 UTC] sird at rckc dot at

: rasmus@php.net

It has come to my attention that this hasn't been fixed..

unsigned int has a size of 16 bits, don't take my word for it

http://www.acm.uiuc.edu/webmonkeys/book/c_guide/1.2.html

Section: 1.2.2 Variables

unsigned int  	16 bits

I just downloaded PHP 5.2.11, and I quote the code:


//  php-5.2.11.tar.bz2/php-5.2.11/ext/xml/xml.c#558
PHPAPI char *xml_utf8_decode(    //  ...
{
	int pos = len;
	char *newbuf = emallo    //  ...
	unsigned int c;          // sizeof(unsigned int)==16 bits
	char (*decoder)(unsig    //  ...
	xml_encoding *enc = x    //  ...
//  ...
//  #580
	c = (unsigned char)(*s);
	if (c >= 0xf0) { /* four bytes encoded, 21 bits */
		if(pos-4 >= 0) {
			c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | (s[3]&63);
		} else {
			c = '?';	
		}
		s += 4;
		pos -= 4;
//  ...

Also no checking at ALL is made on the leading bytes (they should be in the form: 10xx xxxx, a check is very easy, to check if s[0] has the correct form: you do an AND with 1100 0000 and then compare it with 1000 0000.

s[0]&0xC0==0x80

Also, Overlong UTF is not being taken care of, that's yeah, yet another vulnerability.

Greetings!!

[2009-10-16 03:32 UTC] scottmac@php.net

On a 16-bit processor an int might be 16-bit, if you can get PHP to compile then well done :-)

Did you even try running the test code?

[2009-10-16 03:41 UTC] sird at rckc dot at

oops!

you are right, :) the code before was unsigned short.

still, the other vulnerabilities remain.

I've made a blogpost that explains the other issues ;)

http://sirdarckcat.blogspot.com/2009/10/couple-of-unicode-issues-on-php-and.html

I updated the post to note the last bug was fixed on 5.2.11

Greetings!!

[2009-10-16 04:01 UTC] scottmac@php.net

PHP 5 has binary strings, not utf-8 strings. It does not attempt to do any validation on input, so expecting addslashes to magically validate things as utf-8 is wrong, simple as.

I agree that utf8_decode should do proper validation here though the overhead of doing that validation is going to be slow. So I've coded up a utf8_validate function. Still need to sort out some of the behaviour first.

[2009-10-16 04:41 UTC] sird at rckc dot at

I disagree.. how slow can it be to add 2 bit operations..

} else if (c < 0x800) {

change to

} else if (c < 0x800) {
    if ( (s[1]&0xC0!=0x80) ){  // this is a new operation
        newbuf[(*newlen)++] = '?'; // this are not new operations
        pos--; // this are not new operations
        s++; // this are not new operations
        continue;
    }
}

Besides, considering all real implementations do what the spec say they should do (it's not validate it's valid UNICODE, is that UNICODE says that the algorithm SHOULD do the check).. not doing it on PHP is just nuts.

[2009-10-16 04:45 UTC] sird at rckc dot at

oh, my mistake:
		else if (c < 0x800) {
			newbuf[(*newlen)++] = (0xc0 | (c >> 6));
			newbuf[(*newlen)++] = (0x80 | (c & 0x3f));
		}

should be:

		else if (c < 0x800) {
                        if ( (s[1]&0xC0!=0x80) ){
                            newbuf[(*newlen)++] = '?';
                        }else{
			    newbuf[(*newlen)++] = (0xc0 | (c >> 6));
			    newbuf[(*newlen)++] = (0x80 | (c & 0x3f));
                        }
		}

[2009-10-16 04:52 UTC] sird at rckc dot at

Oh, duh! I'm reading the wrong function.. :( Sorry

			if(pos-2 >= 0 || s[1]&0xC0!=0x80) {
				c = ((s[0]&7)<<18) | ((s[1]&63)<<12) | ((s[2]&63)<<6) | (s[3]&63);
			} else {
				c = '?';	
			}

[2009-10-16 04:53 UTC] sird at rckc dot at

My last post, I promise..

it should say:
	c = ((s[0]&63)<<6) | (s[1]&63);

Greetz!

[2010-10-27 20:13 UTC] cataphract@php.net

Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&amp;revision=304959
Log: - Fixed bug #49687 (utf8_decode vulnerabilities and deficiencies in the number
  of reported malformed sequences). (Gustavo)
#Made a public interface for get_next_char/utf-8 in trunk to use in utf8_decode.
#In PHP 5.3, trunk's get_next_char was copied to xml.c because 5.3's
#get_next_char is different and is not prepared to recover appropriately from
#errors.

[2010-10-27 20:13 UTC] cataphract@php.net

-Assigned To: scottmac +Assigned To: cataphract

[2010-10-27 20:13 UTC] cataphract@php.net

Fixed for PHP 5.3 and trunk.

[2010-10-27 20:23 UTC] cataphract@php.net

-Status: Assigned +Status: Closed

[2010-11-02 16:34 UTC] pajoye@php.net

CVE-2010-3870 has been assigned to this issue.

[2010-11-03 15:18 UTC] cataphract@php.net

Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&amp;revision=305055
Log: - Added fix to bug #49687 to PHP 5.2.

[2010-11-16 21:30 UTC] felipe@php.net

-CVE-ID: +CVE-ID: 2010-3870

[2010-11-22 15:27 UTC] felipe@php.net

-Type: Bug +Type: Security

[2010-12-01 01:56 UTC] leonard-php-bugs at ottolander dot nl

> Fixed for PHP 5.3 and trunk.

How about 5.2? Can you provide a link to the specific commit(s)?

[2010-12-01 15:47 UTC] leonard-php-bugs at ottolander dot nl

Does http://svn.php.net/viewvc/php/php-src/branches/PHP_5_2/ext/xml/xml.c?r1=249522&r2=251671 fix this issue or is http://svn.php.net/viewvc/php/php-src/branches/PHP_5_2/ext/xml/xml.c?r1=293146&r2=305055 required to fix this vulnerability?

[2010-12-01 16:17 UTC] cataphract@php.net

The relevant commit for 5.2 was 305055, as you could see by clicking the "All" tab.

[2010-12-01 16:52 UTC] leonard-php-bugs at ottolander dot nl

Thank you. Will there be an update to 5.2.15 to fix this in the 5.2 branch?

[2010-12-01 16:53 UTC] cataphract@php.net

Yes, PHP 5.2.15 will be released and will mark the end of life of the 5.2 branch.

	php.net \| support \| documentation \| report a bug \| advanced search \| search howto \| statistics \| random bug \| login
go to bug id or search bugs for


Copyright © 2001-2026 The PHP Group All rights reserved.	Last updated: Sat Feb 21 05:00:01 2026 UTC