Bug #77569 Write Access Violation in DomImplementation
Submitted: 2019-02-05 15:04 UTC Modified: 2020-02-14 08:18 UTC
From: christiaanswiersfuture at gmail dot com Assigned: stas (profile)
Status: Closed Package: DOM XML related
PHP Version: master-Git-2019-02-05 (Git) OS: Both Windows & Linux
Private report: No CVE-ID: None
 [2019-02-05 15:04 UTC] christiaanswiersfuture at gmail dot com
Description:
------------
The value assigned to $dom->encoding is not sanitized and is used directly in memory. Any hexadecimal value translates semi directly to a value used in memory. Please see my example below.

Test script:
---------------
<?php // shortest version
$imp = new DOMImplementation;
$dom = $imp->createDocument("", "");
$dom->encoding = null;
 // Null for encoding can be a memory address too (-16+0x41424344 writes 0x41424344 to edx, for example)
 // The -16 offset can be different on other machines.
?>

Expected result:
----------------
Incorrect encoding warning

No crash

Actual result:
--------------
ModLoad: 77180000 771a5000   C:\WINDOWS\SysWOW64\IMM32.DLL
ModLoad: 72940000 72973000   C:\WINDOWS\SysWOW64\IPHLPAPI.DLL
(2cdc.1f88): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000001 ebx=00a12000 ecx=00000000 edx=41424344 esi=0741c3f0 edi=0741c3e0
eip=56f81aac esp=0741c258 ebp=0741c2e4 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
php7ts_debug!xmlFindCharEncodingHandler+0x3c:
56f81aac 0fbe040a        movsx   eax,byte ptr [edx+ecx]     ds:002b:41424344=??


php7ts_debug!xmlFindCharEncodingHandler+0x3c [e:\repo\winlibs_libxml2\encoding.c @ 1649] 
php7ts_debug!dom_document_encoding_write+0x64 [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\ext\dom\document.c @ 344] 
php7ts_debug!dom_write_property+0x7c [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\ext\dom\php_dom.c @ 370] 
php7ts_debug!ZEND_ASSIGN_OBJ_SPEC_CV_CONST_OP_DATA_CONST_HANDLER+0x6ab [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\zend\zend_vm_execute.h @ 38652] 
php7ts_debug!execute_ex+0x77 [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\zend\zend_vm_execute.h @ 432] 
php7ts_debug!zend_execute+0x172 [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\zend\zend_vm_execute.h @ 474] 
php7ts_debug!zend_execute_scripts+0xd7 [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\zend\zend.c @ 1480] 
php7ts_debug!php_execute_script+0x62a [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\main\main.c @ 2552] 
php!do_cli+0xbed [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\sapi\cli\php_cli.c @ 993] 
php!main+0x9c9 [e:\php-sdk\phpdev\vc14\x86\php-7.1.10-src\sapi\cli\php_cli.c @ 1381] 
php!invoke_main+0x1e [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78] 
php!__scrt_common_main_seh+0x157 [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
php!__scrt_common_main+0xd [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl @ 331] 
php!mainCRTStartup+0x8 [f:\dd\vctools\crt\vcstartup\src\startup\exe_main.cpp @ 17] 
KERNEL32!BaseThreadInitThunk+0x19
ntdll!__RtlUserThreadStart+0x2f
ntdll!_RtlUserThreadStart+0x1b


 [2019-02-05 15:37 UTC] cswiers+php at wearehackerone dot com
The call stack was generated on my Windows machine, using an older version of PHP. Here's the output on my Linux machine:

# ./php -v
PHP 8.0.0-dev (cli) (built: Feb  5 2019 11:40:32) ( NTS DEBUG )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v4.0.0-dev, Copyright (c) Zend Technologies
# ./php write_access_violation.php

AddressSanitizer:DEADLYSIGNAL
=================================================================
==2871==ERROR: AddressSanitizer: SEGV on unknown address 0x0000deadbeef (pc 0x7fd17b5eefeb bp 0x61e000001b00 sp 0x7fff2e933210 T0)
==2871==The signal is caused by a READ memory access.
    #0 0x7fd17b5eefea  (/usr/lib/x86_64-linux-gnu/libxml2.so.2+0x32fea)
    #1 0xa17fe6  (/root/Documents/Fuzzing/PHP/install/bin/php+0xa17fe6)
    #2 0xa002c0  (/root/Documents/Fuzzing/PHP/install/bin/php+0xa002c0)
    #3 0x190845a  (/root/Documents/Fuzzing/PHP/install/bin/php+0x190845a)
    #4 0x173e856  (/root/Documents/Fuzzing/PHP/install/bin/php+0x173e856)
    #5 0x173f59d  (/root/Documents/Fuzzing/PHP/install/bin/php+0x173f59d)
    #6 0x15a7a3f  (/root/Documents/Fuzzing/PHP/install/bin/php+0x15a7a3f)
    #7 0x12fc3e4  (/root/Documents/Fuzzing/PHP/install/bin/php+0x12fc3e4)
    #8 0x1ad1954  (/root/Documents/Fuzzing/PHP/install/bin/php+0x1ad1954)
    #9 0x1ace7b9  (/root/Documents/Fuzzing/PHP/install/bin/php+0x1ace7b9)
    #10 0x7fd17b3e409a  (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #11 0x448789  (/root/Documents/Fuzzing/PHP/install/bin/php+0x448789)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libxml2.so.2+0x32fea) 
==2871==ABORTING
 [2019-02-05 18:39 UTC] stas@php.net
-Status: Open +Status: Feedback
 [2019-02-05 18:39 UTC] stas@php.net
>  Any hexadecimal value translates semi directly to a value used in memory.

Could you explain what you mean by that? Any example?
 [2019-02-05 19:18 UTC] christiaanswiersfuture at gmail dot com
-: cswiers+php at wearehackerone dot com +: christiaanswiersfuture at gmail dot com -Status: Feedback +Status: Open
 [2019-02-05 19:18 UTC] christiaanswiersfuture at gmail dot com
>  Any hexadecimal value translates semi directly to a value used in memory.

> Could you explain what you mean by that? Any example?

As far as I have found so far, any numeric value assigned to the encoding property ends up in the EDX register of the CPU. Using the comments in the test script, we can also write the following script:

<?php // shortest version
$imp = new DOMImplementation;
$dom = $imp->createDocument("", "");
$dom->encoding = -16+0x41424344;

In this case the value -16+0x41424344 will translate to 0x41424344 in the EDX register (on my Windows machine). On my Linux machine I had to use +18+0x41424344 to get the same result.

What I mean by "Any hexadecimal value translates semi directly to a value used in memory" is that you need to calculate the right value to get where you want to be. As I said in the example above, I use -16+0x41424344, which ends up as 0x41424344 in the EDX register.

In this case the process crashes on an access violation at that address. This indicates a write access violation since an attacker can control the write address and value.

I hope I made myself more clear. If you need any further information please let me know.
 [2019-02-10 02:15 UTC] stas@php.net
-Status: Open +Status: Verified
 [2019-02-10 02:15 UTC] stas@php.net
The problem seems to be in this code:

   342 		str = zval_get_string(newval);
   343 	
   344 		handler = xmlFindCharEncodingHandler(Z_STRVAL_P(newval));

It converts newval to a string, but then for some reason passes the unconverted value to xmlFindCharEncodingHandler. That's definitely wrong.

Not sure it's a security issue, unless there's a plausible scenario where the DOM encoding is set from non-string values. I can't see any so far.
 [2019-02-15 13:13 UTC] christiaanswiersfuture at gmail dot com
If the value assigned to the encoding is used in a call or jmp instruction, an attacker may point it at a region of memory that they control. This can lead to arbitrary code execution.

So far I have not been able to prove that this is possible in this specific situation, though.
 [2019-03-02 20:53 UTC] stas@php.net
> If the value assigned to the encoding is used in a call or jmp instruction, an attacker may point it at a region of memory that they control. This can lead to arbitrary code execution.

If the attacker can run the code to set the encoding, he is already executing arbitrary code. The only scenario where this can be a security issue is if the encoding can be set to a non-string value by a legitimate script not written by the attacker. But I can't see any plausible scenario how it can happen.
 [2019-05-15 11:03 UTC] nikic@php.net
@stas: Any decision on how to classify this? The patch is very simple:

diff --git a/ext/dom/document.c b/ext/dom/document.c
index c9e1802f78..11ef4aa818 100644
--- a/ext/dom/document.c
+++ b/ext/dom/document.c
@@ -341,7 +341,7 @@ int dom_document_encoding_write(dom_object *obj, zval *newval)
 
        str = zval_get_string(newval);
 
-       handler = xmlFindCharEncodingHandler(Z_STRVAL_P(newval));
+       handler = xmlFindCharEncodingHandler(ZSTR_VAL(str));
 
     if (handler != NULL) {
                xmlCharEncCloseFunc(handler);
 [2020-02-12 14:50 UTC] cmb@php.net
-Assigned To: +Assigned To: stas
 [2020-02-12 14:50 UTC] cmb@php.net
> But I can't see any plausible scenario how it can happen.

Me neither.

Anyhow, we need a decision whether this is a security issue or
not.  If it isn't, it's still a bug, and the fix is trivial.  If
it is, we should address it as soon as possible.
 [2020-02-12 21:04 UTC] stas@php.net
-Type: Security +Type: Bug
 [2020-02-12 21:04 UTC] stas@php.net
I'd commit it to 7.2 out of an abundance of caution, but I don't think it really needs a CVE because I don't see any plausible scenario in which it could be exploited.
 [2020-02-13 14:18 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=cec8b24c848bab8562c82422f3692c193f0afcdb
Log: Fix #77569: Write Acess Violation in DomImplementation
 [2020-02-13 14:18 UTC] cmb@php.net
-Status: Verified +Status: Closed
 [2020-02-14 08:18 UTC] cmb@php.net
-Summary: Write Acess Violation in DomImplementation +Summary: Write Access Violation in DomImplementation
 [2020-02-20 23:41 UTC] hsamdrhmahmd at gmail dot com
The following pull request has been associated:

Patch Name: Reformated README in Markdown
On GitHub:  https://github.com/php/php-gtk-src/pull/8
Patch:      https://github.com/php/php-gtk-src/pull/8.patch
 