[2007-02-08 00:04 UTC] jfrim at idirect dot com
Description:
------------
The Perl-compatible regular expression (PCRE) engine is unable to output NUL characters correctly. This is evident with the preg_replace() function (tested), and likely affects other PCRE functions (untested), according to other bug reports already submitted. Instead of returning a NUL character, a literal '\0' sequence is returned.
Reproduce code:
---------------
<?php
$inputstring = "ASCII NUL\0, SOH\01, STX\02, ETX\03";
echo preg_replace('/([\\x00-\\x02])/e',"'['.ord('\\1').']'",$inputstring);
?>
Expected result:
----------------
ASCII NUL[0], SOH[1], STX[2], ETX
(Note that "ETX" is immediately followed by ctrl char #3)
Actual result:
--------------
ASCII NUL[92], SOH[1], STX[2], ETX
(Note that "ETX" is immediately followed by ctrl char #3)
The "92" appears where "0" is expected because preg_replace() incorrectly puts a literal '\0' sequence (a backslash followed by the digit zero) into the back-reference instead of a NUL character. The ord() function then returns the value of the first character of that sequence, the literal backslash.
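The "92" can be reproduced without the regex engine at all. The sketch below assumes that the "e" modifier escapes back-references in the same way addslashes() does; that is an assumption about the engine's behaviour, not something confirmed by this report. The variable names are illustrative only.

```php
<?php
// Assumption: the "e" modifier escapes back-references like addslashes().
// addslashes() turns a NUL byte into two characters: a backslash and "0".
$escaped = addslashes("\0");
$firstByte = ord($escaped[0]);

// Inside a single-quoted PHP string, \0 is NOT an escape sequence,
// so '\0' is also just a backslash followed by "0".
$singleQuoted = ord('\0');

// Inside a double-quoted PHP string, \0 IS the NUL byte.
$doubleQuoted = ord("\0");

echo strlen($escaped), "\n";  // 2
echo $firstByte, "\n";        // 92
echo $singleQuoted, "\n";     // 92
echo $doubleQuoted, "\n";     // 0
```

If that assumption holds, the evaluated replacement code in the report ends up calling ord() on the two-character sequence rather than on the NUL byte itself.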
The following code demonstrates 0x00 and 0x22 being escaped, without 0x5C being escaped. It creates 8-bit ASCII text output for all 256 characters: the character value (in DECIMAL) enclosed within braces (except for escaped characters, where it ends up as "92"), followed by the actual character, then a CRLF. Note how the backslash (0x5C, decimal 92) is NOT escaped and, contrary to what nlopess@php.net posted, the single-quote (0x27, decimal 39) is NOT escaped either. (The double-quote (0x22, decimal 34) is escaped instead.)

<?php
header('Content-Type: text/plain; charset=US-ASCII');
header('Content-Disposition: inline; filename=PCRE.txt');
header('Pragma: no-cache');
header('Expires: 0');
header('Cache-Control: no-cache; must-revalidate');
$teststring='';
for ($i=0; $i<=255; $i++) {
    $teststring.=chr($i);
}
echo preg_replace('/([\\x00-\\xFF])/e',"'{'.ord('\\1').'}\\1'.chr(13).chr(10)",$teststring);
?>

The code from my [2007-02-08 19:59:04] post shows only 0x00 and 0x22 being escaped. Maybe the single-quote (0x27) only gets escaped depending on the PHP.INI settings? I may check into this later.

If nothing is changed in the PHP code, the best work-around I could come up with is this: always use the "e" modifier, and in the replacement string for preg_replace(), wrap the back-reference in a pair of str_replace() calls, one to handle 0x00 and one for 0x22. Example:

<?php
echo preg_replace('/([\\x00-\\xFF])/e',"'0x'.sprintf('%02X',ord(str_replace('\\\\0',\"\\0\",str_replace('\\\\\"','\"','\\1'))))",$inputstring);
?>

This example takes a string and turns each byte into "0x" followed by the two-digit hex code. Note that the outer str_replace() turns the \0 into a proper NUL, and the nested str_replace() turns the \" into just " .

It's a very dirty work-around: it cannot be used without the "e" modifier (which adds processing overhead), str_replace() has to be called twice (which adds processing overhead again), and the number of backslashes in the source code is tremendous and can get confusing!

And we still have a potential problem remaining. If exactly which characters are escaped and which ones aren't depends on the PHP.INI settings (i.e. regarding double-quote and single-quote), then it's impossible for this dirty work-around to be portable, unless the entire thing is encapsulated in an if() or switch() block. That's REALLY dirty!

The reason why stripslashes() can't be used on the back-reference is that the backslash character, if matched in the pattern, is NOT escaped when returned in the back-reference! stripslashes() ends up returning an empty string when only a single backslash is passed to it.

If we can't change the behaviour of preg_replace() without breaking compatibility, then I suggest introducing a new function called something like preg_replace_ex() or preg_replace_binsafe(), which fixes the bug properly. The ideal bug fix would be for the back-reference to never escape any returned characters, since the input string fed to preg_replace() is NOT in an escaped context, and you should not mix escaped data with unescaped data.
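For readers on later PHP versions: the "e" modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, and preg_replace_callback() hands the matched bytes to a callback without any escaping. The following is a minimal sketch of how the two examples from this report might be written with it. It is not the fix requested above, and the variable names ($result, $hex) are illustrative only.

```php
<?php
// Input from the original report: NUL, SOH, STX and ETX control characters.
$inputstring = "ASCII NUL\0, SOH\01, STX\02, ETX\03";

// The callback receives the raw match in $m[0]; nothing is escaped,
// so ord() sees the real NUL byte.
$result = preg_replace_callback(
    '/[\x00-\x02]/',
    function (array $m) {
        return '[' . ord($m[0]) . ']';
    },
    $inputstring
);
echo $result, "\n";

// Hex-dump variant of the work-around, with no str_replace() calls needed.
// The subject holds NUL, double-quote, backslash and single-quote.
$hex = preg_replace_callback(
    '/[\x00-\xFF]/',
    function (array $m) {
        return '0x' . sprintf('%02X', ord($m[0]));
    },
    "\0\"\\'"
);
echo $hex, "\n";
```

Because the callback is ordinary PHP code rather than a string to be evaluated, the question of which characters get escaped under which settings does not arise in this form.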