[2011-01-01 16:02 UTC] jani@php.net
-Package: Feature/Change Request
+Package: PCRE related
[2016-07-21 11:34 UTC] cmb@php.net
-Status: Open
+Status: Duplicate
-Assigned To:
+Assigned To: cmb
[2016-07-21 11:34 UTC] cmb@php.net
Description:
------------
This is not a PHP bug, but a suggestion that would help with troubleshooting PCRE calls in one's own PHP scripts.

When using the /u modifier in PCRE, if the subject string contains an invalid UTF-8 sequence, this generates a PREG_BAD_UTF8_ERROR (which can be retrieved using preg_last_error()). This is expected behavior for PCRE, but it should also emit an E_NOTICE to the user because it could indicate an error in their script (the definition of an E_NOTICE).

Specifically, when using preg_replace() in an assignment context (i.e. $subject = preg_replace($foo, $bar, $subject)), a PREG_BAD_UTF8_ERROR can cause the subject string to be "erased" (re-assigned NULL) if the script author didn't take time to ensure that the subject string was valid UTF-8 before calling preg_replace().

Even though it's the fault of the script author, the preg_* functions should still at least emit an E_NOTICE about bad UTF-8; it's a pain to hunt through one's proverbial 'miles of code' to figure out why a variable suddenly 'disappeared', without a file name or line number to start troubleshooting from.

Workarounds available in the meantime are:

// As of PHP 5.3
// (unless the replacement yields string '' or '0', which ?: also treats as false)
$string = preg_replace(..., $string) ?: $string;

// Other workaround (any PHP version)
$string = is_string($repl = preg_replace(..., $string)) ? $repl : $string;

Reproduce code:
---------------
--- From manual page: reference.pcre.pattern.modifiers ---
error_reporting(-1); // Emit all errors

$subject = "fa\xa0ade"; // Valid in ISO-8859-1 (but not UTF-8!)

// Causes a PREG_BAD_UTF8_ERROR and sets $subject to NULL.
// And we didn't make a copy of the original $subject. Oops!
$subject = preg_replace('//u', '', $subject);

var_dump($subject); // NULL
var_dump(preg_last_error());
---

Actual result:
--------------
preg_replace() returns NULL; checking preg_last_error() confirms a PREG_BAD_UTF8_ERROR.
No errors, warnings, or notices of any kind were generated. We did, however, immediately assign the preg_replace() back to $subject, so $subject is now NULL and has lost whatever data it originally contained. Even though this was obviously our fault, an E_NOTICE would have told us about it.
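
Until something like this exists in the preg_* functions themselves, the requested behavior can be approximated in userland. The following is only a minimal sketch, assuming PHP 5.4 or later; preg_replace_checked() is a hypothetical helper name invented for illustration, not a PHP function. It raises an E_USER_NOTICE naming the calling file and line, and hands back the original subject instead of NULL.

```php
<?php
// Hypothetical helper (not part of PHP): wraps preg_replace() and raises
// an E_USER_NOTICE when PCRE reports an error, rather than failing silently.
function preg_replace_checked($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);

    if ($result === null && preg_last_error() !== PREG_NO_ERROR) {
        // Frame 0 of the backtrace holds the file and line of the call site.
        $frames = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1);
        $caller = $frames[0];

        trigger_error(
            sprintf(
                'preg_replace() failed with PCRE error code %d, called in %s on line %d',
                preg_last_error(),
                $caller['file'],
                $caller['line']
            ),
            E_USER_NOTICE
        );

        // Return the untouched subject so the caller's data is not lost.
        return $subject;
    }

    return $result;
}

// Same input as the reproduce code above.
$subject = "fa\xa0ade"; // Valid in ISO-8859-1 (but not UTF-8!)
$subject = preg_replace_checked('//u', '', $subject);
var_dump($subject === "fa\xa0ade");
```

Returning the original subject on failure is a design choice made for this sketch; a stricter variant could throw an exception instead, so that the invalid data cannot flow onward unnoticed.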