Reopening bug #36711 because it is NOT a documentation problem. Setting 'detect_unicode=Off' is NOT a solution, just a workaround.
In practice, because of this bug, PHK or PHAR packages cannot run in zend-multibyte-enabled environments unless detect_unicode is turned off, which makes them unusable in environments running Unicode-encoded scripts. As a side effect, it also makes it impossible to include a Unicode-encoded script inside a PHAR/PHK package, as it cannot be run.
There is no logical reason to tie the __halt_compiler() feature to the zend-multibyte Unicode detection capability. Everything after a __halt_compiler() directive must be considered binary data and should not be scanned for Unicode detection. If this data contains a Unicode script, it will be scanned and detected when include()d through the stream wrapper.
My (humble) suggestions to fix the problem:
In zend_multibyte_detect_unicode(), the BOM detection does not have to be modified. After it, however, the script is scanned for null bytes:
return zend_multibyte_detect_utf_encoding(LANG_SCNG(script_org), LANG_SCNG(script_org_size) TSRMLS_CC);
There, the size should not be LANG_SCNG(script_org_size), but the offset of the __halt_compiler() directive. But I don't know where to find the COMPILER_HALT_OFFSET constant for the script. I even suspect it is not available at this point...
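To illustrate the idea of limiting the scan, here is a minimal standalone sketch, not engine code. The helper names scan_limit() and has_null_byte() are hypothetical, and the token is found with a naive byte search rather than through the lexer, so it matches only the lowercase ASCII spelling and would also be fooled by the token appearing in a string or comment, or by a script that is itself UTF-16 encoded.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Zend engine): returns how many
 * leading bytes of `buf` should be scanned for encoding detection.
 * If the ASCII token "__halt_compiler" is present, scanning stops just
 * before it; otherwise the whole buffer is scanned. */
static size_t scan_limit(const unsigned char *buf, size_t len)
{
    static const char token[] = "__halt_compiler";
    const size_t tlen = sizeof(token) - 1;
    size_t i;

    for (i = 0; i + tlen <= len; i++) {
        if (memcmp(buf + i, token, tlen) == 0) {
            return i;
        }
    }
    return len;
}

/* Hypothetical helper: returns 1 if a null byte occurs within the first
 * `limit` bytes of `buf`, 0 otherwise. */
static int has_null_byte(const unsigned char *buf, size_t limit)
{
    return memchr(buf, 0, limit) != NULL;
}
```

With such a limit, the value returned by scan_limit() would be passed as the size instead of the full script size, so binary data after the directive would never reach the null-byte check.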
Another way, if the previous one is not possible, would be to scan for a binary string that cannot correspond to any Unicode encoding. This way, PHK and PHAR could insert this string after the __halt_compiler() directive, and it could be detected by zend_multibyte_detect_utf_encoding() as a stop string. I am ready to implement it if somebody provides a sequence of bytes that cannot be found in any Unicode-encoded document.
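A minimal sketch of the stop-string idea, again as standalone code and not engine code. The marker used here, four 0xFF bytes, is only an illustrative assumption: the byte 0xFF never appears in well-formed UTF-8, but this sketch does not establish that the sequence is safe for every encoding the engine tries to detect. The names STOP_MARKER and scan_limit_with_marker() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stop marker (an assumption, not an agreed-upon value). */
static const unsigned char STOP_MARKER[4] = { 0xFF, 0xFF, 0xFF, 0xFF };

/* Hypothetical helper: returns the offset of the first STOP_MARKER in
 * `buf`, or `len` if the marker is absent. Only bytes before the
 * returned offset would be handed to the encoding detector. */
static size_t scan_limit_with_marker(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + sizeof(STOP_MARKER) <= len; i++) {
        if (memcmp(buf + i, STOP_MARKER, sizeof(STOP_MARKER)) == 0) {
            return i;
        }
    }
    return len;
}
```

A packager would write the marker right after the __halt_compiler() directive, and the detector would stop at the returned offset instead of scanning the binary payload.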
Reclassified: There is no unicode in PHP 5. Just mbstring.
Not sure it should be reclassified as mbstring related, as the bug is in Zend/zend_multibyte.c and has nothing to do with mbstring.
PHP5 has a little unicode part in the engine. It even has an (undocumented) 'detect_unicode' option.
The same folks who maintain mbstring added that support, so it's not such a wrong choice. Reclassified though, and assigned to the maintainer.
Patch posted to internals: http://news.php.net/php.internals/31870
IMHO, #42396 is not a bug; it is the specified behavior.
A normal script doesn't contain a null byte unless it is encoded in Unicode.
Adding detection of a unique byte sequence
'0xFFFFFFFF' to support PHAR/PHK is understandable,
but it is a change that adds a new feature.
Somehow, recently, the default value of detect_unicode seems to have changed.
With detect_unicode enabled, it's impossible to run any PHAR file, neither
through the CLI nor through the web server. IMHO, this should really be looked
into.
This bug describes more accurately the problem I attempted to solve with the patch for bug #53199.
This bug has been fixed in SVN.
Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
Thank you for the report, and for helping us make PHP better.
It's fixed only in trunk.
Please see the related bug:
The original patch contained a flaw.