Request #42396 Followup to #36711: __halt_compiler() and unicode detection
Submitted: 2007-08-23 12:16 UTC Modified: 2010-11-24 07:09 UTC
Votes:2
Avg. Score:5.0 ± 0.0
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:0 (0.0%)
From: francois at tekwire dot net Assigned: hirokawa (profile)
Status: Closed Package: *General Issues
PHP Version: 5.2.3 OS: all
Private report: No CVE-ID: None
 [2007-08-23 12:16 UTC] francois at tekwire dot net
Description:
------------
Reopening bug #36711 because it is NOT a documentation problem. Setting 'detect_unicode=Off' is NOT a solution, just a workaround.

In practice, because of this bug, PHK or PHAR packages cannot run in zend-multibyte-enabled environments unless detect_unicode is turned off, which makes them unusable in environments running Unicode-encoded scripts. As a side effect, it is also impossible to include a Unicode-encoded script inside a PHAR/PHK package, as the package itself cannot be run.

There is no logical reason to tie the __halt_compiler() feature to the zend-multibyte unicode detection capability. Everything after a __halt_compiler() directive must be considered binary data and should not be scanned for unicode detection. If this data contains a unicode script, it will be scanned and detected when include()d through the stream wrapper.

My (humble) suggestions to fix the problem:

In zend_multibyte_detect_unicode(), the BOM detection does not have to be modified. After the BOM check, however, the whole script is scanned for null bytes:

return zend_multibyte_detect_utf_encoding(LANG_SCNG(script_org), LANG_SCNG(script_org_size) TSRMLS_CC);

There, the size should not be LANG_SCNG(script_org_size), but the offset of the __halt_compiler() directive. But I don't know where to find the COMPILER_HALT_OFFSET constant for the script. I even suspect it not to be available at this time...
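
To illustrate the idea of limiting the scan, here is a minimal standalone sketch, not engine code. The helper names scan_limit() and has_null_byte() are hypothetical, and the token is located with a naive case-sensitive byte search, which is an assumption: a real fix would need the scanner's own notion of where the directive sits, since the text could also appear inside a string or comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token searched for; a naive, case-sensitive match (assumption). */
static const char HALT_TOKEN[] = "__halt_compiler";

/* Hypothetical helper: number of leading bytes that should be scanned.
 * Returns the offset of the first "__halt_compiler" occurrence, or
 * `size` when the token is not present. */
size_t scan_limit(const unsigned char *buf, size_t size)
{
    size_t tok_len = sizeof(HALT_TOKEN) - 1;

    assert(buf != NULL);
    for (size_t i = 0; i + tok_len <= size; i++) {
        if (memcmp(buf + i, HALT_TOKEN, tok_len) == 0) {
            return i;
        }
    }
    return size;
}

/* Hypothetical stand-in for the null-byte scan: returns 1 if a NUL byte
 * occurs within the first `limit` bytes, 0 otherwise. */
int has_null_byte(const unsigned char *buf, size_t limit)
{
    assert(buf != NULL);
    return memchr(buf, 0, limit) != NULL;
}
```

Calling has_null_byte(buf, scan_limit(buf, size)) instead of has_null_byte(buf, size) keeps the binary payload after the directive out of the heuristic, so the reproduce script below would no longer be mistaken for a UTF-16/32 document.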

Another way, if the previous one is not possible, would be to scan for a binary string that cannot correspond to any unicode encoding. This way, PHK and PHAR could insert this string after their __halt_compiler() directive, and zend_multibyte_detect_utf_encoding() could treat it as a stop string. I am ready to implement it if somebody provides a sequence of bytes that cannot be found in any unicode-encoded document.
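
A rough sketch of what such a stop-string lookup could look like, again as standalone C rather than engine code. The marker bytes FF FF FF FF are only a placeholder taken from the value floated later in this thread. The byte 0xFF never occurs in well-formed UTF-8, but whether the sequence is safe for every UTF-16/UTF-32 input is exactly the open question, so treat that as an assumption. The function name find_stop_marker() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder stop marker (assumption, not an agreed-upon value). */
static const unsigned char STOP_MARKER[4] = { 0xFF, 0xFF, 0xFF, 0xFF };

/* Hypothetical helper: offset of the first stop marker in `buf`,
 * or `size` when the marker does not occur. Encoding detection would
 * then look only at the bytes before the returned offset. */
size_t find_stop_marker(const unsigned char *buf, size_t size)
{
    assert(buf != NULL);
    for (size_t i = 0; i + sizeof(STOP_MARKER) <= size; i++) {
        if (memcmp(buf + i, STOP_MARKER, sizeof(STOP_MARKER)) == 0) {
            return i;
        }
    }
    return size;
}
```

Compared with truncating at the directive itself, this variant needs no knowledge of the PHP grammar, at the cost of requiring every package generator to emit the marker.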

Reproduce code:
---------------
<?php
echo "OK\n";
__halt_compiler();<null-byte>

Expected result:
----------------
OK

Actual result:
--------------
??????????????????

History

 [2007-08-23 14:08 UTC] jani@php.net
Reclassified: There is no unicode in PHP 5. Just mbstring.
 [2007-08-23 16:24 UTC] francois at tekwire dot net
Not sure it should be reclassified as mbstring related, as the bug is in Zend/zend_multibyte.c and has nothing to do with mbstring.

PHP5 has a little unicode part in the engine. It even has an (undocumented) 'detect_unicode' option.
 [2007-08-24 10:29 UTC] jani@php.net
The same folks who maintain mbstring added that support, so it's not such a wrong choice. Reclassified though, and assigned to the maintainer.
 [2007-08-24 10:30 UTC] jani@php.net
Patch posted to internals: http://news.php.net/php.internals/31870

 [2007-08-27 08:38 UTC] jani@php.net
IMHO, #42396 is not a bug; it is the specified behavior.
A normal script does not contain a null byte unless it is encoded in Unicode.

Adding detection of a unique byte sequence such as '0xFFFFFFFF'
to support PHAR/PHK is understandable,
but that change would be a new feature.

Rui

 [2010-06-25 10:11 UTC] phofstetter at sensational dot ch
Somehow, the default value of detect_unicode seems to have changed recently.

With detect_unicode enabled, it's impossible to run any PHAR file, neither
through the CLI nor through the web server. IMHO, this should really be looked
into.
 [2010-11-18 09:50 UTC] cataphract@php.net
-Package: Feature/Change Request +Package: *General Issues
 [2010-11-18 09:50 UTC] cataphract@php.net
This bug describes more accurately the problem I attempted to solve with the patch for bug #53199.
 [2010-11-24 07:09 UTC] dmitry@php.net
-Status: Assigned +Status: Closed
 [2010-11-24 07:09 UTC] dmitry@php.net
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 [2010-11-24 07:09 UTC] dmitry@php.net
It's fixed only in trunk.
 [2010-11-30 06:36 UTC] php at group dot apple dot com
Please see the related bug:
  http://bugs.php.net/bug.php?id=53199

The original patch contained a flaw.
 