Bug #9365 Problem with multi-byte char code set (serious)
Submitted: 2001-02-21 04:07 UTC Modified: 2001-04-20 09:28 UTC
From: yohgaki at dd dot iij4u dot or dot jp Assigned:
Status: Closed Package: Scripting Engine problem
PHP Version: 4.0.4pl1 OS: RedHat Linux7.0.1/ja
Private report: No CVE-ID: None
 [2001-02-21 04:07 UTC] yohgaki at dd dot iij4u dot or dot jp
PHP 4.0.4pl1 possibly has code that is not safe for 8-bit character sets. If so, any user who uses character codes from 128 to 255 may experience strange or unexpected PHP behavior. (Another possibility is a bug in glibc.)

NOTE: It is very difficult to determine under what conditions the program goes wrong. When the conditions are met, PHP ALWAYS behaves as described below. (I have not figured out the exact conditions yet, i.e. what combination or location of multi-byte characters causes this behavior.) In most cases I don't have this kind of problem at all. Therefore I can't reproduce the problem with a simple script, so I have not included one here.

Anyway, it seems PHP4.0.4pl1 does this:

PHP4 behavior: Script is executed TWICE and the included file is not processed
1) PHP parses the script and starts executing it.
   - My script checks the username in the db. If the same username already exists, it returns an error. If not, it inserts the new username into the db.
2) PHP calls the function that registers the new user.
3) PHP executes the code in the function that inserts the data into the db, if the user can be added. PHP then possibly hits 8-bit-unclean code somewhere near include() and RESTARTS script execution from the beginning.
   - The script is written to include() an HTML file on successful user registration.

PHP inserts the new username into the db during the 1st execution, then finds the same username in the db and returns an error during the 2nd execution.
If I put die('died here') BEFORE include(), PHP stops execution and outputs 'died here', but not if I put it AFTER include(). PHP does not stop execution inside the included file, either.
I was using 'ob_gzhandler'; disabling it does not make any difference.
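
To make the symptom concrete, here is a minimal sketch of the check-then-insert flow described above. It is not the reporter's code: the in-memory array stands in for the PostgreSQL table, and `register_user` and the username are hypothetical names. It is written in current PHP syntax, not PHP 4. Calling the function twice imitates the unexpected restart and shows why the second pass reports a duplicate.

```php
<?php
// Hypothetical in-memory stand-in for the users table (the report used PostgreSQL).
$users = [];

// Returns an array of error messages; an empty array means the insert succeeded.
function register_user(array &$users, string $username): array
{
    if (in_array($username, $users, true)) {
        return ["username '$username' is already registered"];
    }
    $users[] = $username;
    return [];
}

// First pass: the name is free, so it is inserted.
$first = register_user($users, 'yasuo');
// Second pass, imitating the unexpected restart: the name now exists.
$second = register_user($users, 'yasuo');

var_dump($first === []);   // bool(true)
var_dump(count($second));  // int(1)
```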

This happened when the user registration check/insert was done in a function defined in another included file, which is included at the top of the script.
PHP does not log any errors when this happens. (E_ALL)

PHP4 behavior: Script does not process the included file and outputs default HTML, as if I had not printed any output.
(This is a rewrite of the code I explained above.)
1) PHP parses the script and starts executing it.
 - This script does not use function calls, in contrast to the previous one.
2) PHP possibly hits 8-bit-unclean code somewhere near include(), outputs the default HTML for null output, and stops execution.

Therefore, I can see the output from die('died here') if I put it BEFORE include(), but not AFTER include(). If I put die('died here') inside the included file, PHP does not die either.

This happened when the user registration check/insert was done in the script without using functions, i.e. I'm not using functions defined in an included file. The script logic is identical to the first one except that it does not use any functions.
PHP does not log any errors. (E_ALL)

When I tested with a PLAIN ASCII HTML file as the included file, PHP WORKS as expected, i.e. it shows the HTML file, and die/exit works from the script (before/inside/after include()).

I use EUC (Extended Unix Code), EUC-JP to be specific, as the character encoding, which is supposed to work well with 8-bit-clean programs.
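
Since the only difference reported between working and failing includes is the presence of bytes above 127, a small check like the following could help sort files into the two groups. This is a sketch only; `has_high_bytes` is a hypothetical helper, not something from the report, and it is written in current PHP syntax.

```php
<?php
// Returns true if $bytes contains any byte in the range 0x80-0xFF.
function has_high_bytes(string $bytes): bool
{
    // No /u modifier, so PCRE matches byte by byte.
    return preg_match('/[\x80-\xFF]/', $bytes) === 1;
}

// "\xC6\xFC\xCB\xDC" is the EUC-JP encoding of 日本 ("Japan").
var_dump(has_high_bytes('<p>plain ASCII</p>'));       // bool(false)
var_dump(has_high_bytes("<p>\xC6\xFC\xCB\xDC</p>"));  // bool(true)
```

For a file on disk, the same helper could be applied to the result of file_get_contents().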

[Environment]
OS: RedHat Linux7.0.1/ja(i386) FTP version (no glibc update)
Apache: Apache 1.3.17 w/ mod_ssl-2.8.0, mod_gzip-1.13.17a. Built from source.
PHP: PHP4.0.4pl1 w/ pgsql-7.0.3, gd-1.8.3, mhash, mcrypt and others. Built from source. (no debug option)
 - EUC-JP for all HTML and PHP scripts
PHP Configure:
'./configure' '--with-apxs' '--disable-short-tags' '--enable-bcmath' '--with-zlib-dir' '--enable-ftp' '--with-imap' '--with-mhash' '--with-mcrypt' '--with-pgsql' '--with-swf' '--enable-sysvsem' '--enable-sysvshm' '--with-zlib' '--enable-iconv' '--with-kakasi' '--enable-jstring' '--enable-mbregex' '--with-namazu' '--with-gd=../gd-1.8.3/' '--with-jpeg-dir=/usr' '--with-xpm-dir=/usr/X11R6'

I cannot think of any reasonable explanation for this strange PHP4 behavior other than the possibility that glibc has bugs (8-bit-unsafe code, etc.). I haven't researched my exact glibc version or its bugs yet; so far I don't have any problems with anything other than PHP4.

PS: I don't use EUC for var/function names, of course. I only use EUC in HTML or var contents.
I really want this problem to be fixed. If you need to contact me, please do so. I'll do the best I can.

Regards,
--
Yasuo Ohgaki

 [2001-03-08 05:41 UTC] stas@php.net
Could you please provide a short code demonstrating the problem?
 [2001-03-08 07:11 UTC] yohgaki at dd dot iij4u dot or dot jp
I thought I had included some code, but I had not.
Anyway, I found the line that causes the "goto"-like behaviour. It is the line that includes the HTML file shown to users.
(Note: I still have code that always does this. I tried to simplify it, but so far no luck: once I make it simple, it starts working as expected. I will try again after I upgrade my glibc to see if that fixes the problem.)

I've tried to stop script execution as follows.

die('Die before include'); // Works as expected
include('some.html');
die('Die after include'); // This will never happen

Inside some.html, at the beginning of the file:
<?php
die('Die inside include'); // This will never happen
?>

My code has many include/require calls like this. This only happens in this one script, not in others; other scripts work fine. All I can tell is that something happens when the file is included, although the same file can be included without problems from other scripts.
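
For anyone trying to narrow this down, the die-before/die-after test can be turned into a self-contained check along these lines. This is a sketch under assumptions: the temporary file is a hypothetical stand-in for some.html, the code uses current PHP syntax, and (as noted above) a simple script like this is not expected to trigger the problem. It only shows the shape of the check.

```php
<?php
// Hypothetical stand-in for some.html: HTML containing EUC-JP bytes.
$path = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($path, "<p>\xC6\xFC\xCB\xDC</p>\n");

ob_start();
include $path;               // the statement the report identifies as the trigger
$included = ob_get_clean();  // whatever the included file printed
unlink($path);

// On a correctly behaving interpreter, execution continues here.
$reachedAfterInclude = true;

var_dump(strlen($included) > 0);
var_dump($reachedAfterInclude);
```

If the included file's output were missing, or the lines after include never ran, that would match the behaviour described in this report.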

I'll post a follow-up, because I haven't tested the script since I upgraded the Japanese character handling module. The problem might be gone. (I hope)
 [2001-03-09 22:52 UTC] yohgaki at dd dot iij4u dot or dot jp
I tested with 

Newer Japanese Charset Handling module.
=> the same result.

Without these modules
=> the same result.

I had compiled these Japanese character handling modules into PHP. I also tried not compiling them into PHP and building them as individual *.so files instead. The same result.

If I use a plain ASCII HTML file for include(), it works, but not with HTML that contains EUC.
I'll upgrade my glibc to see if that fixes it. (Please wait for feedback.)

FYI:
The code that causes this is below. I've tested with require()/include_once()/require_once() and got the same result.
Reminder: include()/require() works fine except on this file.

------------------
// Show Registration complete html
//die('DIED BEFORE INCLUDE'); // Dies as it should
//header('Location: http://www/'); // Just for testing
//include('regist_finished.ihtml'); // HTML file containing EUC. Script is executed again from the beginning!! Can't even die at the beginning of the file.
//include('test3.php'); // ASCII chars only. Works as expected.
include('cancel.ihtml'); // Another HTML file containing EUC. Script is executed again from the beginning!! Can't even die at the beginning of the file.
die('DIED AFTER INCLUDE'); // Does NOT die, though it should.
--------------------


 [2001-03-09 23:14 UTC] yohgaki at dd dot iij4u dot or dot jp
It seems to be related to references.

The registration is done in a function, and if there are errors the function returns an array containing the error messages. The array can be relatively large, so I returned it by reference. When I got rid of the reference, it started working as it should.

i.e.
function &register() <= script is executed again from the beginning.
function register() <= works as it should.

Hope this info helps.
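
For readers unfamiliar with the two declarations being compared, the following sketch shows return-by-value next to return-by-reference. The function names and the error message are hypothetical, and the code uses current PHP syntax rather than PHP 4. On a correctly behaving interpreter both forms should hand back the same data.

```php
<?php
// Returns the error list by value.
function register_by_value(): array
{
    $errors = ['username already taken'];
    return $errors;
}

// Returns the error list by reference, as in the `function &register()` form.
function &register_by_ref(): array
{
    $errors = ['username already taken'];
    return $errors;
}

$a = register_by_value();
$b =& register_by_ref();  // =& is needed to bind the returned reference

var_dump($a === $b);  // bool(true)
```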

 [2001-03-09 23:34 UTC] yohgaki at dd dot iij4u dot or dot jp
I forgot to mention that it does not happen with simple scripts that have the same logic, so trying to write a script that reproduces this problem would just be a waste of your time.

I'll test some more, so please wait for feedback.
 [2001-03-10 05:45 UTC] yohgaki at dd dot iij4u dot or dot jp
First, I should correct myself: only one included file causes this problem, and returning a reference from the function does not make a difference. (The script had not been uploaded to the server. I tested again: the problem only happens when that file is included, and the reference does not change anything.)

I've tested with 4.0.4pl1 and 4.0.5-dev (200103092045)
with 
 - RedHat Linux7/j (glibc updated to 2.2, gcc also updated, using RedHat's RPMs)
 - Apache 1.3.17 w/ mod_ssl, mod_gzip and the other modules that come with Apache. (Tested both w/ and w/o SSL)

For 4.0.5-dev, I only added the PostgreSQL module.
'./configure' '--with-apxs' '--disable-short-tags' '--with-pgsql'

Results are the same.

I also included the file that causes this from another script. The file is included w/o any problems.

I tried again to make a simple script that reproduces this problem, but I could not.

Since I can easily work around this problem and I've seen it only once so far, I'll leave it.
(If I don't use a function, the code works as expected)

If you have a similar bug report and need more info, please let me know.

PS: Someone posted a similar problem regarding ob_get_contents(). I asked what kind of environment he uses. I'll post a report if I find out anything.

 [2001-04-20 09:28 UTC] andi@php.net
Also reported as fixed.
 