Bug #33198 Leading UTF-8 byte order mark yields different character encodings in output
Submitted: 2005-05-31 09:42 UTC
Modified: 2005-11-14 11:06 UTC
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: Bjorn dot Wiberg at its dot uu dot se
Assigned: moriyoshi
Status: Closed
Package: Unknown/Other Function
PHP Version: 5CVS-2005-06-21
OS: *
Private report: No
CVE-ID: None


 [2005-05-31 09:42 UTC] Bjorn dot Wiberg at its dot uu dot se
When a PHP/PHTML file (a file getting parsed by PHP) contains a leading UTF-8 Byte Order Mark (BOM), EF BB BF in hex, the character encoding of the output varies.
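
To confirm that a given source file really starts with the BOM described above, a check along these lines may help. It is only a sketch: `has_utf8_bom` and `file_has_utf8_bom` are hypothetical helper names, not part of PHP or Apache, and the check only inspects the first three bytes.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 BOM (bytes EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of a file and test them for the BOM.

    The path is supplied by the caller; nothing here is specific to PHP.
    """
    with open(path, "rb") as handle:
        return has_utf8_bom(handle.read(len(codecs.BOM_UTF8)))
```

For example, `has_utf8_bom(b"\xef\xbb\xbf<html>")` is true, while `has_utf8_bom(b"<html>")` is false.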

Possibly relevant PHP configuration snippets from httpd.conf:

php_value default_charset none
php_value default_mimetype "text/html"
php_admin_value output_buffering 4096
php_admin_value output_handler none

PHP configuration flags (from config.nice):

CPPFLAGS='-I/usr/local/include' \
LDFLAGS='-L/lib -L/opt/freeware/lib -L/usr/local/lib' \
CC='/usr/local/bin/gcc' \
'./configure' \
'--enable-bcmath' \
'--enable-calendar' \
'--enable-dba' \
'--enable-dbase' \
'--enable-dbx' \
'--enable-debug' \
'--enable-dio' \
'--enable-exif' \
'--enable-embedded-mysqli' \
'--enable-filepro' \
'--enable-ftp' \
'--enable-gd-jis-conv' \
'--enable-gd-native-ttf' \
'--enable-mbstring' \
'--enable-memory-limit' \
'--enable-shmop' \
'--enable-soap' \
'--enable-sockets' \
'--enable-sysvmsg' \
'--enable-sysvsem' \
'--enable-sysvshm' \
'--enable-yp' \
'--enable-zend-multibyte' \
'--prefix=/apache/php' \
'--with-apxs2=/apache/bin/apxs' \
'--with-bz2' \
'--with-freetype-dir' \
'--with-gd' \
'--with-gdbm' \
'--with-gettext' \
'--with-inifile' \
'--with-jpeg-dir' \
'--with-ldap' \
'--with-libxml-dir' \
'--with-mime-magic' \
'--with-mysql=/usr/local/mysql' \
'--with-openssl=/opt/freeware' \
'--with-png-dir' \
'--with-tiff-dir' \
'--with-ttf' \
'--with-xpm-dir' \
'--with-zlib' \
'--with-zlib-dir' \

Reproduce code:
Source code with leading BOM:

Corresponding page yielding varying result:

(Use telnet or other "dumb" tool to dump the results; it appears that the result varies by server process that handles the request. We're using the Apache 2 prefork MPM.)

Example of incorrect result, here we get ISO-8859-1 encoded output:

Expected result:
Example of correct result, here we get UTF-8 encoded output:

Actual result:
Varies between the two examples. Make sure to close the connection in between (or the same server process will serve your multiple requests).

Restarting the web server does not help.

Changing the PHP settings for default character set (default_charset) and default MIME type (default_mimetype) does not help; I have tried all combinations.

Turning off output buffering (php_admin_flag output_buffering off) does not help.
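
The telnet procedure above (one request per connection, then inspecting the raw bytes) could be scripted roughly as follows. This is a sketch under assumptions: `fetch_raw`, `split_body` and `guess_encoding` are hypothetical helper names, the host and path must be supplied by the caller, and opening a new connection per request does not guarantee that a different server process answers. A body that is not valid UTF-8 is only reported as "other"; it may or may not be ISO-8859-1.

```python
import socket


def fetch_raw(host: str, path: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send one HTTP/1.0 request on a fresh connection and return all raw bytes."""
    request = (
        f"GET {path} HTTP/1.0\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def split_body(raw: bytes) -> bytes:
    """Return the bytes after the first blank line, or b"" if there is none."""
    _head, separator, body = raw.partition(b"\r\n\r\n")
    return body if separator else b""


def guess_encoding(body: bytes) -> str:
    """Classify a body as 'ascii', 'utf-8', or 'other' (not valid UTF-8)."""
    try:
        body.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"
```

Calling `guess_encoding(split_body(fetch_raw(host, path)))` several times in a row and comparing the labels would show whether the output encoding varies between requests.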


 [2005-06-15 21:10 UTC]
And if you change *.php to *.<something_else>, does it still break?
Why do you think it's PHP's fault?
From what I can see, there are no PHP functions involved at all.
 [2005-06-16 14:44 UTC] Bjorn dot Wiberg at its dot uu dot se

Please try the following:
...exhibits the bug randomly just like *.php. When in error, the output looks like this:
...does not exhibit the bug; output does not get re-encoded in any way, i.e. UTF-8 characters remain UTF-8:

It appears that something happens to the output (sometimes) when PHP parses the file.

Best regards,
 [2005-06-19 00:50 UTC]
Please try using this CVS snapshot:
For Windows:

 [2005-06-21 17:28 UTC] Bjorn dot Wiberg at its dot uu dot se
Hi again!

I just tried the 2005-06-21 10:30 snapshot, but the problem persists.

Best regards,
 [2005-06-27 00:59 UTC]
Check out this Apache bug:
 [2005-06-27 09:33 UTC] Bjorn dot Wiberg at its dot uu dot se
Hi again!

Sorry, but this is not connected to mod_cgi in any way. We're not using PHP through mod_cgi, but as a handler. Apache does not have a problem serving the page -- instead it seems that the error only occurs when the file is being processed by the PHP handler.

Best regards,
 [2005-11-13 20:39 UTC]
Try downloading the raw content to your local workspace with a tool like wget / curl.

curl -o utf8_bug.html
wget -O utf8_bug.html

And then open it with a browser to see what shows up on the screen.

If the display is correct, then it's not a PHP / Apache issue, but one caused by the browser's misbehaviour.

Not an mbstring issue anyway.

 [2005-11-14 11:06 UTC] Bjorn dot Wiberg at its dot uu dot se

The tests I performed were done through telnet, so no web browsers were involved at all.

However, I cannot reproduce the bug at the moment, so some changes in recent versions of PHP (5.0.5), or Apache, or a combination of the two, may have "fixed" this.

I'm asking for permission to update this bug should I come across more problems related to this (even though mbstring may not be involved at all).

Thank you for your help!

Best regards,