Bug #33198 Leading UTF-8 byte order mark yields different character encodings in output
Submitted: 2005-05-31 09:42 UTC Modified: 2005-11-14 11:06 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: Bjorn dot Wiberg at its dot uu dot se Assigned: moriyoshi
Status: Closed Package: Unknown/Other Function
PHP Version: 5CVS-2005-06-21 OS: *
Private report: No CVE-ID: None
 [2005-05-31 09:42 UTC] Bjorn dot Wiberg at its dot uu dot se
Description:
------------
When a PHP/PHTML file (a file getting parsed by PHP) contains a leading UTF-8 Byte Order Mark (BOM), EF BB BF in hex, the character encoding of the output varies.
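
A minimal sketch of how the leading BOM described above could be detected in a file's raw bytes. It is written in Python purely for illustration and is not part of the original report; the function names `has_utf8_bom` and `strip_utf8_bom` and the sample bytes are assumptions.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# Hypothetical sample standing in for the start of a PHP file saved with a BOM.
sample = b"\xef\xbb\xbf<?php echo 'hi'; ?>"
print(has_utf8_bom(sample))
print(strip_utf8_bom(sample))
```

To check a file on disk, read it in binary mode (`open(path, "rb").read()`) so that no decoding step hides or removes the BOM before the check.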


Possibly relevant PHP configuration snippets from httpd.conf:

php_value default_charset none
php_value default_mimetype "text/html"
php_admin_value output_buffering 4096
php_admin_value output_handler none


PHP configuration flags (from config.nice):

CPPFLAGS='-I/usr/local/include' \
LDFLAGS='-L/lib -L/opt/freeware/lib -L/usr/local/lib' \
CC='/usr/local/bin/gcc' \
'./configure' \
'--enable-bcmath' \
'--enable-calendar' \
'--enable-dba' \
'--enable-dbase' \
'--enable-dbx' \
'--enable-debug' \
'--enable-dio' \
'--enable-exif' \
'--enable-embedded-mysqli' \
'--enable-filepro' \
'--enable-ftp' \
'--enable-gd-jis-conv' \
'--enable-gd-native-ttf' \
'--enable-mbstring' \
'--enable-memory-limit' \
'--enable-shmop' \
'--enable-soap' \
'--enable-sockets' \
'--enable-sysvmsg' \
'--enable-sysvsem' \
'--enable-sysvshm' \
'--enable-yp' \
'--enable-zend-multibyte' \
'--prefix=/apache/php' \
'--with-apxs2=/apache/bin/apxs' \
'--with-bz2' \
'--with-freetype-dir' \
'--with-gd' \
'--with-gdbm' \
'--with-gettext' \
'--with-inifile' \
'--with-jpeg-dir' \
'--with-ldap' \
'--with-libxml-dir' \
'--with-mime-magic' \
'--with-mysql=/usr/local/mysql' \
'--with-openssl=/opt/freeware' \
'--with-png-dir' \
'--with-tiff-dir' \
'--with-ttf' \
'--with-xpm-dir' \
'--with-zlib' \
'--with-zlib-dir' \
"$@"

Reproduce code:
---------------
Source code with leading BOM: http://www.anst.uu.se/bwiberg/php/utf8_bug.php.txt

Corresponding page yielding varying result:
http://www.anst.uu.se/bwiberg/php/utf8_bug.php

(Use telnet or other "dumb" tool to dump the results; it appears that the result varies by server process that handles the request. We're using the Apache 2 prefork MPM.)

Example of incorrect result, here we get ISO-8859-1 encoded output:

http://www.anst.uu.se/bwiberg/php/utf8_bug_result1.txt


Expected result:
----------------
Example of correct result, here we get UTF-8 encoded output:

http://www.anst.uu.se/bwiberg/php/utf8_bug_result2.txt


Actual result:
--------------
Varies between the two examples. Make sure to close the connection in between (or the same server process will serve your multiple requests).

Restarting the web server does not help.

Changing the PHP settings for default character set (default_charset) and default MIME type (default_mimetype) does not help; I have tried all combinations.

Turning off output buffering (php_admin_flag output_buffering off) does not help.
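
The advice above about closing the connection between requests can be sketched roughly as follows. This is an illustrative Python helper, not something used in the original report: the names `classify_body`, `fetch_once` and `sample_encodings` are assumptions, and the classification is only a heuristic (a body that fails to decode as UTF-8 is consistent with ISO-8859-1 output but does not prove it).

```python
import http.client


def classify_body(body: bytes) -> str:
    """Roughly classify a response body by its byte content."""
    if body.isascii():
        return "ascii-only"
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Consistent with a single-byte encoding such as ISO-8859-1.
        return "not-utf-8"


def fetch_once(host: str, path: str, timeout: float = 10.0) -> bytes:
    """Fetch path over a brand-new connection and close it afterwards."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Connection": "close"})
        return conn.getresponse().read()
    finally:
        conn.close()


def sample_encodings(host: str, path: str, attempts: int = 10) -> dict:
    """Request the same page several times and count each classification."""
    counts = {}
    for _ in range(attempts):
        label = classify_body(fetch_once(host, path))
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling `sample_encodings("www.anst.uu.se", "/bwiberg/php/utf8_bug.php")` would tally the results for the page from this report, assuming the URL is still reachable. Because every request opens a new connection, different prefork child processes get a chance to serve it, and a mix of `utf-8` and `not-utf-8` counts would match the varying behaviour described here.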

 [2005-06-15 21:10 UTC] tony2001@php.net
And if you change *.php to *.<something_else>, does it still break?
Why do you think it's PHP's fault?
From what I can see, there are no PHP functions involved at all.
 [2005-06-16 14:44 UTC] Bjorn dot Wiberg at its dot uu dot se
Hi!

Please try the following:

http://www.anst.uu.se/bwiberg/php/utf8_bug.phtml
...exhibits the bug randomly just like *.php. When in error, the output looks like this:
http://www.anst.uu.se/bwiberg/php/utf8_bug_result3.txt

http://www.anst.uu.se/bwiberg/php/utf8_bug.html
...does not exhibit the bug; output does not get re-encoded in any way, i.e. UTF-8 characters remain UTF-8:
http://www.anst.uu.se/bwiberg/php/utf8_bug_result4.txt

It appears that something happens to the output (sometimes) when PHP parses the file.
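
To make "re-encoded" concrete, here is a small Python illustration (an assumption added for clarity, not taken from the result files) of how one non-ASCII character differs between the two encodings at the byte level.

```python
# The same character encoded two ways; "ö" is used only as a sample.
text = "ö"
utf8_bytes = text.encode("utf-8")         # two bytes: C3 B6
latin1_bytes = text.encode("iso-8859-1")  # one byte: F6

print(utf8_bytes.hex(" "))
print(latin1_bytes.hex(" "))

# The ISO-8859-1 byte on its own is not valid UTF-8, which is why a raw
# dump of the response makes the difference easy to spot.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)
```

In a telnet or hex dump, the UTF-8 form of such a character shows up as a two-byte sequence, while the ISO-8859-1 form is a single byte above 0x7F.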

Best regards,
Björn
 [2005-06-19 00:50 UTC] sniper@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5-win32-latest.zip


 [2005-06-21 17:28 UTC] Bjorn dot Wiberg at its dot uu dot se
Hi again!

I just tried the 2005-06-21 10:30 snapshot, but the problem persists.

Best regards,
Björn
 [2005-06-27 00:59 UTC] tony2001@php.net
Check out this Apache bug: http://issues.apache.org/bugzilla/show_bug.cgi?id=16687
 [2005-06-27 09:33 UTC] Bjorn dot Wiberg at its dot uu dot se
Hi again!

Sorry, but this is not connected to mod_cgi in any way. We're not using PHP through mod_cgi, but as a handler. Apache does not have a problem serving the page -- instead it seems that the error only occurs when the file is being processed by the PHP handler.

Best regards,
Björn
 [2005-11-13 20:39 UTC] moriyoshi@php.net
Try downloading the raw content to your local workspace with a tool like wget / curl.

curl -o utf8_bug.html  http://www.anst.uu.se/bwiberg/php/utf8_bug.php
wget -O utf8_bug.html http://www.anst.uu.se/bwiberg/php/utf8_bug.php

And then open it with a browser to see what shows up on the screen.

If the display is correct, then it's not a PHP / Apache issue, but one caused by the browser's misbehaviour.

Not an mbstring issue anyway.

 [2005-11-14 11:06 UTC] Bjorn dot Wiberg at its dot uu dot se
Hi!

The tests I performed were done through telnet, so no web browsers were involved at all.

However, I cannot reproduce the bug at the moment, so some changes in recent versions of PHP (5.0.5), or Apache, or a combination of the two, may have "fixed" this.

I'm asking for permission to update this bug should I come across more problems related to this (even though mbstring may not at all be involved).

Thank you for your help!

Best regards,
Björn
 
Last updated: Thu Mar 28 08:01:28 2024 UTC