Bug #63316 Files declared as UTF-8 will append NUL bytes to output for non-ASCII characters
Submitted: 2012-10-19 21:56 UTC Modified: -
Votes:13
Avg. Score:4.5 ± 0.8
Reproduced:7 of 10 (70.0%)
Same Version:5 (71.4%)
Same OS:3 (42.9%)
From: post at wickenrode dot com Assigned:
Status: Open Package: Unicode Engine related
PHP Version: 5.4.8 OS: Xubuntu 12.10 amd64
Private report: No CVE-ID:

 [2012-10-19 21:56 UTC] post at wickenrode dot com
Description:
------------
Take the test script below and save it as UTF-8 and verify:

> cat test.php | tail -n 4 | head -n 1 | hexdump -C
00000000  09 23 20 c3 a4 c3 b6 0a                           |.# .....|
00000008

Now run

> ./bin/php test.php | hexdump -C
PHP Version = 5.4.8
zend.multibyte = 1
00000000  58 00 00 00                                       |X...|
00000004

As you can see, PHP appended three NUL bytes to the output.
Now remove one of the characters in the commented-out line:

> ./bin/php test.php | hexdump -C
PHP Version = 5.4.8
zend.multibyte = 1
00000000  58 00 00                                          |X..|
00000003

As you can see, the number of NUL bytes is directly related to the number of non-ASCII characters in the file.

Now comment out the declare statement:

> ./bin/php test.php | hexdump -C
PHP Version = 5.4.8
zend.multibyte = 1
00000000  58                                                |X|
00000001

This time the output is correct.

Tested with stock PHP, but it also happens with the Ubuntu-packaged 5.4.6-1.
It also happens when PHP runs as an Apache module, where this issue caused corrupted images for me.
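
To check the relationship between non-ASCII characters and NUL bytes without reading hexdumps by eye, the two counts can be compared directly. The following is a minimal sketch, not part of the original report: the helper names count_nul and count_non_ascii are hypothetical, and the second helper assumes the input is valid UTF-8, so that every multibyte character has exactly one lead byte in the range 0xC2-0xF4.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report).

# Count the NUL bytes arriving on stdin.
count_nul() {
    # Delete everything except NUL, then count what is left.
    tr -cd '\000' | wc -c | tr -d ' '
}

# Count the non-ASCII characters on stdin, assuming valid UTF-8:
# each multibyte sequence starts with one lead byte in 0xC2-0xF4.
count_non_ascii() {
    LC_ALL=C tr -cd '\302-\364' | wc -c | tr -d ' '
}

# Self-contained demonstration using the bytes shown in the hexdumps above.
printf 'X\000\000\000' | count_nul                   # prints 3
printf '\t# \303\244\303\266\n' | count_non_ascii   # prints 2
```

With the files from this report, the same helpers would be used as `./bin/php test.php 2>/dev/null | count_nul` and `count_non_ascii < test.php`. If the observation above holds, the two numbers should match while the declare statement is active.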

Test script:
---------------
<?php
	declare(encoding='UTF-8');

	fwrite(STDERR,"PHP Version = ".phpversion()."\n");
	fwrite(STDERR,"zend.multibyte = ".ini_get('zend.multibyte')."\n");

	# äöü
	echo "X";

?>


Expected result:
----------------
See above.
UTF-8 declared files with UTF-8 content should work fine.

Actual result:
--------------
See above.
UTF-8 declared files with UTF-8 content produce NUL bytes in the output.

History

 [2013-02-17 07:16 UTC] wynn dot chen dot cn at gmail dot com
I think it's not a bug, just something like #62351.
Set
  mbstring.internal_encoding = utf-8
in php.ini, and then everything works fine.
 
Last updated: Wed Apr 23 09:02:23 2014 UTC