Doc Bug #41609 file_put_contents() injects \r
Submitted: 2007-06-06 08:08 UTC Modified: 2007-06-14 21:12 UTC
From: zoe at uk dot ibm dot com Assigned:
Status: Closed Package: Documentation problem
PHP Version: 6CVS-2007-06-06 (snap) OS: Windows XP
Private report: No CVE-ID: None
 [2007-06-06 08:08 UTC] zoe at uk dot ibm dot com
Description:
------------
A file created on Windows using file_put_contents in PHP6 has different contents if unicode.semantics=1 is used.

The files (data.tmp) created by the test case below have different contents, from "od -x" it looks to me as though an additional \r is inserted in the file before the \n.
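
For readers without "od" at hand, the same check could be done from PHP itself. This is only a minimal sketch: the helper name countCrLf() and the scratch file are illustrative assumptions, not part of the original test case.

```php
<?php
// Hypothetical helper (not from the report): count CR LF pairs in a file's raw bytes.
function countCrLf(string $path): int
{
    // file_get_contents() returns the bytes as they are stored on disk.
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("cannot read $path");
    }
    return substr_count($bytes, "\r\n");
}

// Demo on a scratch file written in binary mode, so no newline translation applies.
$path = tempnam(sys_get_temp_dir(), 'crlf');
$fh = fopen($path, 'wb');
fwrite($fh, "text\nline of text\n");
fclose($fh);

// In the hex dump, "0a" is \n and "0d0a" would be \r\n.
echo bin2hex(file_get_contents($path)), "\n";
echo countCrLf($path), "\n";
unlink($path);
```

Running the helper against each data.tmp would show whether the extra bytes are really \r characters placed before each \n.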

Reproduce code:
---------------
<?php
$file_path = dirname(__FILE__);
$buffer = "text\nline of text\n";
file_put_contents( $file_path."/data.tmp", (binary)$buffer);

$fp = fopen($file_path."/data.tmp", "r");
var_dump( file_get_contents($file_path."/data.tmp") );
fclose($fp);

$fp = fopen($file_path."/data.tmp", "r");
var_dump( fgets($fp) );
fclose($fp);

$fp = fopen($file_path."/data.tmp", "rb");
var_dump( fgets($fp) );
fclose($fp);

$fp = fopen($file_path."/data.tmp", "rt");
var_dump( fgets($fp) );
fclose($fp);

?>

Expected result:
----------------
string(18) "text
line of text
"
string(5) "text
"
string(5) "text
"
string(5) "text
"

Actual result:
--------------
string(20) "text
line of text
"
string(6) "text
"
string(6) "text
"
unicode(5) "text
"

History

 [2007-06-06 08:38 UTC] tony2001@php.net
Could you please try the same on Linux?
I can't replicate it there.
 [2007-06-06 08:56 UTC] zoe at uk dot ibm dot com
Hi - sorry - I should have mentioned that the behaviour is not reproducible on Linux. It's Windows specific. Would you like copies of the data.tmp files that are created in each case? If so, I'll attach them to this report.
 [2007-06-06 09:35 UTC] tony2001@php.net
No, thanks. This should be enough for somebody who knows how to debug it on Windows. 
Unfortunately, I don't.
 [2007-06-06 10:26 UTC] johannes@php.net
I'd assume this is some ICU feature when converting a text with newlines from UTF-8 to UTF-16 and then back to UTF-8.

Does something like
<?php
$a = "foo\nbar";
var_dump((binary)$a);
?>
also show the \r?
 [2007-06-06 10:48 UTC] zoe at uk dot ibm dot com
Hi Johannes

No, it doesn't:

E:\zoe\TESTS\slashr>e:\zoe\buildsystem\php6exe\php.exe -d unicode.semantics=0 johannes.php
string(7) "foo
bar"

E:\zoe\TESTS\slashr>e:\zoe\buildsystem\php6exe\php.exe -d unicode.semantics=1 johannes.php
string(7) "foo
bar"
 [2007-06-09 21:03 UTC] nlopess@php.net
When files are not opened with the binary flag, Windows automatically converts \n to \r\n. I don't have time until mid-July to investigate this problem though.
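
The translation described here can be observed by writing the same buffer in "wb" and "wt" mode and comparing the resulting file sizes. The sketch below is only an illustration: writeAndMeasure() is a made-up helper name, and a difference is expected only on Windows, since other platforms ignore the "t" flag.

```php
<?php
// Illustrative helper (not a PHP built-in): write $buffer through fopen()
// with the given mode and return the size of the file on disk.
function writeAndMeasure(string $mode, string $buffer): int
{
    $path = tempnam(sys_get_temp_dir(), 'mode');
    $fh = fopen($path, $mode);
    fwrite($fh, $buffer);
    fclose($fh);
    clearstatcache(true, $path); // avoid a stale cached size
    $size = filesize($path);
    unlink($path);
    return $size;
}

$buffer = "text\nline of text\n"; // 18 bytes, two \n
$binary = writeAndMeasure('wb', $buffer);
$text   = writeAndMeasure('wt', $buffer);

// On Windows, text mode is expected to add one \r per \n;
// elsewhere the two sizes should match.
printf("wb: %d bytes, wt: %d bytes (%s)\n", $binary, $text, PHP_OS_FAMILY);
```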
 [2007-06-09 21:17 UTC] pajoye@php.net
"when files are not opened with the binary flag, windows automatically
converts \n to \r\n. I don't have time until mid-July to investigate
this problem though."

As far as I remember, it is the documented behavior (Windows file functions).

It is especially important to take care of text contents in Unicode mode; it is not a good idea to blindly save everything as binary. That's why PHP6 has two new flags which can be used in file_put_contents():
- FILE_TEXT, opens with "wt" (or "at" if FILE_APPEND is used)
- FILE_BINARY, opens with "wb" (or "ab" if FILE_APPEND is used)

The default mode is "w", which is a text mode on Windows.
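
The flag-to-mode mapping listed above can be sketched in userland PHP. Everything named here is a hypothetical stand-in: SKETCH_TEXT, SKETCH_BINARY, and modeFor() are not PHP's real flags or its C implementation, and letting the binary flag win when both are set is an assumption made for the sketch.

```php
<?php
// Hypothetical stand-ins for the flags discussed above; the values are arbitrary.
const SKETCH_TEXT   = 1;
const SKETCH_BINARY = 2;

// Return the fopen() mode that the list above associates with each flag.
function modeFor(int $flags, bool $append = false): string
{
    $base = $append ? 'a' : 'w';
    if ($flags & SKETCH_BINARY) {
        return $base . 'b'; // assumption: binary wins if both flags are set
    }
    if ($flags & SKETCH_TEXT) {
        return $base . 't';
    }
    return $base; // no flag: plain "w" or "a", which is text mode on Windows
}

echo modeFor(SKETCH_BINARY), "\n";     // wb
echo modeFor(SKETCH_TEXT, true), "\n"; // at
echo modeFor(0), "\n";                 // w
```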

I do not have a Windows machine at hand to test, but I'm pretty sure that FILE_BINARY solves your issue. It may use it automatically when a binary string is given, but I would find that trickier (magic++).
 [2007-06-11 09:56 UTC] zoe at uk dot ibm dot com
Yes - you are right. The FILE_BINARY flag does fix the problem. I'm closing this.
 [2007-06-11 10:58 UTC] pajoye@php.net
Not a bug > bogus (I don't like this word but well :)
 [2007-06-11 11:30 UTC] zoe at uk dot ibm dot com
Fair enough - I was wondering if it should also be either RTFM or WTFM?
Just had a quick look at what I hope are the PHP6 docs and I can't see the new file_put_contents() flags documented yet. *IF* there is some work needed on the docs, is the right process to open another defect?
 [2007-06-11 12:15 UTC] pajoye@php.net
"*IF* there is some work needed on the docs is the right process to open another defect?"

Good point, there is a lot of work to be done regarding php6 behaviors of each function. I'm not sure what the phpdoc team decided but I will let them comment here, if required.

changed to open + documentation problem.


 [2007-06-11 17:47 UTC] nlopess@php.net
"I'm pretty sure that FILE_BINARY solves your issue. It may use it automatically when a binary string is given, but I would find that trickier (magic++)"

I don't see that as magical.. If a binary string is given, you want it written as-is, IMHO. I'm classifying back to a PHP bug, because I feel this needs more discussion.
 [2007-06-11 18:25 UTC] pajoye@php.net
"I don't see that as magical.. If a binary string is given, you want it
written as-is, IMHO. I'm classifying back to a PHP bug, because I feel
this needs more discussion."

Yes, but it is still magic, as it introduces possible confusion and WTF factors. Having to specify the mode just like you do with fopen is clear and not confusing. It only applies in Unicode mode, so there is no BC break involved either.
 [2007-06-11 20:16 UTC] zoe at uk dot ibm dot com
"Having to specify the mode just like you do with fopen is clear
and not confusing."

Well - this made me think of trying something else. In the following test case:

<?php
$file_path = dirname(__FILE__);
$buffer = "text\nline of text\n";
$nbytes_fpc = file_put_contents( $file_path."/data_fpc.tmp", (binary)$buffer);


$fh = fopen( $file_path."/data_owc.tmp", "w");
$nbytes_owc = fwrite ($fh, (binary)$buffer);
fclose ($fh);

echo "Bytes fpc = $nbytes_fpc Bytes owc = $nbytes_owc \n";
?>

file_put_contents creates a file with \r, whereas using fopen() doesn't. I tried using "wb" in fopen and got the same result.

The interesting thing is that this behaviour is only seen with unicode.semantics=1; when unicode.semantics=0, the two files in the test case above are identical.

I'm not sure whether this makes it a doc or a code defect :-) Just more information....
 [2007-06-11 20:31 UTC] pajoye@php.net
"file_put_contents creates a file with \r, whereas using fopen() doesn't. I tried using "wb" in fopen and got the same result."

file_put_contents should be used with the new flag. I'm not sure what we should do for fopen, I think we should make it consistent.

About forcing the binary mode if a binary string is given, this patch should do the trick for file_put_contents:

http://blog.thepimp.net/misc/patches/php/bug41609_force_binary.txt


 [2007-06-13 15:03 UTC] zoe at uk dot ibm dot com
Hi
Sorry for the delay in replying - it took some time to get PHP to build on Windows :-).

I can confirm that this patch makes the behaviour of file_put_contents() consistent with that of fwrite() with unicode.semantics=1 in PHP6. It doesn't seem to have regressed anything else, so please would you commit it?

The extra flags would still need to be added to the docs.
 [2007-06-14 19:57 UTC] pajoye@php.net
"please would you commit it?"

Done

"The extra flags would still need to be added to the docs."

Move back to documentation.
 [2007-06-14 21:12 UTC] gwynne@php.net
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.


 