Bug #78043 UTF-8 BOM is carried from an included file into a Header Content
Submitted: 2019-05-20 17:23 UTC
Modified: 2019-05-20 17:27 UTC
From: barryd dot it at gmail dot com
Assigned:
Status: Duplicate
Package: *Unicode Issues
PHP Version: 7.2.18
OS: Windows
Private report: No
CVE-ID: None
 [2019-05-20 17:23 UTC] barryd dot it at gmail dot com
Description:
------------
I included a file in my script and found that, because the file was saved as UTF-8, it has a UTF-8 BOM at the very beginning, just before the <?php ?> tags. That BOM ends up in the content sent with Content-Disposition: attachment, and no matter what my Content-Type: application/rtf; charset=iso-8859-1//TRANSLIT header says, the downloaded content turns out to be UTF-8 instead of ANSI. The string itself is entirely ASCII.

The BOM needs to be filtered out of the content so that the Content-Type can be honored. To reproduce this, change the encoding of include.php to UTF-8 and use the code below. Once you save the file sent by the web server to your system, you too will find that the content is UTF-8.

Test script:
---------------
<?php
include_once('include.php');
$rtf = 'Testing';
header('Content-Type: application/rtf; charset=iso-8859-1//TRANSLIT');
header('Content-Disposition: attachment; filename="' . '10-Jane-Doe' . '.rtf"');
header('Content-Length: ' . strlen($rtf));
header('Expires: Fri, 01 Jan 2010 05:00:00 GMT');
header('Last-Modified: ' . gmdate( 'D, d M Y H:i:s' ) . ' GMT'); 
header('Cache-Control: private, must-revalidate');
header('Pragma: no-cache');
ob_start();
echo $rtf;
ob_end_flush();
exit();
?>

Expected result:
----------------
I would expect the web server to honor the Content-Type of the string, the web browser to honor the encoding of the file being downloaded, and PHP not to push content that is not part of the string into the output buffer.

Testing

Actual result:
--------------
Testing, preceded by the BOM, is downloaded into the RTF file. This makes the file UTF-8 and prevents it from being rendered properly in LibreOffice.
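
To make the mismatch concrete, here is a minimal sketch of the bytes involved. It assumes output buffering is enabled, so the BOM emitted by include.php is held back and then sent together with the echoed string; the variable names are illustrative only.

```php
<?php
// The three bytes of the UTF-8 BOM that an editor may place before "<?php".
$bom = "\xEF\xBB\xBF";

// The string the test script intends to send.
$rtf = 'Testing';

// What ends up in the download when the BOM is emitted ahead of the string
// (assumption: output buffering holds the BOM until the body is flushed).
$body = $bom . $rtf;

echo strlen($rtf), "\n";   // 7, the value the script puts in Content-Length
echo strlen($body), "\n";  // 10, the number of bytes in BOM + string
echo bin2hex($body), "\n"; // efbbbf54657374696e67
```

The first three bytes (ef bb bf) are what a viewer uses to guess UTF-8, regardless of the charset named in the Content-Type header.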

History

 [2019-05-20 17:26 UTC] spam2 at rhsoft dot net
the BOM is *before* <?php and so it is sent to the client
after that you can no longer send headers
don't create files with a BOM - it's that easy
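
For anyone who needs to find and clean such files, the following is a rough sketch of a BOM check and removal. The helper names (hasUtf8Bom, stripUtf8Bom, removeBomFromFile) and the UTF8_BOM constant are hypothetical, not part of PHP; only the built-in string and file functions are real.

```php
<?php
// Hypothetical helpers for detecting and removing a leading UTF-8 BOM.

const UTF8_BOM = "\xEF\xBB\xBF";

/** Returns true when the given bytes start with the three-byte UTF-8 BOM. */
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

/** Returns the bytes with a leading UTF-8 BOM removed, if one is present. */
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}

/** Rewrites the file without its BOM. Returns true only if it was changed. */
function removeBomFromFile(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false || !hasUtf8Bom($bytes)) {
        return false; // unreadable, or nothing to strip
    }
    return file_put_contents($path, stripUtf8Bom($bytes)) !== false;
}
```

Calling removeBomFromFile('include.php') once would fix the file from the report. Saving the file as "UTF-8 without BOM" in the editor achieves the same thing.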
 [2019-05-20 17:27 UTC] requinix@php.net
-Status: Open +Status: Duplicate
 [2019-05-20 17:27 UTC] requinix@php.net
Note that the UTF-8 spec does not recommend using a BOM in the first place.

Duplicate because the issue of PHP recognizing file encodings, especially BOMs, has been raised many times before. And simply discarding it is not proper.
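
Since PHP will not discard the BOM itself, a script that cannot control how its included files are saved could capture their stray output instead. This is only a sketch of one possible workaround, not something proposed in this report; includeQuietly is a hypothetical name, and the approach assumes the included file only defines functions or classes (variables it sets would be local to the helper).

```php
<?php
/**
 * Hypothetical helper: includes a file while capturing anything it prints,
 * such as a BOM placed before the opening "<?php" tag.
 * Returns the captured output so the caller can inspect or ignore it.
 */
function includeQuietly(string $path): string
{
    ob_start();               // start a buffer dedicated to this include
    include_once $path;       // any BOM or whitespace lands in the buffer
    return (string) ob_get_clean(); // discard the buffer, return its contents
}
```

A caller would use includeQuietly('include.php') in place of include_once('include.php') and then send its headers and body as before. Removing the BOM from the file remains the cleaner fix.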
 
Last updated: Thu Mar 28 20:01:28 2024 UTC