Request #75922 default_charset cannot be disabled in Content-Type
Submitted: 2018-02-06 09:16 UTC Modified: 2018-02-06 10:24 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: bugs at jth dot net Assigned:
Status: Open Package: PHP options/info functions
PHP Version: 7.1.14 OS: Linux
Private report: No CVE-ID: None
 [2018-02-06 09:16 UTC] bugs at jth dot net
Description:
------------
In PHP 7, default_charset is a mess because it is used in two different contexts. It should be two separate options: one for the internal encoding and one for enabling or disabling the charset parameter in the Content-Type: header.

The Content-Type: header overrides the meta tag in the document, and different users create their documents in different charsets, declared via the meta tag.
PHP is also often used for simple tasks that do not depend on the charset at all, e.g. database functions.

It is a nuisance and error-prone to make users specify the charset twice: once in an otherwise unnecessary header() call and once in the meta tag.



 [2018-02-06 09:31 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2018-02-06 09:31 UTC] requinix@php.net
Having default_charset not match the actual charset used is a recipe for disaster. Why aren't they changing that?
 [2018-02-06 09:59 UTC] bugs at jth dot net
-Status: Feedback +Status: Open
 [2018-02-06 09:59 UTC] bugs at jth dot net
You are in no position to require that users should use a single character set in PHP. There are a number of reasons why different charsets and languages should coexist on the same web server. It was working perfectly fine in PHP 5, but is now broken after upgrading to PHP 7.

Apache2 httpd.conf

#
# Specify a default charset for all content served; this enables
# interpretation of all content as UTF-8 by default.  To use the
# default browser choice (ISO-8859-1), or to allow the META tags
# in HTML content to override this choice, comment out this
# directive:
#
#AddDefaultCharset UTF-8

This works either by commenting out the directive or by specifying
AddDefaultCharset Off

There should be a similar option in PHP.

I have experimented with

default_charset = ""
internal_encoding = "UTF-8"

which appears to work, but I am not sure it is safe, especially since the PHP documentation states

"Setting default_charset to an empty value is not recommended."
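For reference, here is a minimal sketch of the same experiment done at runtime instead of in php.ini. It assumes both settings can be changed with ini_set() and that the script runs this before any Content-Type header is set. Whether an empty default_charset has other side effects is exactly the open question raised above, so treat this as an experiment, not a recommendation.

```php
<?php
// Runtime counterpart of the php.ini experiment above (a sketch only).
// ini_set() returns the previous value, kept here so it can be restored.
$previousCharset  = ini_set('default_charset', '');        // intended effect: no charset= parameter added
$previousInternal = ini_set('internal_encoding', 'UTF-8'); // keep an explicit internal encoding

// Show what is in effect for the rest of this request.
printf(
    "default_charset=%s internal_encoding=%s\n",
    var_export(ini_get('default_charset'), true),
    var_export(ini_get('internal_encoding'), true)
);
```

Settings changed this way only last for the current request, so each script (or a shared include) would need to repeat it.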
 [2018-02-06 10:09 UTC] spam2 at rhsoft dot net
> Having default_charset not match the actual charset used is a recipe for disaster

So is that unasked-for header!
https://github.com/apache/trafficserver/issues/2849

Yes, it is a bug in ATS, but without the unasked-for charset in the Content-Type header this issue would not exist at all.
 [2018-02-06 10:24 UTC] requinix@php.net
> You are in no position to require that users should use a single character set in PHP.
Please don't put words in my mouth. All I asked was why they had PHP configured to use one default_charset (if not UTF-8 by default) but were serving HTML documents in another charset. With a database, PHP, and the browser, all it takes is one incorrect character encoding setting or practice to create headaches that can last for years.

Besides the "nuisance" call to header() you already know about, setting default_charset empty is currently the only way I see to prevent adding the charset= to the Content-Type. I don't know the exact reasons for why it's not recommended to have it empty, but I am sure that setting default_charset to the charset in use is a good idea. And it certainly does not have to be set to the same value for all requests to a server, let alone the same for all users.
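The behavior described here can be modeled roughly as follows. This is a simplified sketch, not PHP's actual source: the function name `effective_content_type` is hypothetical, and the exact spacing PHP uses around `charset=` may differ. It assumes the charset is only added to `text/*` types that do not already carry a `charset=` parameter, and only when default_charset is non-empty.

```php
<?php
/**
 * Hypothetical model of how a charset parameter ends up on the
 * Content-Type value. Illustration only, not PHP's implementation.
 */
function effective_content_type(string $requested, string $defaultCharset): string
{
    $isText     = strncasecmp($requested, 'text/', 5) === 0;  // only text/* types are touched
    $hasCharset = stripos($requested, 'charset=') !== false;  // an explicit charset is left alone

    if ($defaultCharset !== '' && $isText && !$hasCharset) {
        return $requested . '; charset=' . $defaultCharset;
    }

    return $requested; // empty default_charset, non-text type, or charset already present
}

// With a non-empty default the parameter is added; with an empty one it is not.
echo effective_content_type('text/html', 'UTF-8'), "\n";
echo effective_content_type('text/html', ''), "\n";
```

Under this model, the only ways to get a bare `text/html` are an empty default_charset or an explicit (possibly empty) `charset=` in the header() call, which matches the workarounds discussed in this report.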

I still don't see a good reason for why this behavior should be (optionally) disabled but I'll leave this open.

> yes, it is a bug in ATS
Precisely. Someone over there forgot that Content-Types can specify media type parameters
  https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7
Omitting the charset= in the header would be a workaround; fixing the tool to be aware of the structure of a Content-Type header so that it could match the media type properly would be the solution.
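To illustrate what "being aware of the structure" could look like, here is a minimal sketch that splits a Content-Type value into its media type and parameters before comparing. The function name `parse_content_type` is hypothetical, and the parsing is deliberately simplified: it does not handle quoted parameter values that themselves contain semicolons.

```php
<?php
/**
 * Split a Content-Type value into a lowercased media type and its parameters.
 * Simplified sketch: quoted values containing ';' are not supported.
 *
 * @return array{type: string, params: array<string, string>}
 */
function parse_content_type(string $value): array
{
    $parts = array_map('trim', explode(';', $value));
    $type  = strtolower((string) array_shift($parts)); // media types are case-insensitive

    $params = [];
    foreach ($parts as $part) {
        if (strpos($part, '=') === false) {
            continue; // skip empty or malformed segments
        }
        [$name, $val] = explode('=', $part, 2);
        $params[strtolower(trim($name))] = trim($val, " \t\"");
    }

    return ['type' => $type, 'params' => $params];
}

// Matching on the parsed media type works with or without a charset parameter.
$isHtml = parse_content_type('text/html; charset=UTF-8')['type'] === 'text/html';
var_dump($isHtml);
```

A tool that compares the parsed type instead of the raw header string is unaffected by whether PHP adds `charset=`.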
 [2018-02-06 10:54 UTC] spam2 at rhsoft dot net
@requinix@php.net: "All I asked was why they had PHP configured to use one default_charset (if not UTF-8 by default) but were serving HTML documents in another charset" is pretty simple to answer: on many shared hosts you simply have no control over whether each and every script sends a charset header, yet with the default PHP behavior you would need that control.

But if PHP stopped sending the charset implicitly, the HTML meta tag would be enough. It is as simple as that.

If the script additionally sends a header(), fine. But currently you have to take care of this on the script side for no good reason, because a *template* may well contain a charset declaration that is not the same as the default PHP setting.
 [2023-01-29 03:29 UTC] jake at qzdesign dot co dot uk
This explains one of the issues more clearly than the OP here:

https://stackoverflow.com/questions/46484185

If the specific issue raised there should be a separate bug report, please advise or create one.

However, it is noted in comment 2018-02-06 10:24 UTC by requinix that

> setting default_charset empty is currently the only way I see to prevent adding the charset= to the Content-Type

So this is perhaps what this bug is about?

If so, it's also possible to somewhat circumvent the bug by using `\header('Content-Type: text/html; charset=');`, but the `charset=` (with nothing following) still appears in the HTTP headers (though doesn't seem to override that in the document, or cause any other problem, with browsers I've tested against).

That's clearly not an ideal solution.  I might give `\ini_set('default_charset', '')` a go (restoring it after calling `header`) to see if that indeed does work.  Thanks for the tip.
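The idea of temporarily emptying default_charset around the `header` call could be sketched like this. The helper name `header_without_charset` and its optional `$emit` parameter are hypothetical (the latter exists only so the behavior can be checked without sending real headers). Whether this actually suppresses `charset=` on every SAPI and PHP version is an assumption that has not been verified here.

```php
<?php
/**
 * Send a header line while default_charset is temporarily empty,
 * then restore the previous setting. Sketch of the workaround only.
 *
 * @param callable|null $emit Replacement for header(), mainly for testing.
 */
function header_without_charset(string $line, ?callable $emit = null): void
{
    $emit     = $emit ?? 'header';          // default to PHP's own header()
    $previous = (string) ini_get('default_charset');

    ini_set('default_charset', '');         // nothing to append while this is empty
    try {
        $emit($line);
    } finally {
        ini_set('default_charset', $previous); // always restore, even on error
    }
}
```

It would be called as `header_without_charset('Content-Type: text/html');` before any output is sent.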

Bottom line: this is definitely a bug (or perhaps a poorly-implemented feature - and maybe one that no-one ever asked for, that could simply be removed).
 [2023-01-29 03:49 UTC] jake at qzdesign dot co dot uk
To clarify, there is definitely a bug here:

It is not possible to send the HTTP header `Content-Type: text/html`.

That is, without it being adulterated by PHP in an undesirable manner.

And without some ridiculous and unfathomable workaround being employed (to circumvent what PHP is doing, entirely unasked-for).

Please fix it.
 
Last updated: Thu Nov 21 12:01:29 2024 UTC