Request #75922 default_charset cannot be disabled in Content-Type
Submitted: 2018-02-06 09:16 UTC Modified: 2018-02-06 10:24 UTC
From: bugs at jth dot net Assigned:
Status: Open Package: PHP options/info functions
PHP Version: 7.1.14 OS: Linux
Private report: No CVE-ID: None

 [2018-02-06 09:16 UTC] bugs at jth dot net
Description:
------------
In PHP 7, default_charset has become a mess because it is used in two different contexts. It should be split into two options: one for the internal encoding, and one that enables or disables emitting the charset in the Content-Type: header.

The charset in the Content-Type: header overrides the meta tag in the document, and different users create their documents in different charsets, declared with that meta tag.
PHP is often used for simple tasks that do not involve the charset at all,
e.g. database functions.
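The precedence the reporter is complaining about can be sketched as a simplified Python model of the HTML encoding-sniffing order: a byte order mark wins, then the charset= parameter of the HTTP Content-Type header, and only then a <meta> declaration near the start of the document. The regex is a rough stand-in for the real meta prescan, and the function names are hypothetical.

```python
import re
from email.message import Message

# Rough stand-in (an assumption, not the full HTML prescan algorithm) for
# finding <meta charset="..."> or charset=... inside a <meta content="...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")  # None when the parameter is absent


def effective_charset(content_type, body):
    """Simplified precedence: BOM > HTTP header charset > <meta> tag."""
    if body.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte order mark
        return "utf-8"
    from_header = header_charset(content_type)
    if from_header:
        # A charset in the header means the <meta> tag is never consulted.
        return from_header.lower()
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # the browser falls back to its own default
```

With PHP's default header, `effective_charset("text/html; charset=UTF-8", b'<meta charset="iso-8859-1">')` yields `"utf-8"`, so the document's own ISO-8859-1 declaration is ignored; drop the header charset and the same body yields `"iso-8859-1"`.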

Requiring users to specify the charset twice, once in an otherwise unnecessary header() call and once in a meta tag, is a nuisance and error-prone.



History
 [2018-02-06 09:31 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2018-02-06 09:31 UTC] requinix@php.net
Having default_charset not match the actual charset used is a recipe for disaster. Why aren't they changing that?
 [2018-02-06 09:59 UTC] bugs at jth dot net
-Status: Feedback +Status: Open
 [2018-02-06 09:59 UTC] bugs at jth dot net
You are in no position to require that users use a single character set in PHP. There are a number of reasons why different charsets and languages should coexist on the same web server. It worked perfectly fine in PHP 5, but it broke after upgrading to PHP 7.

Apache2 httpd.conf

#
# Specify a default charset for all content served; this enables
# interpretation of all content as UTF-8 by default.  To use the
# default browser choice (ISO-8859-1), or to allow the META tags
# in HTML content to override this choice, comment out this
# directive:
#
#AddDefaultCharset UTF-8

This works either by commenting the directive out or by specifying
AddDefaultCharset Off

  
There should be a similar option in PHP.

I have experimented with

default_charset = ""
internal_encoding = "UTF-8"

which appears to work, but I am not sure about it, especially since the PHP documentation states:

"Setting default_charset to an empty value is not recommended."
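To make the behaviour under discussion concrete, here is a simplified Python model of it: a non-empty default_charset is appended to the Content-Type, an empty one suppresses it, and a charset already present (e.g. from an explicit header() call) is left alone. The restriction to text/* types and the exact "; charset=" spacing are assumptions about PHP's SAPI layer, not taken from this report, and `php_content_type` is a hypothetical name.

```python
def php_content_type(mimetype="text/html", default_charset="UTF-8"):
    """Simplified model (an assumption, not PHP's source) of how a
    Content-Type value gets decorated with default_charset."""
    already_has_charset = "charset=" in mimetype.lower()
    if default_charset and mimetype.startswith("text/") and not already_has_charset:
        return f"{mimetype}; charset={default_charset}"
    # Empty default_charset, non-text type, or explicit charset: unchanged.
    return mimetype
```

Under this model, `php_content_type()` gives `"text/html; charset=UTF-8"`, while `php_content_type(default_charset="")` gives plain `"text/html"`, which is why the empty setting is the only switch-off that needs no per-script header() call.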
 [2018-02-06 10:09 UTC] spam2 at rhsoft dot net
> Having default_charset not match the actual charset used is a recipe for disaster

So is that unrequested header!
https://github.com/apache/trafficserver/issues/2849

Yes, it is a bug in ATS (Apache Traffic Server), but without the unrequested charset in the Content-Type header this issue would not exist at all.
 [2018-02-06 10:24 UTC] requinix@php.net
> You are in no position to require that users should use a single character set in PHP.
Please don't put words in my mouth. All I asked was why they had PHP configured to use one default_charset (if not UTF-8 by default) but were serving HTML documents in another charset. With a database, PHP, and the browser, all it takes is one incorrect character encoding setting or practice to create headaches that can last for years.

Besides the "nuisance" call to header() you already know about, setting default_charset empty is currently the only way I see to prevent adding the charset= to the Content-Type. I don't know the exact reasons why it's not recommended to leave it empty, but I am sure that setting default_charset to the charset in use is a good idea. And it certainly does not have to be set to the same value for all requests to a server, let alone the same for all users.

I still don't see a good reason why this behavior should be (optionally) disabled, but I'll leave this open.

> yes, it is a bug in ATS
Precisely. Someone over there forgot that Content-Types can specify media type parameters
  https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7
Omitting the charset= in the header would be a workaround; fixing the tool to be aware of the structure of a Content-Type header so that it could match the media type properly would be the solution.
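The suggested fix, matching on the media type rather than on the whole header string, can be sketched in Python. This is not ATS code; the function names are hypothetical, and it borrows the standard library's `email.message.Message` parser, whose Content-Type syntax is close to, but not identical to, HTTP's.

```python
from email.message import Message


def media_type(content_type):
    """Return the lowercased type/subtype with parameters such as charset=
    stripped. Note: a value without exactly one '/' yields 'text/plain'."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type()


def naive_match(content_type, wanted):
    # Whole-string comparison: breaks as soon as a parameter is present.
    return content_type == wanted


def structured_match(content_type, wanted):
    # Compare only type/subtype; parameters do not change the media type,
    # and type/subtype comparison is case-insensitive.
    return media_type(content_type) == wanted.lower()
```

For `"text/html; charset=UTF-8"`, `naive_match(..., "text/html")` is False while `structured_match(..., "text/html")` is True, which is the difference between the workaround (omit charset=) and the fix (parse the header).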
 [2018-02-06 10:54 UTC] spam2 at rhsoft dot net
@requinix@php.net: "All I asked was why they had PHP configured to use one default_charset (if not UTF-8 by default) but were serving HTML documents in another charset" is pretty simple to answer: on many shared hosting setups you simply don't have control over whether each and every script sends a charset header, yet with PHP's default behavior you need to.

But if PHP stopped sending it implicitly, the HTML meta tag would be enough. It's as simple as that.

If a script additionally sends a header(), fine. But currently you have to take care of this on the script side for no good reason, because a *template* probably declares a charset that may not be the same as the default PHP setting.
 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Wed Jan 23 07:01:25 2019 UTC