PHP :: Bug #15423 :: HTTP::negotiateLanguage() severely bugged.

Bug #15423	HTTP::negotiateLanguage() severely bugged.
Submitted:	2002-02-07 05:36 UTC	Modified:	2002-11-30 03:58 UTC
From:	vigna at acm dot org	Assigned:	mj (profile)
Status:	Closed	Package:	PEAR related
PHP Version:	4.1.1	OS:	Linux Red Hat 7.2
Private report:	No	CVE-ID:	None

[2002-02-07 05:36 UTC] vigna at acm dot org

The code for HTTP::negotiateLanguage() is severely bugged.

At line 76 of HTTP.php, $HTTP_ACCEPT_LANGUAGE is accessed without having been declared global. Thus negotiation always happens on the empty string (and a warning is generated).

At line 102 $HTTP_SERVER_VARS['REMOTE_HOST'] is accessed without checking for existence of the key, causing a warning.

The example in the documentation above the function uses "_" to separate language and country. The HTTP RFC uses "-". The regexp used to parse the header accepts neither.

The default value ("en_US") suffers from the same problem--it does not respect the RFC.

[2002-02-07 06:18 UTC] mj@php.net

The first two problems have been fixed in CVS. I also think
that we can easily fix the problem with the language
code, but I would like to hear the opinion of the package
maintainers.

- Martin

[2002-02-07 06:22 UTC] vigna at acm dot org

Is it possible to get CVS access? I did that some time ago, but now I cannot find a pointer on the PEAR site.

The problem with "_" is that Linux uses that for locales. But an HTTP language negotiation should IMHO return the HTTP language, not some system-specific counterpart. The user just has to do a strtr if necessary. Better to fix it now than to break other code later...

[2002-02-11 08:08 UTC] mj@php.net

I fixed the outstanding issue in CVS. The documentation will be updated around tomorrow.

[2002-02-17 13:53 UTC] vigna at acm dot org

The bug in the regexp is still there:

'^([a-z]+);[[:space:]]*q=([0-9\.]+)'

Unless I'm *really* missing something, this will not accept, say, en-US from the browser.
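
To make the failure concrete, here is a minimal sketch in Python rather than PHP. It assumes that `re.IGNORECASE` is a fair stand-in for the case-insensitivity of eregi() and that `\s` is close enough to `[[:space:]]`. The helper name `old_match` is made up for the illustration.

```python
import re

# Python rendering of the pattern quoted above (an approximation, not the
# PEAR code): re.IGNORECASE mimics eregi(), \s stands in for [[:space:]].
OLD_PATTERN = re.compile(r"^([a-z]+);\s*q=([0-9.]+)", re.IGNORECASE)


def old_match(item):
    """Return (language, quality) as the old pattern captures them, or None."""
    m = OLD_PATTERN.match(item)
    return (m.group(1), m.group(2)) if m else None


# Individual items as they might appear in an Accept-Language header.
samples = ["en;q=0.8", "en-US;q=0.8", "en-US", "de"]
results = {item: old_match(item) for item in samples}

for item, result in results.items():
    print(f"{item!r:15} -> {result}")
```

Only the first sample matches. Anything containing "-" is rejected because `[a-z]+` must be followed directly by ";", and an item without an explicit q value is rejected as well.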

Another problem is the last part of the function: guessing the language from the domain should at least be optional. I am thinking, for instance, of German-speaking people in Italy, who are very proud of speaking German and would be irritated and frustrated by a site serving content in Italian just because of their .it domain.

[2002-11-30 03:58 UTC] vigna at acm dot org

Nothing has changed: the bug is still there.

[2002-12-24 05:13 UTC] chris_se at gmx dot net

This function still does not comply with RFC 2616.
(http://sunsite.iisc.ernet.in/collection/rfc/rfc2616.html#103)

It uses the following regular expression:

^([a-z_-]+);[[:space:]]*q=([0-9\.]+)

First of all, the quality is optional; if it is not supplied, a quality of 1 should be assumed. Next, the underscore is _not_ allowed (if any browser sends it, which I doubt, that is the browser's problem). Further, this function would accept the following languages:

en--us
en-thisisaverylonglanguagecode

which are _not_ allowed by the RFC. A better (but not perfect) regular expression would be:

^([a-z]{1,8}(?:-[a-z]{1,8})*)(?:;[[:space:]]*q=([0-9\.]+))?
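
The points above can be put together in a rough sketch, written in Python rather than PHP and not taken from the PEAR code. It assumes the quality group is optional, anchors the pattern at both ends so that malformed tags are rejected, and tightens the q value to the qvalue grammar of RFC 2616. The name `parse_accept_language` is hypothetical, and the "*" range permitted by the RFC is deliberately left out.

```python
import re

# Assumed Python rendering of the suggested pattern: subtags of 1 to 8
# letters separated by single hyphens, then an optional ";q=" part.
# The q value follows the RFC 2616 qvalue grammar (0 to 1, up to 3 decimals).
ITEM = re.compile(
    r"^([a-z]{1,8}(?:-[a-z]{1,8})*)"
    r"(?:\s*;\s*q=(0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?))?$",
    re.IGNORECASE,
)


def parse_accept_language(header):
    """Return [(language_range, quality), ...] sorted by descending quality.

    Hypothetical helper for illustration; not part of PEAR's HTTP class.
    """
    parsed = []
    for raw in header.split(","):
        m = ITEM.match(raw.strip())
        if m is None:
            continue  # skips items such as "en--us" or over-long subtags
        # A missing q value means a quality of 1.
        quality = float(m.group(2)) if m.group(2) else 1.0
        parsed.append((m.group(1), quality))
    # sorted() is stable, so items of equal quality keep their header order.
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("da, en-gb;q=0.8, en;q=0.7"))
# [('da', 1.0), ('en-gb', 0.8), ('en', 0.7)]
```

With this sketch, both `en--us` and `en-thisisaverylonglanguagecode` are dropped, while a bare `en-US` is accepted with quality 1.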

Another question: why do you use eregi instead of preg_match?

Also, the fallback to the TLD should indeed be optional, because in my view the TLD says nothing about the language the user speaks. (vigna@acm.org already supplied an example.)
