Bug #15423 HTTP::negotiateLanguage() severely bugged.
Submitted: 2002-02-07 05:36 UTC Modified: 2002-11-30 03:58 UTC
From: vigna at acm dot org Assigned: mj (profile)
Status: Closed Package: PEAR related
PHP Version: 4.1.1 OS: Linux Red Hat 7.2
Private report: No CVE-ID: None
 [2002-02-07 05:36 UTC] vigna at acm dot org
The code for HTTP::negotiateLanguage() is severely bugged.

At line 76 of HTTP.php, $HTTP_ACCEPT_LANGUAGE is accessed without having been declared global. Thus negotiation always happens on the empty string (and a warning is generated).
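
The scoping problem can be reproduced in isolation. This is only a minimal sketch, not the code from HTTP.php: the two function names are hypothetical, and the variable is set by hand because current PHP versions no longer create $HTTP_ACCEPT_LANGUAGE automatically.

```php
<?php
// Stand-in for the old auto-created global; written through $GLOBALS so the
// example behaves the same no matter where this file is included from.
$GLOBALS['HTTP_ACCEPT_LANGUAGE'] = 'en-US,en;q=0.8';

// Hypothetical function: no `global` declaration, so the name refers to a
// fresh, unset local variable.
function seesHeaderWithoutGlobal(): bool
{
    return isset($HTTP_ACCEPT_LANGUAGE);
}

// Hypothetical function: `global` imports the variable into function scope.
function seesHeaderWithGlobal(): bool
{
    global $HTTP_ACCEPT_LANGUAGE;
    return isset($HTTP_ACCEPT_LANGUAGE);
}

var_dump(seesHeaderWithoutGlobal()); // bool(false)
var_dump(seesHeaderWithGlobal());    // bool(true)
```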

At line 102 $HTTP_SERVER_VARS['REMOTE_HOST'] is accessed without checking for existence of the key, causing a warning.
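
One way to avoid that warning is to test for the key before reading it. The helper below is a hypothetical illustration (serverVar() is not part of the package), and it takes the array as a parameter instead of reading $HTTP_SERVER_VARS directly.

```php
<?php
// Hypothetical helper: return $vars[$key] if it is set, otherwise a default,
// so that a missing key never triggers an "undefined index" warning.
function serverVar(array $vars, string $key, string $default = ''): string
{
    return isset($vars[$key]) ? (string) $vars[$key] : $default;
}

$withHost = ['REMOTE_HOST' => 'client.example.it'];
$withoutHost = [];

var_dump(serverVar($withHost, 'REMOTE_HOST'));    // string(17) "client.example.it"
var_dump(serverVar($withoutHost, 'REMOTE_HOST')); // string(0) ""
```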

The example stated in the documentation above the function uses "_" to separate language and country. The HTTP RFC uses "-". The regexp used to parse the header has neither.

The default value ("en_US") suffers from the same problem--it does not respect the RFC.

 [2002-02-07 06:18 UTC] mj@php.net
The first two problems have been fixed in CVS. I also think
that we can easily fix the problem with the language
code, but I would like to hear the opinion of the package
maintainers.

- Martin
 [2002-02-07 06:22 UTC] vigna at acm dot org
Is it possible to get CVS access? I did that some time ago, but now I cannot find a pointer on the PEAR site.

The problem with "_" is that Linux uses it for locales. But an HTTP language negotiation should IMHO return the HTTP language, not some system-specific counterpart. The user just has to call strtr if necessary. Better to fix it now than to break other code later...
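
For illustration, the strtr conversion mentioned above is a one-liner. The variable names are made up for this example.

```php
<?php
// HTTP language tag as it would come out of negotiation.
$httpTag = 'en-US';

// Replace "-" with "_" to obtain the locale-style spelling used on Linux.
$localeName = strtr($httpTag, '-', '_');

echo $localeName, "\n"; // en_US
```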
 [2002-02-11 08:08 UTC] mj@php.net
I fixed the outstanding issue in CVS. The documentation will be updated around tomorrow.
 [2002-02-17 13:53 UTC] vigna at acm dot org
The bug in the regexp is still there:

'^([a-z]+);[[:space:]]*q=([0-9\.]+)'

Unless I'm *really* missing something, this will not accept, say, en-US from the browser.
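
The claim is easy to check. The sketch below runs the same pattern through preg_match with the case-insensitive flag, which is an assumption made so the example runs on current PHP (eregi was removed in PHP 7.0). The helper name is hypothetical.

```php
<?php
// The pattern quoted above, wrapped in PCRE delimiters with the /i flag.
$pattern = '/^([a-z]+);[[:space:]]*q=([0-9\.]+)/i';

// Hypothetical helper: does one header element match the pattern?
function matchesOldPattern(string $pattern, string $element): bool
{
    return preg_match($pattern, $element) === 1;
}

var_dump(matchesOldPattern($pattern, 'en;q=0.8'));    // bool(true)
var_dump(matchesOldPattern($pattern, 'en-US;q=0.8')); // bool(false): "-" is not in [a-z]
var_dump(matchesOldPattern($pattern, 'en'));          // bool(false): ";q=" is required
```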

Another problem is the last part of the function: guessing the language from the domain should at least be optional. I am thinking, for instance, of German-speaking people in Italy, who are very proud of speaking German and would be irritated and frustrated by a site serving content in Italian just because of their .it domain.
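
A rough sketch of what an opt-in fallback could look like follows. Everything here is hypothetical: pickLanguage() and its parameters are not the PEAR API, and treating the TLD string as a language code is a simplification for the example only.

```php
<?php
// Hypothetical function: choose a language from the browser's accepted tags;
// consult the TLD only when the caller explicitly passes one.
function pickLanguage(array $supported, array $accepted, string $default, ?string $tld = null): string
{
    foreach ($accepted as $tag) {
        if (in_array($tag, $supported, true)) {
            return $tag; // the browser preference always wins
        }
    }
    if ($tld !== null && in_array($tld, $supported, true)) {
        return $tld; // opt-in guess from the domain
    }
    return $default;
}

$supported = ['en', 'it', 'de'];

echo pickLanguage($supported, ['de'], 'en', 'it'), "\n"; // de
echo pickLanguage($supported, [], 'en', 'it'), "\n";     // it
echo pickLanguage($supported, [], 'en'), "\n";           // en
```

With this shape, a German speaker connecting from a .it host still gets German whenever the browser asks for it, and callers who distrust the TLD simply omit the last argument.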


 [2002-11-30 03:58 UTC] vigna at acm dot org
Nothing has changed: the bug is still there.
 [2002-12-24 05:13 UTC] chris_se at gmx dot net
This function still does not comply with
RFC 2616.
(http://sunsite.iisc.ernet.in/collection/rfc/rfc2616.html#103)

It uses the following regular expression:

^([a-z_-]+);[[:space:]]*q=([0-9\.]+)

First of all, the quality is optional; if it is not supplied, a quality of 1 should be assumed. Next, the underscore is _not_ allowed. (If any browser sends it, which I don't believe, that is the browser's problem.) Further, this function would accept the following languages:

en--us
en-thisisaverylonglanguagecode

which are _not_ allowed by the RFC. A better (but not perfect) regular expression would be:

^([a-z]{1,8}(?:-[a-z]{1,8})*)(?:;[[:space:]]*q=([0-9\.]+))

Another question: why do you use eregi instead of preg_match?

Also, the fallback to the TLD should indeed be optional, because in my view the TLD says nothing about the language the user speaks. (vigna@acm.org already supplied an example.)
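
To make the points above concrete, here is a sketch of a preg_match-based parser for a single header element. It uses the suggested expression with two changes that are assumptions of this example: the quality group is made optional with a trailing "?", as the text asks for, and a "$" anchor is added so that input such as en--us is rejected instead of matching only its "en" prefix. The function name is hypothetical, and the sketch is not a complete validator (it ignores the "*" range, for instance).

```php
<?php
// Hypothetical helper: parse one Accept-Language element such as
// "en-US;q=0.8". Returns ['tag' => ..., 'q' => ...] or null if it is invalid.
function parseLanguageRange(string $element): ?array
{
    // Suggested pattern plus an optional quality group and an end anchor.
    $pattern = '/^([a-z]{1,8}(?:-[a-z]{1,8})*)(?:;[[:space:]]*q=([0-9.]+))?$/i';

    if (preg_match($pattern, trim($element), $m) !== 1) {
        return null;
    }

    // A missing quality value defaults to 1.
    $quality = isset($m[2]) ? (float) $m[2] : 1.0;

    return ['tag' => $m[1], 'q' => $quality];
}

var_dump(parseLanguageRange('en-US;q=0.8'));                    // tag "en-US", q 0.8
var_dump(parseLanguageRange('de'));                             // tag "de", q 1
var_dump(parseLanguageRange('en--us'));                         // NULL
var_dump(parseLanguageRange('en-thisisaverylonglanguagecode')); // NULL
var_dump(parseLanguageRange('en_US'));                          // NULL
```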
 