Bug #15423 HTTP::negotiateLanguage() severely bugged.
Submitted: 2002-02-07 05:36 UTC Modified: 2002-11-30 03:58 UTC
From: vigna at acm dot org Assigned: mj
Status: Closed Package: PEAR related
PHP Version: 4.1.1 OS: Linux Red Hat 7.2
Private report: No CVE-ID: None

 [2002-02-07 05:36 UTC] vigna at acm dot org
The code of HTTP::negotiateLanguage() has several serious bugs.

At line 76 of HTTP.php, $HTTP_ACCEPT_LANGUAGE is accessed without having been declared global. As a result, negotiation always runs on the empty string (and a warning is generated).

At line 102, $HTTP_SERVER_VARS['REMOTE_HOST'] is accessed without first checking that the key exists, which causes a warning.
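Both problems come down to reading request data that may not be there. A minimal sketch of the safe pattern, written in Python for illustration only: the plain dict stands in for PHP's $HTTP_SERVER_VARS, and the helper name server_var is hypothetical. A missing key yields an empty default instead of a warning.

```python
def server_var(server_vars: dict, key: str, default: str = "") -> str:
    """Return server_vars[key] if the key exists, otherwise `default`."""
    # dict.get never raises on a missing key, unlike server_vars[key].
    return server_vars.get(key, default)


# Hypothetical request data: the header is present, REMOTE_HOST is not.
server_vars = {"HTTP_ACCEPT_LANGUAGE": "de-AT, en-US;q=0.5"}

accept = server_var(server_vars, "HTTP_ACCEPT_LANGUAGE")
host = server_var(server_vars, "REMOTE_HOST")
print(repr(accept))  # 'de-AT, en-US;q=0.5'
print(repr(host))    # ''
```

In the PHP function itself, the corresponding fixes would be declaring the variable global (or reading it from the server-variables array) and guarding the REMOTE_HOST lookup with an existence check.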

The example in the documentation comment above the function uses "_" to separate language and country, whereas the HTTP RFC uses "-". The regexp used to parse the header accepts neither.

The default value ("en_US") has the same problem: it does not follow the RFC.

 [2002-02-07 06:18 UTC] mj@php.net
The first two problems have been fixed in CVS. I also think
we can easily fix the problem with the language code, but I
would like to hear the package maintainers' opinion first.

- Martin
 [2002-02-07 06:22 UTC] vigna at acm dot org
Is it possible to get CVS access? I did that some time ago, but now I cannot find a pointer to it on the PEAR site.

The problem with "_" is that Linux uses it for locale names. But HTTP language negotiation should IMHO return the HTTP language tag, not some system-specific counterpart; the user just has to call strtr() if necessary. Better to fix it now than to break other code later...
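The conversion left to the caller is small. A Python sketch for illustration (the function name http_tag_to_locale is hypothetical): it goes a little beyond a plain strtr() swap of "-" for "_" by also normalizing case, under the assumption that the optional second subtag is a country code.

```python
def http_tag_to_locale(tag: str) -> str:
    """Convert an HTTP language tag (e.g. 'en-us') to a locale-style name ('en_US').

    Assumes the optional second subtag is a country code; any further
    subtags are dropped.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    if len(parts) == 1:
        return language
    return f"{language}_{parts[1].upper()}"


for tag in ["en-us", "pt-BR", "de"]:
    print(tag, "->", http_tag_to_locale(tag))
# en-us -> en_US
# pt-BR -> pt_BR
# de -> de
```

Keeping this step outside the negotiation function means the function returns exactly what the browser sent, and only code that needs a system locale pays for the conversion.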
 [2002-02-11 08:08 UTC] mj@php.net
I fixed the outstanding issue in CVS. The documentation should be updated by tomorrow.
 [2002-02-17 13:53 UTC] vigna at acm dot org
The bug in the regexp is still there:

'^([a-z]+);[[:space:]]*q=([0-9\.]+)'

Unless I'm *really* missing something, this will not accept, say, en-US from the browser.
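The failure is easy to reproduce. A Python check of the quoted pattern, for illustration: [[:space:]] is rewritten as \s because Python's re module has no POSIX character classes, and re.IGNORECASE mimics eregi(). A region subtag breaks the match because "-" is not in [a-z], and an item without an explicit q-value also fails.

```python
import re

# The quoted pattern, with [[:space:]] written as \s for Python's re module.
OLD_PATTERN = re.compile(r"^([a-z]+);\s*q=([0-9.]+)", re.IGNORECASE)

for item in ["en;q=0.8", "en-US;q=0.8", "en"]:
    m = OLD_PATTERN.match(item)
    print(item, "->", m.groups() if m else None)
# en;q=0.8 -> ('en', '0.8')
# en-US;q=0.8 -> None
# en -> None
```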

Another problem is the last part of the function: guessing the language from the domain should at least be optional. Think, for instance, of German-speaking people in Italy, who are very proud of speaking German and would be irritated and frustrated by a site serving Italian content just because their host name ends in .it.
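One way to make the domain guess opt-in is a flag that defaults to off. A Python sketch, not the PEAR code: the names negotiate and use_tld_fallback are hypothetical, preferred tags are assumed to be already sorted by quality, and the TLD-to-language mapping is deliberately naive (it treats the TLD itself as a language code).

```python
def negotiate(preferred, supported, default="en-US",
              remote_host=None, use_tld_fallback=False):
    """Return the first supported tag from `preferred` (ordered best first).

    Guessing from the client's top-level domain happens only when the caller
    opts in with use_tld_fallback=True; otherwise `default` is returned.
    """
    # Compare case-insensitively, but return the tag as the site spells it.
    lookup = {tag.lower(): tag for tag in supported}
    for tag in preferred:
        if tag.lower() in lookup:
            return lookup[tag.lower()]
    if use_tld_fallback and remote_host:
        # Naive assumption: the TLD doubles as a language code ("it" -> "it").
        tld = remote_host.rsplit(".", 1)[-1].lower()
        if tld in lookup:
            return lookup[tld]
    return default


site = ["de", "it", "en-US"]
print(negotiate(["de"], site, remote_host="pc1.example.it"))  # de
print(negotiate([], site, remote_host="pc1.example.it"))      # en-US
print(negotiate([], site, remote_host="pc1.example.it",
                use_tld_fallback=True))                        # it
```

The German speaker on a .it host gets German because the browser's preference always wins, and the domain is consulted only when the caller explicitly asks for it. The sketch matches tags exactly; mapping "de-AT" to "de" would need an extra prefix-matching step.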


 [2002-11-30 03:58 UTC] vigna at acm dot org
Nothing has changed: the bug is still there.
 [2002-12-24 05:13 UTC] chris_se at gmx dot net
This function still does not comply with RFC 2616.
(http://sunsite.iisc.ernet.in/collection/rfc/rfc2616.html#103)

It uses the following regular expression:

^([a-z_-]+);[[:space:]]*q=([0-9\.]+)

First of all, the quality value is optional; if it is not supplied, a quality of 1 should be assumed. Next, the underscore is _not_ allowed (if any browser sends it, which I doubt, that is the browser's problem). Further, this function would accept the following languages:

en--us
en-thisisaverylonglanguagecode

which are _not_ allowed by the RFC. A better (but not perfect) regular expression would be:

^([a-z]{1,8}(?:-[a-z]{1,8})*)(?:;[[:space:]]*q=([0-9\.]+))?
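A Python sketch of parsing a whole header along these lines, for illustration only (parse_accept_language and ITEM are hypothetical names). It follows the proposal: subtags of 1 to 8 letters, no underscore, and a missing q meaning 1. It also anchors each item at both ends and uses the RFC 2616 qvalue grammar ("0" with up to three decimals, or "1" with up to three zeros) instead of [0-9.]+, so a value like "q=.." is rejected rather than crashing float(). Like the proposal, it is not a complete grammar; the "*" range is not handled.

```python
import re

# Language range: 1-8 letters, then any number of "-" + 1-8 letters.
# The ";q=..." part is optional and must be a valid RFC 2616 qvalue.
ITEM = re.compile(
    r"^([A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*)"
    r"(?:;\s*q=(0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?))?$"
)


def parse_accept_language(header):
    """Return (tag, quality) pairs, best first; a missing q means 1.0."""
    result = []
    for item in header.split(","):
        m = ITEM.match(item.strip())
        if m is None:
            continue  # malformed item, e.g. "en--us" or an over-long subtag
        quality = float(m.group(2)) if m.group(2) is not None else 1.0
        result.append((m.group(1).lower(), quality))
    # sorted() is stable, so items with equal quality keep their header order.
    return sorted(result, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("da, en-gb;q=0.8, en;q=0.7"))
# [('da', 1.0), ('en-gb', 0.8), ('en', 0.7)]
print(parse_accept_language("en--us, en-thisisaverylonglanguagecode, fr;q=0.5"))
# [('fr', 0.5)]
```

Items with q=0 ("not acceptable") are kept in the result; a caller would normally drop them before matching against the languages a site offers.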

Another question: why do you use eregi instead of preg_match?

Also, the fallback to the TLD should indeed be optional, because in my view the TLD says nothing about the language the user speaks (vigna@acm.org already gave an example).
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Fri Jan 03 03:01:29 2025 UTC