Bug #27291 get_browser matches browscap.ini patterns incorrectly
Submitted: 2004-02-17 14:01 UTC Modified: 2004-03-15 16:20 UTC
From: php-bug-NOSPAM-2004 at ryandesign dot com Assigned: jay
Status: Closed Package: *General Issues
PHP Version: 4CVS, 5CVS (2004-02-20) OS: *
Private report: No CVE-ID: None
 [2004-02-17 14:01 UTC] php-bug-NOSPAM-2004 at ryandesign dot com
Description:
------------
PHP's get_browser() function does not correctly use the 
patterns in the browscap.ini file, resulting in 
occasional incorrect matches. This occurred, for 
example, when Apple released Safari 1.2, and when 
OmniGroup released OmniWeb 5.0b1. These two browsers 
were then incorrectly identified as crawlers / robots, 
instead of being recognized as normal browsers.

Instead of matching the last rule in the file (which has
the browscap pattern "*", which PHP translates into the
regular expression ".*"), it matches the rule for
Website Strippers (which has the browscap pattern
"Mozilla/5.0", which PHP translates to the regular
expression "Mozilla/5\.0"). Yes, Safari and OmniWeb have
"Mozilla/5.0" as part of their user agent string, but
only as part. "Mozilla/5.0" is not the ENTIRE UA string,
which is what the browscap pattern is intended to
define. Had the rule been intended to match "Mozilla/5.0"
at the start of the string, regardless of what
followed, it would have been written "Mozilla/5.0*".
But it wasn't. PHP needs to anchor the regular
expression it generates to the beginning and end of the
string to ensure it matches the portion of the
string the browscap.ini author intended it to match. The
regular expressions PHP should have generated are
"^Mozilla/5\.0$" and "^.*$".

Here is a diff of the PHP source file
ext/standard/browscap.c (against the version in the 4.3.4
release) that seems to correct the problem. Commenting
out lines 71 to 73 of the original file (73 to 75 in my
version) is not essential and is not part of the fix for
this issue. I did it because those lines seem to me to be
another inaccuracy in PHP's browscap.ini parsing, and
removing them does not seem to adversely affect
get_browser(). However, I did not test extensively
against many user agent strings, and I do not know why
that code was originally added.

50c50
<       t = (char *) malloc(Z_STRLEN_P(pattern)*2 + 1);
---
>       t = (char *) malloc(Z_STRLEN_P(pattern)*2 + 3);
52c52,54
<       for (i=0, j=0; i<Z_STRLEN_P(pattern); i++, j++) {
---
>       t[0] = '^';
> 
>       for (i=0, j=1; i<Z_STRLEN_P(pattern); i++, j++) {
71,73c73,75
<       if (j && (t[j-1] == '.')) {
<               t[j++] = '*';
<       }
---
> //    if (j && (t[j-1] == '.')) {
> //            t[j++] = '*';
> //    }
74a77,78
>       t[j++] = '$';
> 
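The anchoring this diff adds can be sketched as standalone C using the POSIX <regex.h> API. This is not PHP's browscap.c code: browscap_to_regex() and browscap_match() are hypothetical names, and the translation rules ('*' to ".*", '?' to ".", escaping of regex metacharacters, then wrapping in ^...$) are an assumption modeled on the report. Under those assumptions, the Safari UA from the test case fails the "Mozilla/5.0" pattern but still matches "*".

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: translate a browscap.ini pattern into an anchored
 * POSIX extended regex. '*' -> ".*", '?' -> ".", ERE metacharacters are
 * escaped, and the result is wrapped in ^...$ so the pattern has to
 * describe the whole user agent string. The caller frees the result. */
static char *browscap_to_regex(const char *pattern)
{
    size_t len = strlen(pattern);
    size_t j = 0;
    /* Worst case: every character doubles, plus '^', '$' and the NUL. */
    char *re = malloc(len * 2 + 3);

    if (re == NULL)
        return NULL;
    re[j++] = '^';
    for (size_t i = 0; i < len; i++) {
        char c = pattern[i];
        if (c == '*') {
            re[j++] = '.';
            re[j++] = '*';
        } else if (c == '?') {
            re[j++] = '.';
        } else {
            /* Escape characters that are special in an ERE. */
            if (strchr(".[\\()+{|^$", c) != NULL)
                re[j++] = '\\';
            re[j++] = c;
        }
    }
    re[j++] = '$';
    re[j] = '\0';
    return re;
}

/* Hypothetical helper: 1 if user_agent matches the browscap pattern,
 * 0 if it does not, -1 on allocation or regex compilation failure. */
static int browscap_match(const char *pattern, const char *user_agent)
{
    char *text = browscap_to_regex(pattern);
    regex_t re;
    int rc = -1;

    if (text != NULL && regcomp(&re, text, REG_EXTENDED | REG_NOSUB) == 0) {
        rc = regexec(&re, user_agent, 0, NULL, 0) == 0;
        regfree(&re);
    }
    free(text);
    return rc;
}
```

Under this translation, a rule meant as a prefix match has to say so with a trailing '*' (for example "Mozilla/5.0*"), which is how the report reads browscap.ini patterns.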

Reproduce code:
---------------
Install the browscap.ini file available from www.garykeith.com and set the browscap directive in php.ini to point to this file. Then run this:

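<?php
// Safari UA with a deliberately fictitious version number (1999).
$ua = 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/1999 (KHTML, like Gecko) Safari/1999';

// get_browser() returns an object by default; cast it to an array for print_r().
$ua_info = (array) get_browser($ua);

print $ua;

print '<pre>';
print_r($ua_info);
print '</pre>';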

Expected result:
----------------
The browscap.ini does not know about Safari version 
1999. There is no such version; version 1.2 (125) is 
the most recent as of February 2004. And, at least in 
the version from a week or so ago, the browscap.ini does 
not define a generic "Safari" directive that would allow 
the browscap.ini to recognize it. So this user agent 
string should match the last rule in the file, "Default 
Browser", which has the pattern "*".

Actual result:
--------------
It actually matches the pattern "Mozilla/5.0", in the 
Website Strippers category.

 [2004-02-17 16:12 UTC] sniper@php.net
Using the latest stable CVS snapshot, it does match "Default Browser".

 [2004-02-18 16:23 UTC] php_bug_27291 at garykeith dot com
Respectfully, my latest browscap.ini does not detect all arbitrary versions of Safari. I'm not sure how you arrived at that conclusion.

I do know that I receive e-mails nearly every day about this issue so there is obviously a problem somewhere.

I don't know who is working on the code for get_browser() these days but I wish they would contact me so we could come to some sort of understanding about how to properly parse my file the way browscap.dll does. I am growing very weary of my files and efforts taking the blame for the non-stop stream of bugs that emanate from get_browser().

Thanks,
~gary.
 [2004-02-19 07:12 UTC] php-bug-NOSPAM-2004 at ryandesign dot com
Sorry, Gary; my bad. When using the Feb 15, 2004
browscap.ini, I had looked at the browser match only
long enough to see that it found Safari, and did not
look closely enough to realize that the rule specifically
matched the Safari v100 series. Due to an oddity in
PHP's parsing code (which my diff fixes by commenting
out three lines), it ends up recognizing any Safari whose
version number starts with 1, regardless of how many
characters follow, which is why it recognized the
fictitious version v1999 in my test case.
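Assuming the anchored translation from the original diff, the effect of the three lines that append '*' after a trailing '.' can be sketched as follows. The pattern "*Safari/1??" is a hypothetical stand-in for the Safari v100-series rule, and to_anchored_regex() and matches() are illustrative names, not PHP functions. With the legacy rule, the trailing "??" becomes "..*" (one character followed by anything), so "Safari/1999" still matches. Without it, exactly two characters must follow the "1".

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: anchored browscap-to-ERE translation. When
 * legacy_trailing_dot is nonzero it also mimics the 4.3.4 check
 * `if (j && (t[j-1] == '.')) t[j++] = '*';`, which appends '*' whenever
 * the last character written is '.', e.g. after a trailing '?'. */
static char *to_anchored_regex(const char *pattern, int legacy_trailing_dot)
{
    size_t len = strlen(pattern);
    size_t j = 0;
    /* Worst case: doubled characters plus '^', optional '*', '$', NUL. */
    char *re = malloc(len * 2 + 4);

    if (re == NULL)
        return NULL;
    re[j++] = '^';
    for (size_t i = 0; i < len; i++) {
        char c = pattern[i];
        if (c == '*') {
            re[j++] = '.';
            re[j++] = '*';
        } else if (c == '?') {
            re[j++] = '.';
        } else {
            if (strchr(".[\\()+{|^$", c) != NULL)
                re[j++] = '\\';
            re[j++] = c;
        }
    }
    if (legacy_trailing_dot && j > 1 && re[j - 1] == '.')
        re[j++] = '*';
    re[j++] = '$';
    re[j] = '\0';
    return re;
}

/* Hypothetical helper: 1 = match, 0 = no match, -1 = error. */
static int matches(const char *pattern, const char *ua, int legacy_trailing_dot)
{
    char *text = to_anchored_regex(pattern, legacy_trailing_dot);
    regex_t re;
    int rc = -1;

    if (text != NULL && regcomp(&re, text, REG_EXTENDED | REG_NOSUB) == 0) {
        rc = regexec(&re, ua, 0, NULL, 0) == 0;
        regfree(&re);
    }
    free(text);
    return rc;
}
```

Without anchoring, "Safari/1.." already matches any longer version as a substring, so this quirk only becomes visible once the regex is anchored.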

I have now found the place where the PHP CVS snapshots 
are kept (http://snaps.php.net), and have downloaded and 
compiled the stable 4.3.x snapshot from Feb 19, 2004 
10:30 GMT. Its behavior in relation to this bug remains 
unchanged when compared with the 4.3.4 and 4.3.5RC3 
releases; it's still broken.

I'd like to suggest that the PHP team reconsider 
evaluating and applying the diff I supplied in my 
original report.
 [2004-02-19 13:17 UTC] jay@php.net
I had done some work on the get_browser() function a while 
back, and I believe those were the last major changes to 
occur to that function. The function seemed to have been 
abandoned for quite some time before those changes.  
 
For the most part, based on the testing I did, the results 
seemed quite favourable, i.e. more information was now 
being returned by the function, such as operating systems 
and such that were previously missing from get_browser()'s 
output. Obviously there is still some room for 
improvement, though. 
 
I tried the original poster's patch using Gary's most 
up-to-date browscap.ini file and had some mixed results. I 
tested all of the unique user agent strings we had in our 
apache logs at work (1914 strings) and the results were 
sometimes better, sometimes worse, but overall they were 
pretty much the same. Here are a few things I noticed: 
 
- Netscape 7.x on Linux is better after the changes. (It 
was being reported as Mozilla 1.4 previously.) 
 
- Several versions of Mozilla on Linux come up as Default 
Browser after the changes, as opposed to the correct 
information before the changes. 
 
- Something identified as a "Spoofed IE" is coming up 
correct before the changes, but comes up as Default 
Browser after the changes. 
 
- Epiphany 1.0 gets Default Browser after the changes, but 
comes up with "Mozilla 1.4" before the changes. 
 
- Some versions of Safari are being reported as Default 
Browser after the changes, while before the changes they 
seem to be coming up properly. (This includes the 
original poster's example, which came up as Safari 
1.1 on my system.) 
 
- Some versions of Galeon are being reported better after 
the changes.  
 
- Some user agents that were reported as being Website 
Strippers before are now being reported as Default 
Browser. 
 
You can find the results of the tests, the UA strings I 
used and the script to generate them here: 
 
http://216.94.11.234/browsers.tar.gz 
 
That's with an up-to-date PHP_4_3 checkout and the latest 
browscap.ini. 
 
To Gary: I'll take any suggestions on how to improve 
get_browser(). Is there any similar implementation that 
provides better results that I can get ahold of? I see 
things for IIS, Java, etc. on your site, but is any of 
them better than the rest that I should look at? 
 
J 
 [2004-02-19 15:10 UTC] php_bug_27291 at garykeith dot com
Hi, Jay.

I use Microsoft's browscap.dll as my baseline for testing. I don't know of any way you can legally get a look at the source code for that!

I can't speak for the accuracy of the stuff I offer on my site for PHP or Java because I know nothing about either language. I work mostly in C++, C# and Visual Basic. The best I can tell you about that stuff is I've had almost no complaints about them beyond their inability to adapt to new properties in the browscap.ini file.

BTW, the Spoofed IE is from the parent Spoofed User Agents. These are user agents that are almost like the real thing and I keep a close watch on them to see if they eventually need to be moved to the Website Strippers parent.

Anyway, whatever it takes to get these bugs fixed, let's do it. Although I'm no longer willing to make major changes in my files to accommodate PHP, perhaps my database of 22,000 user agents will let you do some more thorough testing. I'll also be willing to help in whatever other ways I can.

You can reach me via the e-mail address I used here if you need to contact me privately.
 [2004-02-23 12:01 UTC] php_bug_27291 at garykeith dot com
Jay, whom can I contact at PHP to let them know I no longer want to be the official source of browscap.ini for PHP? In the past two days I've had two bug reports, both of which only happen in PHP. When browscap.dll is doing the parsing it works fine.

I'm fed up with PHP. I waste more of my time dealing with bugs in PHP than any other aspect of my project and it's just not worth it to me anymore.
 [2004-02-23 12:58 UTC] php-bug-NOSPAM-2004 at ryandesign dot com
Sounds like what we need is a script that takes the user 
agents logged by a popular web site, runs them through 
browscap.dll and PHP's get_browser(), compares the 
results, and informs a PHP programmer about any 
differences, so that get_browser() can be kept in line.
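Once both sets of answers have been collected, the core of such a comparison is a diff of two result columns. Below is a minimal C sketch. It assumes the results have already been gathered into an array of a hypothetical struct ua_result; how browscap.dll is driven to produce its column is out of scope here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record: one logged user agent plus the browser name that
 * each implementation reported for it. */
struct ua_result {
    const char *user_agent;
    const char *reference; /* e.g. the answer from browscap.dll */
    const char *candidate; /* e.g. the answer from get_browser() */
};

/* Write every disagreement to `out` and return how many there were. */
static size_t report_differences(const struct ua_result *rows, size_t n, FILE *out)
{
    size_t diffs = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(rows[i].reference, rows[i].candidate) != 0) {
            fprintf(out, "%s\n  reference: %s\n  candidate: %s\n",
                    rows[i].user_agent, rows[i].reference, rows[i].candidate);
            diffs++;
        }
    }
    return diffs;
}
```

Keeping only the disagreements, as suggested above, keeps the daily report small enough to review by hand.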

I hate to volunteer for such a task, as I'm not a member 
of the PHP team, but perhaps, once my patch or one like 
it is applied, I could try it out for a while to see what
quantity of diffs is generated.

I'm not sure, however, how I would use browscap.dll. 
What kind of server would need to be used, and what 
programming language? I have a feeling you're going to 
say Microsoft IIS and Visual Basic or ASP or some such, 
none of which I've ever worked with. (I wouldn't know 
where to begin.) But if a system could be developed 
whereby one popular web server gathers a day's UAs and
then passes them to both a script using browscap.dll
and a second one using PHP's get_browser(), and then
takes the results, compares them, and saves out only the
differences, and then emails or FTPs a .tgz to me or
someone, that might be a place to start. I'd be happy to
help with such a script, though like I said I wouldn't know
how to tackle the browscap.dll part.

I looked through all the diffs between the PHP 4.3.4 
get_browser() matching and the results after applying my 
patch on Jay's user agent file, and I'm encouraged by 
the results. I think they're more accurate. I think it 
should improve the situation greatly. Gary, if you'd 
like to send your huge 20,000-entry user agent list in a 
.tgz I could do some more comparisons and see how the 
patch holds up.
 [2004-02-23 16:39 UTC] php_bug_27291 at garykeith dot com
Tell me what a .tgz file is and if I can do it I will.

I'm working two new bugs that I hope someone will post bug reports about.

The first deals with the exclamation point that's part of the new Yahoo! Slurp crawler. I'm not sure what PHP is doing since I don't speak PHP but I'm told it's throwing a parsing error.

I've also had some complaints from people saying user agents aren't being recognized since I switched from using Gecko/???????? to Gecko/* as we discussed earlier.
 [2004-02-23 16:46 UTC] php_bug_27291 at garykeith dot com
Sorry I forgot to address your proposal.

You do need IIS to use browscap.dll. The problem is you cannot pass a user agent to it directly like you can with PHP. I have a script (in ASP/VBScript obviously, but probably easily converted to PHP) that lets you pass a specific user agent to browscap.dll and get the resulting browser in return.
 [2004-02-23 17:25 UTC] php-bug-NOSPAM-2004 at ryandesign dot com
Yes, I'm sorry, I meant to mention that changing to 
match Gecko/* instead of Gecko/???????? would seem to 
adversely affect Netscape 7.x, whose UA string starts 
the same way, but ends, after the Gecko/xxx part, with 
Netscape/xxx.
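To make the Gecko/* concern concrete, here is a sketch using an anchored translation (an assumption, not PHP's code, and without the trailing-dot rule). The two generic Mozilla patterns and both user agent strings below are illustrative examples constructed for this note, not entries taken from browscap.ini. With Gecko/*, the generic Mozilla rule also matches the Netscape UA, so which rule wins depends on how the parser chooses among several matching patterns. With Gecko/???????? (exactly eight characters after the slash), it does not match.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: anchored browscap-to-ERE translation
 * ('*' -> ".*", '?' -> ".", ERE metacharacters escaped). */
static char *pattern_to_regex(const char *pattern)
{
    size_t len = strlen(pattern), j = 0;
    char *re = malloc(len * 2 + 3); /* doubled chars + '^', '$', NUL */

    if (re == NULL)
        return NULL;
    re[j++] = '^';
    for (size_t i = 0; i < len; i++) {
        char c = pattern[i];
        if (c == '*') {
            re[j++] = '.';
            re[j++] = '*';
        } else if (c == '?') {
            re[j++] = '.';
        } else {
            if (strchr(".[\\()+{|^$", c) != NULL)
                re[j++] = '\\';
            re[j++] = c;
        }
    }
    re[j++] = '$';
    re[j] = '\0';
    return re;
}

/* Hypothetical helper: 1 = match, 0 = no match, -1 = error. */
static int ua_matches(const char *pattern, const char *ua)
{
    char *text = pattern_to_regex(pattern);
    regex_t re;
    int rc = -1;

    if (text != NULL && regcomp(&re, text, REG_EXTENDED | REG_NOSUB) == 0) {
        rc = regexec(&re, ua, 0, NULL, 0) == 0;
        regfree(&re);
    }
    free(text);
    return rc;
}

/* Illustrative generic Mozilla rules and example user agents. */
static const char *const MOZ_ANY_GECKO =
    "Mozilla/5.0 (Windows; ?; Windows NT 5.1; *) Gecko/*";
static const char *const MOZ_GECKO_DATE =
    "Mozilla/5.0 (Windows; ?; Windows NT 5.1; *) Gecko/????????";
static const char *const NETSCAPE_UA =
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
static const char *const MOZILLA_UA =
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113";
```

Both patterns still match the plain Mozilla UA. Only the eight-character form rules out the Netscape UA on its own.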

Other things you should look out for in your file: I 
think you may be missing a pattern for the Camino 
browser version 0.7+. That's what you get if you use the 
latest nightly builds, which is effortlessly achieved by 
using the program CaminoKnight, and since 0.7 proper is 
so ancient now, most Camino users probably are running 
the 0.7+ builds. The UA string ends with the + sign but
otherwise seems the same as what you have in the 2/15/04
file.

Confirmed that the new browscap.ini from 2/15/04 causes 
a parse error at Apache startup. Filed
http://bugs.php.net/27372

Sorry about my shorthand... by .tgz I just meant a 
compressed file. A Zip file would be fine too. If you 
could bundle up your huge UA list, and possibly also 
your script to feed these to browscap.dll, and put them 
on a web or ftp server like Jay did or just email them 
to me, that'd be great.
 [2004-02-23 21:14 UTC] php_bug_27291 at garykeith dot com
In looking at browscap.ini with all Gecko-based browsers now using Gecko/* instead of Gecko/????????, I can find no reason why Netscape 7.x should have a problem with my definitions. I mean, what's wrong with the following browscap.ini definition?

Mozilla/5.0 (Windows; ?; Windows NT 5.2; *) Gecko/* Netscape7/7.2*

By any reasonable standard it should recognize this user agent as Netscape 7.2. In my tests using browscap.dll that's exactly what it did.

Have I found yet another bug? Or did I completely miss what you were trying to tell me?

Thanks for the tip about Camino. I don't see it very often in my logs. I've added the plus version to my database but haven't published it yet.

I will bundle up all the files you requested and make them available on one of my servers. I'll e-mail you privately once that's done as I do not want to publicize the URL, LOL.
 [2004-02-24 18:43 UTC] jay@php.net
Forgot to assign this to myself. Pretty close to having 
a decent fix based on what I'm seeing in browscap.dll.

J
 [2004-03-15 16:20 UTC] jay@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


This fix should provide the same results as browscap.dll
now.

J
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Dec 30 14:01:28 2024 UTC