Bug #20870 htdig search index is bogus
Submitted: 2002-12-06 18:06 UTC Modified: 2003-06-27 11:53 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 1 (0.0%)
From: philip at cornado dot com Assigned:
Status: Wont fix Package: Website problem
PHP Version: 4CVS-2002-12-06 (dev) OS: all
Private report: No CVE-ID: None

 [2002-12-06 18:06 UTC] philip at cornado dot com
Here's an example:

http://www.php.net/installation.apache2

Which gives options like:

/manual/en/index.php/cs/fi/installation.php
/manual/en/index.php/cs/de/faq.installation.php

So for example if you view one of these bogus links:

http://www.php.net/manual/en/index.php/cs/de/faq.installation.php

It shows a page almost identical to:

http://www.php.net/manual/

Except all the links within also link to bogus pages, like, clicking on preface gives us:

http://www.php.net/manual/en/index.php/cs/de/preface.php

There seems to be a pattern here but I can't see it right now.  Another example: php.net/configuration.foo

 [2002-12-07 06:57 UTC] goba@php.net
The redirect is not strange at all. If a page is not found (like installation.apache2, because install.apache2 can show you the right page), the search feature comes up. It tries to find a page with these keywords in its name and/or content. The search feature has some bugs, but it's powered by htdig AFAIK, and it's not something we can correct here... The links provided by the search engine are full URLs, not relative ones, so the problem is not with relative URLs... I have no idea how to correct this...
 [2002-12-07 13:23 UTC] philip@php.net
Yeah, I chose a bad summary for this, so changed it. 

Regarding the non-htdig part, I guess this is just how relative URLs work, and I am unsure how to fix it without causing problems.  But, for example, going to something like:

http://www.php.net/downloads.php/foo
http://www.php.net/downloads.php/

makes relative URLs bogus, so a browser looks for:

http://www.php.net/downloads.php/anoncvs.php
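
The resolution behaviour described here can be reproduced outside a browser. The following is only a minimal sketch, assuming Python's standard `urllib.parse.urljoin` as a stand-in for a browser's relative URL resolution; the helper name `resolve` is made up for illustration and is not part of the php.net code.

```python
from urllib.parse import urljoin


def resolve(page_url: str, link: str) -> str:
    """Resolve a relative link against the URL of the page that contains it."""
    return urljoin(page_url, link)


# Normal request: the last path segment (downloads.php) is replaced by the link.
print(resolve("http://www.php.net/downloads.php", "anoncvs.php"))

# Extra path after the script name: "downloads.php" is now treated like a
# directory, so the relative link ends up underneath it.
print(resolve("http://www.php.net/downloads.php/", "anoncvs.php"))
print(resolve("http://www.php.net/downloads.php/foo", "anoncvs.php"))
```

The last two calls both yield the bogus http://www.php.net/downloads.php/anoncvs.php, because everything after the final slash of the page URL is dropped before the relative link is appended.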

Sure, people shouldn't be adding / into the URL, but it's something to consider (especially now that search.php creates them).  Some thoughts for discussion:

* Implement <base href=""> in a sane way to work with mirrors.  This would solve the index.php/foo "issue" afaik.

* Make search.php smarter.  I've been looking at this but am giving up for now :)  I have a hard time believing htdig will ever return a non-existent file like '/manual/en/index.php/cs/de/faq.installation.php' and if it does that's just horrible and search.php should band-aid it.

* Default to returning results from one language making this configurable for users.


 [2002-12-08 04:03 UTC] goba@php.net
The output of search.php contains *absolute* URIs, so a <base> won't help.
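
To illustrate the point, here is a rough sketch, again assuming Python's `urllib.parse.urljoin` as a model of how a browser applies a base URL. The `BASE` value and the `apply_base` helper are hypothetical, chosen only for this example.

```python
from urllib.parse import urljoin

# Hypothetical value for a <base href="..."> element; an assumption for this sketch.
BASE = "http://www.php.net/"


def apply_base(href: str) -> str:
    """Resolve an href against the document base, as <base href> would."""
    return urljoin(BASE, href)


# A relative link is rewritten against the base.
print(apply_base("anoncvs.php"))

# An absolute URI is returned unchanged, so a bogus absolute search result
# stays bogus no matter what the base is.
print(apply_base("http://www.php.net/manual/en/index.php/cs/de/faq.installation.php"))
```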

The reason we do not add features to the current search page is that a completely new, powerful central search engine is in the planning stage. It will probably be provided by FastSearch (Stig ;). Once Stig has the time to set it up for us, it will be used instead of local search engines, and will provide features such as filtering by language or location. search.php.net is reserved for this service.

Well, it would still be nice to solve the problem, but as far as I can see, it is an htdig-internal problem...
 [2002-12-13 14:48 UTC] goba@php.net
The search engine part is entirely up to Stig as far as I can see. Also, if you look into search.php, you'll see that the results are generated by the local search engine, and not found by PHP itself. That search script may need some development, though, if the central search site does not come up...
 [2002-12-14 11:28 UTC] goba@php.net
OK, I tried to investigate this "installation.apache2" thing a bit more. It seems that the error can be triggered by words like "installation" and "configuration", but not by "apache2" or "install" or "configure". It's quite strange. All the results seem OK for the searches except for some innocent words like "installation" and "configuration"...

As I have just reviewed the source of search.php, I must say that there is no code that does anything special for "installation" and "configuration"... It can also easily be shown that this does not depend on relative URLs, as the same bad URL results come back regardless of where you start your search from (including subdirectories and other mirror sites, like at.php.net for example)...

So there must be some problem in the htdig database... But why it happens only for some specific words, I don't know...
 [2003-01-02 15:27 UTC] goba@php.net
modifying summary to be more specific
 [2003-01-04 10:49 UTC] goba@php.net
OK, so I have added a simple exclusion to search.php to exclude such bogus results. It seems that these results all have the form .php/SOMETHING, that is, there is a path after the .php extension. We have no normal pages getting information through PATH_INFO, so these can safely be removed from the results.
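
The kind of exclusion described above could look roughly like the following. This is only an illustrative sketch in Python, not the actual code added to search.php; the names `is_bogus` and `filter_results` are hypothetical, and the rule (drop any result whose path has something after a `.php` script name) is an assumption based on the description in this comment.

```python
import re
from urllib.parse import urlsplit

# Matches a ".php" script name that is followed by further path segments,
# i.e. the ".php/SOMETHING" pattern seen in the bogus results.
_PATH_AFTER_PHP = re.compile(r"\.php/")


def is_bogus(url: str) -> bool:
    """Return True if the URL path continues after a .php script name."""
    return bool(_PATH_AFTER_PHP.search(urlsplit(url).path))


def filter_results(urls):
    """Keep only the search results that do not look like PATH_INFO URLs."""
    return [u for u in urls if not is_bogus(u)]


results = [
    "http://www.php.net/manual/en/index.php/cs/de/faq.installation.php",
    "http://www.php.net/downloads.php",
]
print(filter_results(results))
```

Only the path component is checked, so a `.php/` sequence appearing in a query string would not cause a result to be dropped.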

This is not a fix for the original problem, though; the htdig index itself should be fixed, which I am unable to do... Maybe some systems@ guy will volunteer to make it update itself correctly and automatically.
 [2003-06-27 11:53 UTC] philip@php.net
htdig isn't used anymore, status->won't fix.
 