Bug #20870 htdig search index is bogus
Submitted: 2002-12-06 18:06 UTC
Modified: 2003-06-27 11:53 UTC
Avg. Score: 3.0 ± 0.0
Reproduced: 0 of 1 (0.0%)
From: philip at cornado dot com
Assigned:
Status: Wont fix
Package: Website problem
PHP Version: 4CVS-2002-12-06 (dev)
OS: all
Private report: No
CVE-ID: None


 [2002-12-06 18:06 UTC] philip at cornado dot com
Here's an example:

Which gives options like:


So for example if you view one of these bogus links:

It shows a page almost identical to:

Except all the links within also link to bogus pages, like, clicking on preface gives us:

There seems to be a pattern here but I can't see it right now.  Another example:


 [2002-12-07 06:57 UTC]
The redirect is not strange at all. If a page is not found (like installation.apache2; the right page is install.apache2), the search feature comes up. It tries to find a page with these keywords in its name and/or content. The search feature has some bugs, but it's powered by htdig AFAIK, and it's not something we can correct here... The links provided by the search engine are full URLs, not relative ones, so the problem is not with relative URLs... I have no idea how to correct this...
 [2002-12-07 13:23 UTC]
Yeah, I chose a bad summary for this, so changed it. 

Regarding the non-htdig part, I guess this is just how relative URLs work, and I am unsure how to fix it without causing problems.  But, for example, going to something like:

Makes relative URLs bogus, like a browser looks for:

Sure, people shouldn't be adding / into the URL, but it's something to consider (especially now that search.php creates them).  Some thoughts for discussion:

* Implement <base href=""> in a sane way to work with mirrors.  This would solve the index.php/foo "issue" afaik.

* Make search.php smarter.  I've been looking at this but am giving up for now :)  I have a hard time believing htdig will ever return a non-existent file like '/manual/en/index.php/cs/de/faq.installation.php' and if it does that's just horrible and search.php should band-aid it.

* Default to returning results from one language, and make this configurable for users.
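
To illustrate the relative-URL effect described above, here is a minimal sketch in Python (not the PHP used on the site). The host `example.org` and the link name are assumptions for illustration; only the `index.php/foo` shape comes from the discussion. It uses the standard `urllib.parse.urljoin` to resolve a relative link the way a browser would:

```python
from urllib.parse import urljoin

# Hypothetical host; only the "index.php/foo" shape is taken from the thread.
page = "http://example.org/manual/en/index.php"
page_with_path_info = "http://example.org/manual/en/index.php/foo"

# A relative link as it might appear inside the page (assumed name).
link = "faq.installation.php"

# Resolved against the normal page URL, the link stays in /manual/en/.
normal = urljoin(page, link)

# With extra path after index.php, the browser treats "index.php" as a
# directory, so the same relative link resolves to a non-existent file.
bogus = urljoin(page_with_path_info, link)

print(normal)
print(bogus)
```

The second result keeps `index.php/` in the middle of the path, which is the kind of bogus URL a relative link produces once a slash follows the script name.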

 [2002-12-08 04:03 UTC]
The output of search.php contains *absolute* URIs, so a <base> won't help.

The reason we do not add features to the current search page is that a completely new, powerful central search engine is in the planning stage. It will probably be provided by FastSearch (Stig ;). Once Stig has the time to set it up for us, it will be used instead of local search engines, and will provide features such as filtering by language or location. is reserved for this service.

Well, it would still be nice to solve the problem, but as far as I can see, it is an htdig-internal problem...
 [2002-12-13 14:48 UTC]
The search engine part is entirely up to Stig as far as I can see. Also, if you look into search.php, you'll see that the results are generated by the local search engine, and not found by PHP itself. That search script may need some development though, if the central search site does not come up...
 [2002-12-14 11:28 UTC]
OK, I tried to investigate this "installation.apache2" thing a bit more.. It seems that the error can be triggered by words like "installation" and "configuration", but not by "apache2" or "install" or "configure". It's quite strange. All the results seem OK for the searches except for some innocent words like "installation" and "configuration"...

As I have just revised the source of search.php, I must say that there is no code to do anything special for "installation" and "configuration"... It can also be easily proved that it does not depend on relative URLs, as it returns the same bad URL results, regardless of the starting point you initiated your search from (including subdirectories, and other mirror sites, like for example)...

So there must be some problem in the htdig database... But why only for some specific words, I don't know...
 [2003-01-02 15:27 UTC]
modifying summary to be more specific
 [2003-01-04 10:49 UTC]
OK, so I have added a simple exclusion to search.php to exclude such bogus results. It seems that these results can be recognized by .php/SOMETHING, that is, there is a path after the .php extension. We have no normal pages getting information through PATH_INFO, so these can safely be removed from the results.
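
The exclusion described above can be sketched roughly as follows. This is an illustration in Python, not the actual code in search.php; the function name `drop_path_info_results`, the regex, and the `example.org` URLs are assumptions made for the example. Only the '/manual/en/index.php/cs/de/faq.installation.php' path shape is taken from this report.

```python
import re
from urllib.parse import urlsplit

# Matches ".php/" followed by at least one more character in the path,
# i.e. a PATH_INFO-style URL of the form ".php/SOMETHING".
PATH_INFO_AFTER_PHP = re.compile(r"\.php/.")


def drop_path_info_results(urls):
    """Return only the URLs whose path has nothing after the .php extension."""
    return [u for u in urls if not PATH_INFO_AFTER_PHP.search(urlsplit(u).path)]


# Hypothetical search results: one normal page and one bogus entry.
results = [
    "http://example.org/manual/en/faq.installation.php",
    "http://example.org/manual/en/index.php/cs/de/faq.installation.php",
]

kept = drop_path_info_results(results)
print(kept)
```

Matching only on the URL path (rather than the whole URL) keeps a query string such as `?page=a.php/b` from being filtered by accident.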

This is not a fix for the original problem though; the htdig index should be fixed after all, which I am unable to do... Maybe some systems@ guy will volunteer to make it update itself correctly and automatically.
 [2003-06-27 11:53 UTC]
htdig isn't used anymore, status->won't fix.