Doc Bug #20966 [CHM] Search index includes irrelevant words
Submitted: 2002-12-12 11:28 UTC Modified: 2005-04-30 11:25 UTC
Votes:3
Avg. Score:2.3 ± 0.9
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:2 (100.0%)
From: steve at holdenweb dot com Assigned: goba (profile)
Status: Wont fix Package: Documentation problem
PHP Version: 4.3.0RC3 OS: Windows, any
Private report: No CVE-ID: None

 [2002-12-12 11:28 UTC] steve at holdenweb dot com
The CHM file is somewhat over-indexed. For example, when I search for "payment", among the relevant hits I also see the "curl_version" page because the "next page" link in the footer is entitled "Cybercash *payment* options". This seems redundant and likely to render searches less useful than they would otherwise be. Can't find any other reports of this, hence the new bug report.

History

 [2002-12-13 01:36 UTC] goba@php.net
Well, the search index is created by the CHM compiler when it compiles the HTML files. So there is no way to index the pages without their navigation parts and then replace them with the full versions.

I don't even know of any way to put some HTML parts into an "ignored" section of the HTML files, so I really don't know how to fix this. I'll try to look up some info, but I hardly believe I'll find something useful. The Microsoft HH compiler is not too smart...
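To illustrate why the footer text gets matched, here is a toy full-text index in Python. It is not the HTML Help compiler's actual algorithm, and the page texts are made up; it only shows that once the navigation link is part of the compiled page, its words are indexed exactly like the body text, so "payment" finds the curl_version page.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of pages whose full text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

# Made-up page texts: the footer link is part of the compiled page,
# so the indexer sees it exactly like the body text.
pages = {
    "curl_version": "curl_version: return the current CURL version. "
                    "Next: Cybercash payment options",
    "ref.cybercash": "Cybercash payment functions",
}

index = build_index(pages)
print(sorted(index["payment"]))  # ['curl_version', 'ref.cybercash']
```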
 [2005-04-07 10:04 UTC] richard dot quadling at bandvulc dot co dot uk
I think the only way this could be handled is to have the navigation links generated in JavaScript rather than in plain HTML. That way they would not be picked up by the full-text search of the MS CHM Viewer.

The only "normal" way to have things NOT included in the full text search is to produce a word stop list (limited to 512 bytes! Gee! They really know how to be generous at MS!!!), but this is then across the entire manual - not much use.
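A small Python sketch of why a stop list does not help here: the filter is global, so stopping "payment" removes the genuine Cybercash hit along with the unwanted footer hit. The index contents are hypothetical, and the one-word-per-line size calculation is an assumption about the stop list format; only the 512-byte figure comes from the comment.

```python
# Toy index: word -> pages containing it (hypothetical data).
index = {
    "payment": {"curl_version", "ref.cybercash"},
    "curl": {"curl_version"},
}

STOP_LIST_LIMIT = 512  # byte limit as reported in the comment

def stop_list_size(words) -> int:
    """Size in bytes, assuming one ASCII word per newline-terminated line."""
    return len("".join(w + "\n" for w in words).encode("ascii"))

def apply_stop_list(index, stop_words):
    """Drop stop words from the index: the filter applies to every page."""
    return {w: pages for w, pages in index.items() if w not in stop_words}

stop_words = {"payment"}
assert stop_list_size(stop_words) <= STOP_LIST_LIMIT
filtered = apply_stop_list(index, stop_words)
print("payment" in filtered)  # False: the real Cybercash page is lost too
```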

There does not seem to be any tag you can put around text to make it NOT indexable (which would be nice).

This would probably have to be handled by the htmlhelp/make_chm.php script(s).

Thinking out loud: what happens if the text is not plain ASCII, i.e. if it is written using %xx or &#xxx; codes? Seems daft though, doesn't it? Taking "abc" and converting it into something else just so it doesn't get searched?
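The &#xxx; variant of the idea can be sketched in Python as below (the %xx form is URL encoding and would show up literally in page text, so it is left out). Whether the HH compiler decodes numeric character references before indexing is untested; if it does, this trick has no effect. The function name is hypothetical.

```python
import html

def to_char_refs(text: str) -> str:
    """Replace every character with its decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

encoded = to_char_refs("abc")
print(encoded)                          # &#97;&#98;&#99;
print(html.unescape(encoded) == "abc")  # True: the rendered text is unchanged
```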

The JS solution would fix the problem, but changes to the CHM build process would also have a knock-on effect on the skins.

But if this is a real requirement, how about introducing a DocBook element specifically for text that is NOT searchable? Because it would be in the system from the ground up, those systems that can support it can use it. The CHM build process could then take the XML, create a specific div class that can be looked for, and convert that into a simple series of JavaScript document.writeln() calls.
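A rough sketch of that conversion step in Python (the real build script, make_chm.php, is PHP). The div class name "noindex" is hypothetical, nested divs are not handled, and whether the HH compiler really skips text inside script blocks is the assumption behind the proposal, not something verified here.

```python
import json
import re

# Hypothetical marker emitted for text that should stay out of the search.
NOINDEX_DIV = re.compile(r'<div class="noindex">(.*?)</div>', re.DOTALL)

def js_string(text: str) -> str:
    """Quote text as a JavaScript string literal safe inside <script>."""
    # json.dumps gives a valid JS literal; escaping "</" stops the HTML
    # parser from seeing a premature </script> end tag.
    return json.dumps(text).replace("</", "<\\/")

def noindex_to_js(html: str) -> str:
    """Replace each marked div with a script that writes its contents."""
    def repl(match):
        lines = match.group(1).strip().splitlines()
        calls = "".join(f"document.writeln({js_string(line)});" for line in lines)
        return f'<script type="text/javascript">{calls}</script>'
    return NOINDEX_DIV.sub(repl, html)

page = ('<p>Body</p>\n<div class="noindex">\n'
        '<a href="ref.cybercash.html">Cybercash payment options</a>\n</div>')
print(noindex_to_js(page))
```

For the sample page this prints the body paragraph followed by a single script element whose document.writeln() call re-creates the footer link.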

This would also allow documenters to exclude parts of the documentation from the CHM full-text search. (Not sure why - but a perfect example is the search including navigation links! <grin />).

Richard.
 [2005-04-07 16:19 UTC] steve at holdenweb dot com
[Changing my logged email address]
 [2005-04-27 20:16 UTC] vrana@php.net
I suggest not fixing this. Generating parts of the CHM with JS would probably introduce other errors.
 [2005-04-30 11:25 UTC] techtonik@php.net
Theoretically it is possible without JS: we just need to rebuild the CHM full-text search database and repack it manually. =) I think it is a worthy challenge, but I doubt anybody will ever find the time to take it on. The links in the following document may lead to ready-made pieces of code to start with.

More info here:
http://www.nongnu.org/chmspec/latest/Internal.html#FIftiMain
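The first half of such a rebuild, deciding which words each page should contribute, could look like the Python sketch below. The &lt;!-- nav --&gt; comment markers around the footer links are hypothetical, and serializing the result into the binary FIftiMain format described at the link is not shown.

```python
import re
from collections import defaultdict

# Hypothetical markers around the navigation links in each generated page.
NAV_BLOCK = re.compile(r"<!-- nav -->.*?<!-- /nav -->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")

def indexable_text(html: str) -> str:
    """Drop navigation blocks, then strip the remaining tags."""
    return TAG.sub(" ", NAV_BLOCK.sub(" ", html))

def build_content_index(pages):
    """Word -> pages, built from page content only (words, not FIftiMain bytes)."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in re.findall(r"[a-z0-9_]+", indexable_text(html).lower()):
            index[word].add(name)
    return index

pages = {
    "curl_version": "<h1>curl_version</h1><p>Return the current CURL version.</p>"
                    '<!-- nav --><a href="ref.cybercash.html">Cybercash payment options</a><!-- /nav -->',
    "ref.cybercash": "<h1>Cybercash payment functions</h1>",
}

index = build_content_index(pages)
print(sorted(index["payment"]))  # ['ref.cybercash']
```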
 
PHP Copyright © 2001-2019 The PHP Group
All rights reserved.
Last updated: Wed May 22 18:01:28 2019 UTC