Make Help Index

This page documents MakeHelpIndex version 1.5.0 (evaluation version 1.4.2)
Last updated: 31 January 1998.

Page Contents: Introduction | Downloading, Installation and Usage | Algorithm | Tags Recognised | Frames and the Skip Target | Tips on parsing APIs | Possible Improvements | Bug List | Version history
See also HelpIndex file format 

MakeHelpIndex and its component classes are Copyright © 1996-2002 PHD Computer Consultants Ltd.
Please read the licence instructions for usage restrictions.

Introduction

MakeHelpIndex is a Java application which constructs a help index file for use with PHD HelpIndex.

Most people will use the Windows program Hi Lab as it allows you to edit the index file. MakeHelpIndex does not allow edits.
Hi Lab is the recommended index scanner and editor. MakeHelpIndex will not be updated.

MakeHelpIndex recursively analyses your web pages and builds a text file index of web pages, anchor names within pages, and keywords. Your HTML must be syntactically correct.

For large sites, MakeHelpIndex can take some time to run, as Java is quite slow.

MakeHelpIndex often picks up errors in your links. Fix them up...

MakeHelpIndex does not index all the words in your pages.

Downloading, Installation and Usage

For an alternative description of this process, read the getting started guide.

MakeHelpIndex is a Java application. You will need a Java runtime (Java Virtual Machine) on your computer to run it. You cannot run it from a Web browser.

The usual way to get a JVM is to download and install Sun JavaSoft's Java Runtime Environment or the Java Development Kit (JDK). Windows Internet Explorer 3.0+ users may run Microsoft's Win32 JVM; in this case replace the command java below with jview.
We understand that a Mac OS X JVM is available from Apple.

Go to the evaluation download page and download hi230bas.zip which has all the class files and this documentation.

Alternatively download the development kit HelpIndex.zip and expand to get the classes into a directory on your CLASSPATH, remembering to preserve the case of the filename letters.

Run MakeHelpIndex as follows:
  java MakeHelpIndex -options [index_description] [skip_target] output_file input(s)
where the input(s) are either URLs or file names (selected by -f),
and the output_file is a local file name.
Note that if the input(s) are file names then they may use either \ or / as the directory delimiter.

Option  Description
-f  With -f specified, the input(s) are local file names, relative to the current directory.
-c  With -c specified, the page names are case significant.
    Otherwise, TEST.HTML and test.html are assumed to be the same file.
-b  With -b specified, blank anchors are not ignored.
-d  With -d specified, the next parameter is used as the Index Description. If the description has spaces then remember to enclose it in quotes.
    If not specified, the Index Description is blank.
-s  With -s specified, the next parameter is the "skip target".
    Normally an Index Item sets its Target field as appropriate. However if the target is the "skip target" then it is not set.
    If the skip target has spaces then remember to enclose it in quotes.
    If not specified, no targets are skipped.
-t  By default, if a page has no title then an Index Item record is generated using the page name.
    With -t specified, if a page has no title then an index is not generated and links are not followed. We found this useful to avoid tiny pages and Contents pages being parsed.
-a  By default, only pages which have extension .htm or .html are parsed.
    With -a specified, all URLs are parsed unless they are absolute or have .gif or .jpg extensions.

Note that the output file conventionally has a ".hi" extension. If the .hi extension causes problems with your server, then use something which is acceptable, eg .txt.

For example:
  java MakeHelpIndex -fd "PHD Site Index" index.hi index.html
or
  java MakeHelpIndex -c index.hi http://www.phdcc.com/index.html
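
The command line layout above can be illustrated with a small Java sketch. This is not the MakeHelpIndex source: the class MakeHelpIndexOptions, its field names and its error handling are all hypothetical, and it assumes that clustered options such as -tfd take their extra parameters in the order the letters appear. It is written for a current Java release rather than a JVM of the period.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical option holder; names are illustrative only.
final class MakeHelpIndexOptions {
    boolean files;            // -f: inputs are local file names
    boolean caseSignificant;  // -c: page names are case significant
    boolean blankAnchors;     // -b: blank anchors are not ignored
    boolean titledOnly;       // -t: skip pages that have no title
    boolean parseAll;         // -a: parse most URLs, not just .htm/.html
    String description = "";  // -d: Index Description, blank if not given
    String skipTarget = null; // -s: skip target, null if not given
    String outputFile;
    final List<String> inputs = new ArrayList<>();

    static MakeHelpIndexOptions parse(String[] args) {
        MakeHelpIndexOptions o = new MakeHelpIndexOptions();
        int i = 0;
        // Assumption: every leading argument that starts with '-' is an option cluster.
        while (i < args.length && args[i].startsWith("-")) {
            String cluster = args[i++].substring(1);
            for (char c : cluster.toCharArray()) {
                switch (c) {
                    case 'f': o.files = true; break;
                    case 'c': o.caseSignificant = true; break;
                    case 'b': o.blankAnchors = true; break;
                    case 't': o.titledOnly = true; break;
                    case 'a': o.parseAll = true; break;
                    case 'd': o.description = args[i++]; break; // consumes next parameter
                    case 's': o.skipTarget = args[i++]; break;  // consumes next parameter
                    default: throw new IllegalArgumentException("Unknown option -" + c);
                }
            }
        }
        if (args.length - i < 2) {
            throw new IllegalArgumentException("Need output_file and at least one input");
        }
        o.outputFile = args[i++];
        while (i < args.length) o.inputs.add(args[i++]);
        return o;
    }

    public static void main(String[] args) {
        MakeHelpIndexOptions o = parse(new String[] {
            "-tfd", "PHD Site Index", "-s", "Main", "index.hi", "index.html"});
        System.out.println(o.description + " -> " + o.outputFile + " from " + o.inputs);
    }
}
```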

As described below, MakeHelpIndex usually follows all the links in your HTML, so there is no need to list all your web pages as the input files. Instead just specify your root home page, and all pages will follow in its wake.

If you have given an absolute URL, then the keywords will always link to the specified pages. You should be able to make the keywords relative simply by cutting out the Base URL line(s) in the index file.
It is usually better to run MakeHelpIndex on your own local files from their home directory, as this will make the index relative.

Algorithm

MakeHelpIndex evaluation version 1.4.2 writes index file format 1.2, while the full version writes format 2.1 coding 1.

MakeHelpIndex follows links (a) in A HREF tags, (b) in FRAME SRC tags and (c) in any other tags with an HREF attribute. Note that repeated file links are only parsed once. Link names are case-insensitive (unless -c is specified) so two separate links to TEST.HTML and test.html are assumed to mean the same file. If -t is specified and the page has no title, then links are not followed.
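
The rule that repeated links are parsed only once can be sketched as a set of visited names. This is a minimal illustration, not the MakeHelpIndex source; the class VisitedLinks and its method are hypothetical, and the flag passed to the constructor stands in for the -c option.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: remembers which pages have already been parsed.
final class VisitedLinks {
    private final Set<String> seen = new HashSet<>();
    private final boolean caseSignificant; // true models the -c option

    VisitedLinks(boolean caseSignificant) {
        this.caseSignificant = caseSignificant;
    }

    // Returns true the first time a link is offered, false for a repeat.
    boolean markVisited(String link) {
        String key = caseSignificant ? link : link.toLowerCase(Locale.ROOT);
        return seen.add(key);
    }

    public static void main(String[] args) {
        VisitedLinks links = new VisitedLinks(false);
        System.out.println(links.markVisited("TEST.HTML")); // first visit
        System.out.println(links.markVisited("test.html")); // same file when -c is absent
    }
}
```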

Only non-absolute links are followed. If a link begins with letters (Aa..Zz) and a colon (eg "http:") then the link is not followed.
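
The absolute-link test described above might look like this in Java. It is a sketch of the stated rule only (one or more letters followed by a colon); the class and method names are hypothetical.

```java
// Hypothetical helper illustrating the "letters then colon" rule.
final class LinkKind {
    // True if the link starts with one or more ASCII letters followed by a colon.
    static boolean looksAbsolute(String link) {
        int i = 0;
        while (i < link.length() && isAsciiLetter(link.charAt(i))) {
            i++;
        }
        return i > 0 && i < link.length() && link.charAt(i) == ':';
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(looksAbsolute("http://www.phdcc.com/index.html"));
        System.out.println(looksAbsolute("products/index.html"));
    }
}
```

Note that under this rule a link such as "mailto:someone" also counts as absolute and is not followed.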

Links are "normalised" when they contain "subdir/../", ie that string is removed.
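
One way to carry out this normalisation is to walk the path segments and cancel each ".." against the directory before it. The sketch below assumes that approach; it is not taken from the MakeHelpIndex source and the class name LinkNormaliser is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes "subdir/../" sequences from a relative link.
final class LinkNormaliser {
    static String normalise(String link) {
        List<String> kept = new ArrayList<>();
        for (String part : link.split("/", -1)) {
            int last = kept.size() - 1;
            // A ".." cancels the previous segment only if that segment is a real directory name.
            boolean canCancel = part.equals("..") && last >= 0
                    && !kept.get(last).isEmpty()
                    && !kept.get(last).equals("..")
                    && !kept.get(last).equals(".");
            if (canCancel) {
                kept.remove(last);
            } else {
                kept.add(part);
            }
        }
        return String.join("/", kept);
    }

    public static void main(String[] args) {
        System.out.println(normalise("docs/../index.html"));
        System.out.println(normalise("../up.html"));
    }
}
```

A leading "../" is left alone because there is no preceding directory to cancel.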

Unless -a is specified, only link names with extensions ".htm" and ".html" are followed. Pure directory names are therefore not followed, because such links rely on the Web server to supply the default page, eg index.html or default.htm.
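
Taken together, the link rules in this section amount to a small filter. The Java sketch below is illustrative only: the class FollowRule is hypothetical, and two details are assumptions rather than documented behaviour, namely that extensions are compared without regard to case and that any "#anchor" suffix is ignored when checking the extension.

```java
import java.util.Locale;

// Hypothetical helper: decides whether a link would be followed.
final class FollowRule {
    // parseAll models the -a option.
    static boolean shouldFollow(String link, boolean parseAll) {
        // Absolute links (letters then a colon) are never followed.
        if (link.matches("[A-Za-z]+:.*")) {
            return false;
        }
        // Assumption: drop any "#anchor" suffix and compare extensions in lower case.
        int hash = link.indexOf('#');
        String path = (hash >= 0 ? link.substring(0, hash) : link).toLowerCase(Locale.ROOT);
        if (parseAll) {
            return !(path.endsWith(".gif") || path.endsWith(".jpg"));
        }
        return path.endsWith(".htm") || path.endsWith(".html");
    }

    public static void main(String[] args) {
        System.out.println(shouldFollow("products/", false));
        System.out.println(shouldFollow("products/", true));
    }
}
```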

MakeHelpIndex starts by writing a single Header record. For index file format 2, a Description record is then written.

For each command line input page, a Base URL record is written. Following URL records are shortened appropriately. For example, input URL "http://www.me.com/products/index.html" has a base URL "http://www.me.com/products/" written.
Version 1.5.0: The Target field is always empty.
Version 1.4.2: only writes a record if there is a base.
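
The base URL in the example above is everything up to and including the last "/" of the input. A minimal Java sketch of that derivation, and of shortening later URLs against it, follows. The class BaseUrl and its methods are hypothetical, and the sketch does not handle an input that has no path after the host name.

```java
// Hypothetical helper: derives a base URL and shortens URLs against it.
final class BaseUrl {
    // Everything up to and including the last '/', or "" if there is no '/'.
    static String baseOf(String url) {
        int slash = url.lastIndexOf('/');
        return slash >= 0 ? url.substring(0, slash + 1) : "";
    }

    // Removes the base from the front of a URL when it is present.
    static String shorten(String url, String base) {
        if (!base.isEmpty() && url.startsWith(base)) {
            return url.substring(base.length());
        }
        return url;
    }

    public static void main(String[] args) {
        String base = baseOf("http://www.me.com/products/index.html");
        System.out.println(base); // prints http://www.me.com/products/
        System.out.println(shorten("http://www.me.com/products/list.html", base)); // prints list.html
    }
}
```

An input such as index.html has no "/" and so yields an empty base, which is the case where version 1.4.2 writes no Base URL record.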

MakeHelpIndex writes a URL record for each page with a title. The title is all the plain text within the TITLE tags, ie excluding other tags.

MakeHelpIndex writes an Index Item record once for each page. If there is a title, then a short-cut to the corresponding URL record is written. Otherwise the page name is used for the Index and the other fields (unless -t is specified).

For each anchor name in the page, an Index Item record is written. The Index is all the plain text within the A tags, ie excluding other tags. If there is no plain text then the anchor is ignored (unless the -b option is specified). URL short-cuts are used if possible. The anchor name is appended to the link.
Eg in URL 3 in format 2,
<A NAME="params"><B>Parameters</B></A>
writes an Index Item record
7;Parameters;&3#params;&3;;
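
Taking the plain text within the A tags can be approximated by deleting everything between "<" and ">". The Java sketch below shows that idea with a regular expression; it is an illustration rather than the parser MakeHelpIndex uses, and the class name PlainText is hypothetical.

```java
// Hypothetical helper: keeps only the plain text between tags.
final class PlainText {
    static String stripTags(String html) {
        // Remove each <...> tag, then trim surrounding white space.
        return html.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<B>Parameters</B>")); // prints Parameters
    }
}
```

An anchor whose contents are only tags, such as an image, gives an empty string, which corresponds to the blank anchor case controlled by -b.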

MakeHelpIndex keeps track of the frame the current URL is destined for, eg the TARGET in A HREF tags and the NAME in FRAME SRC tags. Unless this target is the "skip target" (see below) it is written as the Target in each Index Item record.

If an anchor name has a (new) KEYWORDS attribute then an Index Item record is written for each keyword. The keyword is the Index. URL short-cuts are used if possible. The page title (short-cut) is appended with the A tag's plain text in brackets.

For each keyword mentioned in a META NAME=keywords CONTENT tag, an Index Item record is written. The keyword is the Index. URL short-cuts are used if possible.
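
Splitting a comma separated keyword list, whether from a KEYWORDS attribute or a META tag, could be done as in the following Java sketch. The class KeywordList is hypothetical, and trimming spaces and dropping empty entries are assumptions about sensible behaviour rather than documented features.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "a, b ,c" into separate keywords.
final class KeywordList {
    static List<String> split(String content) {
        List<String> keywords = new ArrayList<>();
        for (String piece : content.split(",")) {
            String keyword = piece.trim();
            // Assumption: empty entries (eg from "a,,b") are dropped.
            if (!keyword.isEmpty()) {
                keywords.add(keyword);
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        System.out.println(split("help, index ,Java applet"));
    }
}
```

Each keyword returned would then become the Index of its own Index Item record.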

MakeHelpIndex writes semi-colon, carriage return and line feed characters (";\r\n") at the end of each record line. The semi-colon is there so you can add comments easily.

MakeHelpIndex strips any semi-colons from titles and anchor keywords, as a semi-colon is the help index file field delimiter. Anchor names with semi-colons are ignored.
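
The record line conventions can be sketched in Java as below. The class RecordWriter is hypothetical. Reading the earlier example record as five fields with a blank last field is an interpretation, not something the index file format is quoted as saying here. For simplicity the sketch strips semi-colons from every field, whereas MakeHelpIndex ignores anchor names that contain them.

```java
// Hypothetical helper: joins fields into one index file record line.
final class RecordWriter {
    // Semi-colons are the field delimiter, so they cannot appear inside a field.
    static String clean(String field) {
        return field.replace(";", "");
    }

    // Fields are separated by ';' and the line ends with ";\r\n".
    static String recordLine(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(';');
            }
            line.append(clean(fields[i]));
        }
        return line.append(";\r\n").toString();
    }

    public static void main(String[] args) {
        // Assumed field layout, based on the example record shown earlier.
        System.out.print(recordLine("7", "Parameters", "&3#params", "&3", ""));
    }
}
```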

Tags Recognised

Tag | HTML | Parameter
Title | <TITLE> xxx </TITLE> | xxx is the page title
Anchor link | <A HREF=xxx TARGET=yyy> .. </A> | xxx is a possible link to follow; yyy is target frame
Anchor name (optional keywords) | <A NAME=xxx KEYWORDS=kkk> yyy </A> | xxx is the anchor name; kkk is a comma separated list of keywords; yyy is the index
Frame | <FRAME SRC=xxx NAME=yyy> | xxx is a possible link to follow; yyy is target frame
Page Keywords | <META NAME=keywords CONTENT=kkk> | kkk is a comma separated list of keywords
Any other links | <..AnyTag.. HREF=xxx TARGET=yyy> | xxx is a possible link to follow; yyy is target frame

The last case is a catch-all, which finds <AREA HREF=xxx ..> tags for example. This makes MakeHelpIndex "future-proof" provided the HREF and TARGET attributes are used.

If the TARGET or NAME attributes are missing then the prevailing target is used.
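
The catch-all case and the prevailing target rule can be illustrated with a regular expression that pulls one attribute out of a tag. This Java sketch is not how MakeHelpIndex parses HTML; the class TagAttributes and its methods are hypothetical, and the pattern handles only quoted values and simple unquoted values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads attributes such as HREF and TARGET from any tag.
final class TagAttributes {
    // Returns the value of the named attribute, or null if the tag does not have it.
    static String attribute(String tag, String name) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name) + "\\s*=\\s*(?:\"([^\"]*)\"|([^\\s>]+))",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    // Uses the tag's TARGET if present, otherwise the prevailing target.
    static String targetOf(String tag, String prevailingTarget) {
        String target = attribute(tag, "TARGET");
        return target != null ? target : prevailingTarget;
    }

    public static void main(String[] args) {
        String tag = "<AREA HREF=\"map.html\" TARGET=Main>";
        System.out.println(attribute(tag, "HREF"));   // prints map.html
        System.out.println(targetOf(tag, "Contents")); // prints Main
    }
}
```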

Frames and the Skip Target

If your pages use frames then you may want to use the -s option to specify a "skip target".

Taking the PHD site as an example, there is a small "Contents" frame and a larger "Main" frame. The HTML to call HelpIndex specifies a target parameter of "Main", so pages will be displayed in "Main" by default.

By default MakeHelpIndex will include a target for each index (if it exists). A lot of these will be "Main" which is unnecessary.

If you specify a "skip target" and it matches a target then the target is not written out. This makes your index file smaller.

For example, for the PHD site this call is used to build the help index
  java MakeHelpIndex -tfd "PHD Site Index" -s Main index.hi index.html
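
The effect of the skip target on each Index Item record comes down to one comparison, sketched here in Java. The class SkipTarget is hypothetical, and the sketch assumes the comparison is an exact, case-sensitive match, which the text above does not specify.

```java
// Hypothetical helper: decides what to write in an Index Item's Target field.
final class SkipTarget {
    // Returns "" when there is no target or it equals the skip target.
    static String targetField(String target, String skipTarget) {
        if (target == null || target.equals(skipTarget)) {
            return "";
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println("[" + targetField("Main", "Main") + "]");     // prints []
        System.out.println("[" + targetField("Contents", "Main") + "]"); // prints [Contents]
    }
}
```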

Tips on parsing APIs

One use of HelpIndex is to provide easy access to all the Fields and Methods of a Java API. This is usually generated by javadoc from the source code comments.

MakeHelpIndex can scan a whole API documentation set. Usually just start off from the AllNames.html page, eg
  java MakeHelpIndex -fd "JDK API" api.hi AllNames.html

MakeHelpIndex often shows up errors in the links. Fix these by amending the source code comments.

The quality of the help index depends on the API documentation. In particular, the anchor names must be set up properly so that all the API fields and methods are picked up. Ideally an anchor name should look like <A NAME="get">get(int)</A>, ie where the index is between the A tags.

Sometimes class constructors are not put in such anchor names. In other documentation, there is no text between the A tags; in this case, give MakeHelpIndex the -b option to allow keywords to be built from such anchor names. Normally blank anchor names are filtered out.

I am not sure how to persuade javadoc to follow these guidelines.

Possible Improvements

Which of these improvements do you want? Any other suggestions?
PS See the HelpIndex possible improvements and the Index file format possible improvements.

Check what files, etc., are required to install a JVM, ie without downloading the whole JDK from Sun.

Bug List

Version history

1.5.0 3 March 1997 Writes format code 2.1 coding 1, in tandem with Hi Lab.
1.4.2 20 March 1997 Parses <AnyTag HREF=xxx TARGET=yyy>
1.4.1 6 February 1997 -a option added to parse most URLs.
1.4.0 30 December 1996 Changed parse order to write URL Parent field.
    Icon URL fields still always blank.
    Doesn't run out of memory so easily.
1.3.1 16 December 1996 Writes ";" instead of " ;" for blank fields, as HelpIndex can now cope.
1.3.0 29 November 1996 Writes format code 1.2, ie recognises and stores target frames.
    NB URL Parent and Icon URL fields indicate no tree.
    -s option added to specify "skip target".
    -t option added to stop an index being generated if a page has no title.
    Writes semi-colon delimiter at end of each line to permit comments easily.
    META tag: NAME attribute can be after CONTENT attribute.
    Prints URL and Index Item record count and limit warnings.
1.2.1 16 October 1996 Writes format code 1.1
1.2.0 14 October 1996 Writes format code 1.0
    Has new -d option for index description.
1.1.3 For bad links, the name of the file with the bad link is printed.
1.1.2 Bad URL or file name exception: error written to System.err.
1.1.1 Replace -u with -f option.
