Hi HelpIndex: Index File Format

Last updated: 2 October 2002.

See also: character encodings, Code 0, Codes 1.x.

This paper describes the file format of a help index file, as used by the PHD Hi HelpIndex applet.

Four format variants are defined. Code 0 and Codes 1.x are obsolete, but still recognised. Please use code 2.1, defined here, with Hi HelpIndex 2.1 or later.

Code 2.1 Revision 1, 6 October 1997, includes Page Tip (type 6) records and is used by Hi HelpIndex versions 2.1.10 and 2.2.10 or later.

Code 2.1 Revision 2, 2 October 2002, includes Include file (type 8) records and is used by Hi HelpIndex versions 2.1.19 or later. Include file (type 8) records are NOT currently supported by Hi Lab.

This documentation is Copyright © 1996, 1997, 2002 PHD Computer Consultants Ltd.

Format Code 2.1 Coding 1, Revision 2

An index file is a list of records that define the URLs and the index entries. To make the file smaller, index entries can use short-cuts that refer to the records defining individual URLs. Base URL records can also be used to make an index file shorter.

An index file has a series of character strings, one string per line. Lines usually end in CR and/or LF.

Each line is a record, split into fields by a semi-colon delimiter. The first field has an integer indicating the record type.

The first line must be in 8 bit characters terminated by CR or LF. It always starts with the two characters "HI". In general, subsequent lines need not be in 8 bit characters.

An index file consists of one Header record line (type 0), then a Description record line (type 1), followed by zero or more of these record lines: Base URL (type 3), URL (type 5), Page Tip (type 6), Index Item (type 7) and Include file (type 8). Blank lines and lines with errors are ignored. Error messages are sent to the Java error output or console.
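The layout rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the applet's own code: the name `parse_index`, the `(record_type, fields)` return shape and the use of standard error in place of the Java console are all hypothetical choices made for this example.

```python
import re
import sys

# Record types that may follow the Header record in format code 2.1.
KNOWN_TYPES = {1, 3, 5, 6, 7, 8}


def parse_index(text):
    """Split index file text into (record_type, fields) tuples.

    fields[0] is the record type field, so field N of the tables in this
    document is fields[N - 1].
    """
    lines = re.split(r"\r\n|\r|\n", text)  # lines end in CR and/or LF
    if not lines[0].startswith("HI"):
        raise ValueError("first line must be a Header record starting with HI")
    records = [(0, lines[0].split(";"))]
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # blank lines are ignored
        fields = line.split(";")
        try:
            record_type = int(fields[0])
        except ValueError:
            print(f"line {number}: bad record type", file=sys.stderr)
            continue  # lines with errors are ignored
        if record_type not in KNOWN_TYPES:
            print(f"line {number}: unknown record type", file=sys.stderr)
            continue
        records.append((record_type, fields))
    return records
```

Because comments are simply extra fields, a caller of this sketch would read only the fields it knows about and ignore anything after them.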

The Header record has Format code and Format sub-code fields. If a format is changed by adding more fields or record types - without changing the definition of the older record types - then the sub-code is incremented. For major changes, the Format code is incremented. Older versions of Hi HelpIndex can safely use newer index files as long as the Format code is the same. New versions of Hi HelpIndex so far have been able to read all the old formats.
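The compatibility rule can be illustrated with a short Python sketch. It assumes a hypothetical reader that implements format code 2, sub-code 1 only; the function name `check_header` and its three return strings are inventions for this example, and the sketch does not attempt to read the older Code 0 or Codes 1.x headers.

```python
# Assumption: this hypothetical reader implements format code 2, sub-code 1.
SUPPORTED_FORMAT_CODE = 2
SUPPORTED_SUB_CODE = 1


def check_header(header_line):
    """Classify a Header record line as 'ok', 'newer' or 'unsupported'."""
    fields = header_line.split(";")
    if fields[0] != "HI0":
        raise ValueError("not a Header record")
    format_code = int(fields[1])
    sub_code = int(fields[2])
    if format_code != SUPPORTED_FORMAT_CODE:
        return "unsupported"  # a major change: this sketch gives up
    if sub_code > SUPPORTED_SUB_CODE:
        # Same format code, newer sub-code: still safe to read, as long as
        # unknown record types and extra fields are ignored.
        return "newer"
    return "ok"
```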

The Header record includes a Coding Number field. Currently only one coding is defined (1) which indicates that subsequent lines are made of 8 bit characters terminated by CR or LF.

If there are no Index Item records then there are no keywords. If there are no valid Parent fields in the URL records then there is no Contents.

Comments may be added as extra fields in each record.

Character Encoding

Version 2.2+ of Hi HelpIndex supports different character encodings for index files. See the usage instructions for details of how to specify the character encoding.

In a different character encoding, use the same index file layout defined above (ie semi-colon separated strings in CR or LF terminated lines) with the characters in your chosen character encoding.

This means that you can use Unicode or Unicode's UCS Transformation Format (UTF-8) (RFC 2044). The full list of supported encodings is given on the character encodings page.

Note carefully that Hi Lab does not create index files in different character encodings, so you must make your own.
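As a rough illustration of reading such a file, the Python sketch below decodes the first line as 8 bit characters and the remaining lines in the chosen encoding. How Hi HelpIndex itself switches encodings is not described here, so the splitting strategy, the function name `read_index_lines` and the default of UTF-8 are assumptions.

```python
import re


def read_index_lines(data, encoding="utf-8"):
    """Return the lines of an index file held in the bytes object `data`."""
    # The first line is always in 8 bit characters, ended by CR and/or LF.
    match = re.search(rb"\r\n|\r|\n", data)
    if match is None:
        head, rest = data, b""
    else:
        head, rest = data[:match.start()], data[match.end():]
    header = head.decode("latin-1")
    # Subsequent lines use the chosen encoding (an assumption of this sketch).
    body = rest.decode(encoding)
    return [header] + re.split(r"\r\n|\r|\n", body)
```

The returned lines can then be split on semi-colons exactly as for an 8 bit file.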

Format Code 2.1 Coding 1 Record Definitions

Annotations in Value column

> 0    Integer value must be greater than zero
+      Field may be blank
*      Field may contain a short-cut to the URL Link field in a URL record
**     Field may contain a short-cut to the Title field in a URL record

0: Header
Defines the index file format and coding. This format has code 2 and sub-code 1. The coding is 1.

Field  Value  Type     Field Name
1      "HI0"  string   Record Type
2      2      integer  Format code
3      1      integer  Format sub-code
4      1      integer  Coding number

1: Description
Defines the index description and creation date.

Field  Value  Type     Field Name
1      1      integer  Record Type
2             string   Index description
3             string   Index creation date

3: Base URL
Defines a base URL, ie the URL prefix for URL records.

Field  Value  Type     Field Name       Comments
1      3      integer  Record Type
2      > 0    integer  Base URL number  Usually increments from 1 for each Base URL record
3             string   Base URL         Hi HelpIndex puts a / character on the end, if not there
4      +      string   Target           Frame target to override default

5: URL
Defines a URL and its position in the Contents tree hierarchy.

Field  Value  Type     Field Name       Comments
1      5      integer  Record Type
2      > 0    integer  URL number       Usually increments from 1 for each URL record
3      +      integer  Base URL number
4      +      string   URL link         Usually without a #anchor tag
5             string   Title            URL title, and/or Contents folder name
6      +      string   Target           Frame target to override default or Base URL Target
7      +      integer  Parent           Parent URL number.
                                        If missing or -1 then URL is not in Contents.
                                        0 is root level.
8      +      string   Icon URL         URL of icon in Contents, relative to the applet document base.
                                        No Base URL added.

6: Page Tip
Defines a page tip for a URL or the index file.

Field  Value  Type            Field Name   Comments
1      6      integer         Record Type
2             integer         URL number   Or zero to indicate Tip is for index file
3             C style string  Tip          The page tip string that is displayed in a pop-up box
                                           when the mouse hovers over a URL in the Contents
                                           (or when space is pressed)

The Tip string may contain the two characters "\n" to include a line break or the four characters "\160" to include a non-breaking space. Note that "\\" must be used for a "\" backslash character and "\59" for a semi-colon.

7: Index Item
Defines an Index entry.

Field  Value  Type     Field Name   Comments
1      7      integer  Record Type
2      **     string   Index        Something the user searches for
3      + *    string   URL link
4      **     string   URL title
5      +      string   Target       Frame target to override default or URL Target

8: Include file
Specifies an index file to include. This record type is not currently supported by Hi Lab.

Field  Value  Type     Field Name   Comments
1      8      integer  Record Type
2             string   file URL     The URL of the .hi file to include
3             integer  Parent       Parent URL number, ie where index file is inserted into contents.
                                    0 is root level.
4      +      string   Description  The description for the included file root node.
                                    If blank, then the included file's Index description is used.
                                    If "noroot" then the included file root is not included,
                                    though all child URLs are.
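
The escape sequences allowed in the Tip field of a Page Tip (type 6) record can be decoded along the following lines. This Python sketch is hedged: the source only documents "\n", "\160", "\\" and "\59", so treating any backslash followed by decimal digits as a character code is an assumption, as is the function name `unescape_tip`.

```python
import re

# A backslash followed by another backslash, the letter n, or decimal digits.
_ESCAPE = re.compile(r"\\(\\|n|\d+)")


def unescape_tip(tip):
    """Decode the escapes in a Tip field (after the line has been split on ';')."""
    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"          # "\\" is a single backslash
        if token == "n":
            return "\n"          # "\n" is a line break
        return chr(int(token))   # decimal code, eg 160 or 59 (assumed general)
    return _ESCAPE.sub(replace, tip)
```

Decoding must happen after the record has been split into fields, which is why a semi-colon inside a tip has to be written as "\59".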

Base URL records

If a Base URL number is given in a URL record then the corresponding Base URL record's Base URL field is used as a prefix to the URL's URL link. Note that the Icon URL field is not so prefixed.

The Base URL record addresses two issues.
First, it cuts down the size of URL records, by avoiding repetition of a common URL prefix.
Second, it allows an index to be portable so that the index always - and only - refers to the pages that you want it to refer to. Optionally you may wish to leave out this record (or records), so that the index is relative to its current directory.

Note that if a URL record specifies an absolute URL (eg http://www.you.com/) then the Base URL is not used as a prefix even if a Base URL number is given.
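
The prefixing rules can be sketched in Python as below. The function name `resolve_link` is hypothetical, and detecting an absolute URL by the presence of a scheme (via `urllib.parse.urlparse`) is an assumption about what "absolute" means here.

```python
from urllib.parse import urlparse


def resolve_link(link, base_url=None):
    """Apply a Base URL to a URL link field, following the rules above."""
    if not link:
        return link  # no link at all: nothing to prefix
    if base_url is None:
        return link  # no Base URL number given: link stays relative
    if urlparse(link).scheme:
        return link  # absolute URL: the Base URL is not used
    if not base_url.endswith("/"):
        base_url += "/"  # a / is put on the end, if not there
    return base_url + link
```

For example, `resolve_link("hello.html", "http://www.me.com")` gives `http://www.me.com/hello.html`, while an absolute link such as `http://www.you.com/` is returned unchanged.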

Short-cuts

Short-cuts are used to reduce the size of index files.

Index Item records may contain short-cuts in their Index, URL link and URL title fields. A short-cut links to a URL record. The short-cut consists of an ampersand character & followed by the URL record URL number.

The Index and URL title fields may contain or start with a short-cut. The short-cut is replaced with the corresponding URL record Title field.

The URL link field may contain or start with a short-cut. The short-cut is replaced with the URL record URL link field. For example, this allows you to use anchor names within pages easily (by appending the #anchorname directly to a short-cut, eg "&2#top").
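
Short-cut expansion can be illustrated with a small Python sketch. The name `expand_shortcuts` and the decision to leave a short-cut untouched when its URL number is unknown are assumptions made for this example; the source does not say how an unknown number is handled.

```python
import re

# An ampersand followed by a URL number, eg "&2".
_SHORTCUT = re.compile(r"&(\d+)")


def expand_shortcuts(field, values):
    """Replace each &N in `field` with values[N].

    `values` maps URL numbers to URL record Title fields (for the Index and
    URL title fields) or to URL link fields (for the URL link field).
    """
    def replace(match):
        number = int(match.group(1))
        # Assumption: an unknown URL number leaves the short-cut as written.
        return values.get(number, match.group(0))
    return _SHORTCUT.sub(replace, field)
```

With `links = {2: "us.html"}`, the field `&2#top` expands to `us.html#top`.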

Contents tree

URLs defined in URL records are shown in the Hi HelpIndex Contents tree if the Parent field is set appropriately.

If the Parent field is empty or -1 then the URL does not appear in the tree.

At least one URL must have a Parent of zero, indicating that it is at root level. Otherwise Parent must contain the URL number of the record's parent, and the parent's URL record must appear earlier in the file.

For each URL in the tree, you may specify an Icon URL which must be a GIF image, usually about 15x15 pixels. Note that this is relative to the applet's document base (ie no Base URL is prefixed).

If the URL link field is empty then the record is used to define a Contents entry which does not relate to an actual URL.
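
The Parent rules can be pictured with the following Python sketch, which builds a map from each tree node to its children. The function name `build_contents` and the input shape (pairs of URL number and Parent, in file order, with `None` for an empty Parent field) are hypothetical, and silently skipping a record whose parent has not yet been defined is an assumption about error handling.

```python
def build_contents(url_records):
    """Return {url_number: [child url_numbers]}; key 0 is the root level."""
    children = {0: []}
    for number, parent in url_records:
        if parent is None or parent == -1:
            continue  # empty or -1: the URL does not appear in the tree
        if parent not in children:
            continue  # parent not defined beforehand (assumed to be skipped)
        children[parent].append(number)
        children[number] = []
    return children
```

Given the pairs `[(1, 0), (2, 1), (3, 1)]`, URL 1 sits at root level and URLs 2 and 3 become its children.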

Example

HI0;2;1;1;                                  header record
1;My Site Index;13 March 1997;              description record
6;0;This is what we've got...\n\nEnjoy;     index page tip

3;1;http://www.me.com/;;                    base url 1
3;2;http://www.you.com/;Main;               base url 2 with a Main Target

5;1;1;hello.html;Hello world;;0;root.gif;   define a url as tree root with own icon
6;1;Page tip for the above URL
7;&1;&1;&1;_top;                            reference it once 
7;About us;&1#us;&1;;                       reference it again with an anchor name
7;PHD;&1#us;&1 (About us);;                 reference same anchor with a different index

5;2;1;us.html;More about us;Main;1;;        another URL, child of first URL with Main Target
7;About us (more);&2;&2;;

7;Sales;sales.html#top;Sales information;;  index without short-cut

5;3;1;products.html;Our products;;1;;       another URL, child of first URL

5;4;;;No linker;;1;;                        URL with no link

5;5;2;test.htm;Test page;;1;;               URL off second Base URL

Database Example

In this example, Index Item records are used to create a tiny database in the Index tab card. For database entries where there is no URL to look up, the URL link field is empty.

URL records have been used to show an equivalent Contents tab card for email, and to provide short-cuts for the Index Item records.

Trailing fields have been omitted where they are empty.

HI0;2;1;1;
1;Database example;Dec 17 1996;

5;1;;;PHD;;0

5;2;;mailto:sales@phdcc.com;Chris Cant;;1
7;&2;;Director
7;&2;;PHD Computer Consultants Ltd
7;&2;&2;Email: sales@phdcc.com
7;&2;http://www.phdcc.com/;Web: www.phdcc.com;_blank

5;3;;;John Cant;;1
7;&3;;Director
7;&3;;PHD Computer Consultants Ltd
7;&3;;France
7;&3;http://www.phdcc.com/;Web: www.phdcc.com

Possible Improvements

Hi HelpIndex could be enhanced to read other formats, eg Microsoft's HTML Help Workshop (hhk/hhi), or the tree hierarchy defined by Apple's HotSauce Meta-Content Format (MCF).

Version history

Version                          Date               Changes
Code 2.1, Coding 1, Revision 2   2 October 2002     Include file records added
Code 2.1, Coding 1, Revision 1   6 October 1997     Page tip records added
Code 2.1, Coding 1               26 February 1997   Complete re-arrangement
Code 1.2a                        17 December 1996   Index Item URL link field may be left out
Code 1.2                         26 November 1996   Added optional Parent and Icon URL fields to record type 1 (URL).
                                                    Added optional Target field to record type 2 (Index Item).
Code 1.1                         15 October 1996    Added optional record type 3 (Base URL)
Code 1.0                         14 October 1996    Altered the Header record from format code 0
