Last updated: 17 December 1996.

HelpIndex: Index File Format

This paper describes the file format of a help index file, as used by the PHD HelpIndex applet.

This is the obsolete format variant defined for use with HelpIndex version 2.0 or lower, although it is still recognised by later versions. Please refer to the latest format definition.

Three format variants are defined. Code 0 is obsolete, but still recognised. Please use code 1.2, defined here, with HelpIndex 1.6 or later. Code 1.1 is also recognised.

Note: In future, HelpIndex may be enhanced to read other formats, eg the tree hierarchy defined by Apple's HotSauce Meta-Content Format (MCF).

This documentation is Copyright © 1996 PHD Computer Consultants Ltd.


Format Code 1.1

Format code 1.1 differs from 1.2 as follows. Record type 1 URL records do not have Parent or Icon URL fields. Record type 2 Index Item records do not have a Target field.

So Format code 1.1 does not support a tree hierarchy or per-index target frames.


Format Code 1.2

An index file has a list of index records that the user looks up, together with the URL tree hierarchy. To make the file smaller, short-cuts reference records defining individual URLs. A base URL may be specified.

The index file consists of a series of plain text lines. Each line is a record, split into fields by a semi-colon delimiter. The first field has an integer indicating the record type. Lines are made up of 8 bit ANSI characters, terminated by CR and/or LF.

An index file consists of one line of record type zero (0), followed by zero or more lines of types one (1), two (2) or three (3). Blank lines and lines with errors are ignored, and an error message is sent to the Java Console. All fields are necessary in each record.
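As a rough illustration of the line structure described above, the following Python sketch splits index file text into records. The name parse_index is hypothetical, and reporting errors on standard error (rather than the Java Console) and skipping blank lines silently are assumptions made for this example; it is not HelpIndex's own code.

```python
import sys

KNOWN_TYPES = (0, 1, 2, 3)

def parse_index(text):
    """Split index file text into (record_type, fields) tuples.

    Hypothetical helper. Blank lines are skipped silently; lines whose
    first field is not a known record type are skipped with a message
    on stderr (an assumption standing in for the Java Console).
    """
    records = []
    # splitlines() handles CR, LF and CR+LF line endings
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank line: ignored
        fields = line.split(";")  # semi-colon delimited fields
        try:
            record_type = int(fields[0])
        except ValueError:
            record_type = None
        if record_type not in KNOWN_TYPES:
            print(f"line {number}: unknown record type {fields[0]!r}",
                  file=sys.stderr)
            continue
        # Keep the remaining fields as strings, in file order
        records.append((record_type, fields[1:]))
    return records
```

Because comments are simply extra fields, any trailing comment text ends up at the end of each field list and can be ignored by the caller.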

The Header record has Format code and Format sub-code fields. If a format is changed by adding more fields or record types - without changing the definition of the older record types - then the sub-code is incremented. For major changes, the Format code is incremented. Older versions of HelpIndex can safely use newer index files as long as the Format code is the same. New versions of HelpIndex so far have been able to read all the old formats.
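The compatibility rule can be sketched as follows, assuming a Header record laid out as in format code 1. The names header_version and can_read are hypothetical. Treating older Format codes as readable reflects only the "so far" observation above, not a guarantee.

```python
def header_version(line):
    """Return (format_code, sub_code) from a Header record line."""
    fields = line.split(";")
    if fields[0].strip() != "0":
        raise ValueError("not a Header record")
    return int(fields[1]), int(fields[2])

def can_read(reader_code, file_code):
    """True if a reader built for reader_code may use a file of file_code.

    The sub-code is deliberately not consulted: sub-code changes only
    add fields or record types, so an older reader can skip them.
    Assumption: files with an older Format code are accepted, since
    new versions have so far been able to read all the old formats.
    """
    return file_code <= reader_code
```

For example, a reader written for Format code 1 would accept a file whose header gives code 1 and sub-code 2, but would decline a file with Format code 2.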

If there are no Index Item records then there are no keyword indices. If there are no valid Parent fields in the URL records then there is no tree hierarchy.

Comments may be added as extra fields in each record.

Record Types

0: Header
Defines the index file format, the index description and creation date.
This format has code 1 and sub-code 2.

Field  Value  Type     Field Name
1      0      integer  Record Type
2      1      integer  Format code
3      2      integer  Format sub-code
4             string   Index description
5             string   Index creation date

1: URL
Defines a URL for later use in type 2 records, and its position in the tree hierarchy.

Field  Value  Type     Field Name   Comments
1      1      integer  Record Type
2      >0     integer  URL number   Usually this increments from 1 for each type 1 record.
3             string   URL link     Usually without a #anchor tag. Can be left out.
4             string   Title        URL title, and/or tree folder name.
5             integer  Parent       Parent URL number. If missing or -1 then URL is not in tree hierarchy. 0 is root level.
6             string   Icon URL     URL of icon in tree. May be present if Parent not -1.

2: Index Item
Defines an actual index entry.

Field  Value  Type     Field Name   Comments
1      2      integer  Record Type
2             string   Index        Something the user searches for.
3             string   URL title    May be a short-cut to a type 1 record, optionally with additional characters. The first additional character should not be a digit.
4             string   URL link     May be a short-cut to a type 1 record, optionally appended with #anchor name. Can be left out.
5             string   Target       Frame target to override default. Can be left out.

3: Base URL
Defines the base URL, ie the URL prefix for the following type 1 and 2 records.

Field  Value  Type     Field Name   Comments
1      3      integer  Record Type
2             string   Base URL     HelpIndex puts a / character on the end, if not there.

Base URL records

The Base URL record addresses two issues.
First it cuts down the size of the following type 1 and 2 records, by avoiding repetition of a common URL prefix to all pages.
Second, it allows an index to be portable so that the index always - and only - refers to the pages that you want it to refer to. Optionally you may wish to leave out this record, so that the index is relative to its current directory.

Note that if a Type 1 or Type 2 record specifies an absolute URL (eg http://www.you.com/) then the Base URL is not used as a prefix.
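The prefixing behaviour can be sketched in Python as below. The function name apply_base is hypothetical, and detecting an absolute URL by looking for "://" is an assumption of this example; the test HelpIndex actually applies is not stated here.

```python
def apply_base(base_url, link):
    """Prefix link with base_url unless link is absolute or empty.

    base_url may be None or "" when the file has no Base URL record,
    in which case the link is left relative to the index's directory.
    """
    if not link:
        return link  # empty link: nothing to resolve
    if "://" in link:
        return link  # assumption: scheme:// marks an absolute URL
    if not base_url:
        return link  # no Base URL record in force
    if not base_url.endswith("/"):
        base_url += "/"  # a / is put on the end, if not there
    return base_url + link
```

With a Base URL of http://www.me.com, the link hello.html would become http://www.me.com/hello.html, while http://www.you.com/ would be left unchanged.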

Short-cuts

Short-cuts are used to reduce the size of index files.

Index Item records (Type 2) may contain short-cuts in their URL title and URL link fields. A short-cut links to a URL record (Type 1). The short-cut consists of an ampersand character & followed by the type 1 record URL number.

The URL title field may simply contain a short-cut, optionally with additional characters. The first additional character should not be a digit.
The actual URL title is taken from the URL record Title field; this must be present.

The URL link field may simply contain or start with a short-cut. For short-cuts, the URL link is taken from the corresponding type 1 record. Any following #anchorname in the field is appended to the URL link. This allows you to use anchor names within pages easily.
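A minimal Python sketch of short-cut expansion follows. The names SHORTCUT and resolve, and the dictionary layout used for URL records, are hypothetical. It assumes that additional characters in the URL title field are appended to the referenced Title, in the same way that an #anchorname is appended to the link.

```python
import re

# An ampersand, the URL number, then any additional characters
SHORTCUT = re.compile(r"&(\d+)(.*)", re.DOTALL)

def resolve(field, urls, key):
    """Expand a leading &N short-cut in an Index Item field.

    urls maps URL number -> {"title": ..., "link": ...} built from
    type 1 records; key is "title" or "link". A field without a
    short-cut is returned unchanged. Raises KeyError if the URL
    number has not been defined.
    """
    match = SHORTCUT.fullmatch(field)
    if match is None:
        return field
    number = int(match.group(1))
    rest = match.group(2)  # e.g. "#us" or " (About us)"
    return urls[number][key] + rest
```

The pattern reads as many digits as it can for the URL number, which illustrates why the first additional character should not be a digit: it would be taken as part of the number.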

URL Hierarchy

URLs defined in a URL record are shown in the HelpIndex Tree Hierarchy if the Parent field is set appropriately.

If the Parent field is missing or -1 then the URL does not appear in the tree.

One or more URLs must have a Parent of zero indicating that they are at root level. Otherwise Parent must contain the URL number of its parent. The parent must be defined beforehand.

For each URL in the tree, you may specify an Icon URL which must be a GIF image, usually about 15x15 pixels.

If the URL link field is empty then the record is used to define a tree entry which does not relate to an actual URL.
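The Parent rules in this section can be sketched as a small Python function. The name build_tree and the tuple layout are hypothetical, and raising ValueError for an undefined parent is an assumption; HelpIndex itself may simply ignore such a record and report it.

```python
def build_tree(url_records):
    """Build a parent -> children mapping from type 1 records.

    url_records is a list of (number, title, parent) tuples in file
    order, with parent set to None when the field is missing.
    The key 0 stands for the root level.
    """
    children = {0: []}
    for number, title, parent in url_records:
        if parent is None or parent == -1:
            continue  # URL does not appear in the tree
        if parent not in children:
            # The parent must be defined beforehand, and be in the tree
            raise ValueError(f"URL {number}: parent {parent} not defined")
        children[parent].append(number)
        children[number] = []  # this URL may now act as a parent
    return children
```

Applied to the four URL records in the example that follows, this would place URL 1 at root level with URLs 2, 3 and 4 as its children.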

Example

0;1;2;My Site Index;26 November 1996;       header record

3;http://www.me.com/;                       base url for following records

1;1;hello.html;Hello world;0;root.gif;      define a url as tree root with own icon
2;welcome;&1;&1;_top;                       reference it once 
2;About us;&1;&1#us;;                       reference it again with an anchor name
2;PHD;&1 (About us);&1#us;;                 reference same anchor with a different index

1;2;us.html;More about us;1;;               another URL, child of tree root
2;About us (more);&2;&2;;

2;Sales;sales information;sales.html#top;;

1;3;products.html;Our products;1;;          another URL, child of tree root

1;4;;No linker;1;;                          URL with no link
2;Bloggs;Sales rep;;;                       index with no link

Possible Improvements

Support Unicode, for example by reading Unicode Text Format (UTF).

Version history

Code   Date               Change
1.2a   17 December 1996   Index Item URL link field may be left out.
1.2    26 November 1996   Added optional Parent and Icon URL fields to record type 1 (URL). Added optional Target field to record type 2 (Index Item).
1.1    15 October 1996    Added optional record type 3 (Base URL).
1.0    14 October 1996    Altered the Header record from format code 0.
