Indexing parser plug-ins
Introduction
If FindinSite-CD does not support a file type then you can write a plug-in in Java to index it.
Plug-ins can only be added to the Findex
Java indexer, and cannot be added to FindinSite-CD-Wizard.
You must also make sure that your users can view files of the new file type,
i.e. they must already have a suitable viewer, or one must be provided on the CD.
Java programmer information: A plug-in must implement the
fisParser interface.
Its main method, parse(), must read an InputStream
and report any information it finds through the caller's
ParseHandler interface.
Any number of plug-ins can be added into Findex by adding
ParserN properties
to the index instructions file.
Additional plug-in specific properties can also be added to this file.
FindinSite-CD comes with an example plug-in, phdccRDF, for indexing RDF/XML files.
This plug-in is slightly unusual because RDF files only contain meta-data
about other files. However, the techniques shown in the source
are exactly the same as those a normal parser would use.
Using a plug-in
To use a plug-in, you must:
- tell Findex about it in the index instructions file
- add the plug-in runtime (.jar file) to the CLASSPATH when running Findex
Findex ParserN properties
To tell Findex to use a plug-in, add a
ParserN property
to the index instructions file. Any number of parsers can be added;
the first must be called Parser1,
the second Parser2, etc.
The value for each ParserN property must contain these fields,
separated by semi-colons:
| Field | Example |
| File Type Short Name | RDF |
| Default file-spec (comma-separated) | *.rdf,*.xml |
| fisParser implementing class | com.phdcc.findex.rdf.ParseRDF |
| AddPage | false |
| TranslateEscapeSequences | false |
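To make the field order concrete, here is a minimal sketch that splits a ParserN value into its five fields. It only illustrates the layout described above and is not Findex's own parsing code; the class ParserSpec and its field names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the five semi-colon separated ParserN fields.
// Illustration of the field layout only; not Findex's own code.
final class ParserSpec {
    final String shortName;          // File Type Short Name, e.g. RDF
    final List<String> fileSpecs;    // Default file-spec, split on commas
    final String parserClass;        // fisParser implementing class
    final boolean addPage;           // AddPage
    final boolean translateEscapes;  // TranslateEscapeSequences

    private ParserSpec(String[] f) {
        shortName = f[0].trim();
        fileSpecs = Arrays.asList(f[1].trim().split("\\s*,\\s*"));
        parserClass = f[2].trim();
        addPage = Boolean.parseBoolean(f[3].trim());
        translateEscapes = Boolean.parseBoolean(f[4].trim());
    }

    // Splits the property value and rejects anything without exactly 5 fields.
    static ParserSpec parse(String value) {
        String[] fields = value.split(";", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                "Expected 5 fields but found " + fields.length + ": " + value);
        }
        return new ParserSpec(fields);
    }

    public static void main(String[] args) {
        ParserSpec spec = parse("RDF;*.rdf,*.xml;com.phdcc.findex.rdf.ParseRDF;false;false");
        System.out.println(spec.shortName + " " + spec.fileSpecs + " " + spec.parserClass);
    }
}
```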
For each ParserN property you can also specify the additional properties below,
where Name is the File Type Short Name given in the first field:
| Property | Example | Description | Type |
| ParseName | ParseRDF=true | Whether indexing is enabled | Boolean |
| Name_Files | RDF_Files=*.rdf | Actual file-spec desired | Comma-separated file-specs |
You can also add your own custom properties.
Provide a semi-colon separated list of these in a
ParserNparams property. For each parameter you can set
a default value after a colon. The actual property value can be any string
value (though there is special support for retrieving boolean values).
This is an example set of properties added to a Findex index instructions file.
Two additional parameters are supported; the actual properties override the
null and false default values.
parser1=XYZ;*.rdf;com.phdcc.findex.xyz.ParseXYZ;true;false
parser1params=XYZ_Passwords;XYZ_ReportErrors:false
ParseXYZ=yes
XYZ_Files=*.xyz
XYZ_Passwords=secret,shhhh
XYZ_ReportErrors=true
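The way defaults and actual values combine can be sketched as follows. This is only a reading of the behaviour described above, not Findex's implementation; the class ParamDefaults and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper showing how ParserNparams defaults might combine
// with the actual property values. Not Findex's own code.
final class ParamDefaults {
    // Splits "XYZ_Passwords;XYZ_ReportErrors:false" into name -> default,
    // storing null where no default follows a colon.
    static Map<String, String> parseDefaults(String paramsValue) {
        Map<String, String> defaults = new LinkedHashMap<>();
        for (String entry : paramsValue.split(";")) {
            if (entry.trim().isEmpty()) {
                continue;
            }
            int colon = entry.indexOf(':');
            if (colon < 0) {
                defaults.put(entry.trim(), null);
            } else {
                defaults.put(entry.substring(0, colon).trim(),
                             entry.substring(colon + 1).trim());
            }
        }
        return defaults;
    }

    // A property set in the instructions file overrides the default.
    static String resolve(String name, Map<String, String> defaults, Properties actual) {
        String value = actual.getProperty(name);
        return value != null ? value : defaults.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = parseDefaults("XYZ_Passwords;XYZ_ReportErrors:false");
        Properties actual = new Properties();
        actual.setProperty("XYZ_Passwords", "secret,shhhh");
        actual.setProperty("XYZ_ReportErrors", "true");
        System.out.println(resolve("XYZ_Passwords", defaults, actual));
        System.out.println(resolve("XYZ_ReportErrors", defaults, actual));
    }
}
```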
Plug-in runtimes
You need to add each plug-in runtime to the Java CLASSPATH.
You can do this when you start Findex by adding the plug-in .jar file
to the -cp option of the java command. For example, to add the phdccRDF
runtime, append phdccRDF.jar to the -cp list after a semi-colon
(the path separator on Windows; other platforms use a colon).
Note that you may need to use a full path to the .jar file.
java -cp Findex.jar;phdccRDF.jar com.phdcc.findex.Findex @index.properties
Programming a Plug-in
You need to be a competent Java programmer to write a Findex file indexer plug-in.
You will obviously also need to know how to extract the required information
from your files.
There is source code for the required interfaces in the plugin
sub-directory of the Windows installation directory, eg:
C:\Program files\PHD\fisCD\plugin\
You will probably need to create a suitable directory
structure if you are going to develop with these files.
| fisParser.java, fisParser.class | fisParser interface source |
| fisParserInformation.java, fisParserInformation.class | fisParserInformation interface source |
| ParseHandler.java, ParseHandler.class | ParseHandler interface source |
| ParseRDF.java | RDF example plug-in class source |
The ParseRDF.java example is supplied already compiled and packaged as
phdccRDF.jar, found in the main installation directory.
fisParser interface
Your plug-in must have a class that implements the com.phdcc.findex.fisParser
interface, given below.
A single instance of your class is created when Findex starts.
Your parse() routine is called once to parse each file.
If appropriate, and provided that info is not null,
use the com.phdcc.findex.fisParserInformation interface
to retrieve any values set for your custom parameters.
Then, as you inspect the InputStream,
report any found information to Findex using the
com.phdcc.findex.ParseHandler callback.
Finally, return the number of bytes indexed,
ie the total number of bytes in the file.
Your GetPosition() method may be called at any time to get some information
about where you are in the file for error reporting purposes.
For example, you could return "line 123" or "page 5".
Return null if there is no such information,
or if you are not processing a parse() call.
The SetCharset() method is used by Findex to tell its HTML parser
what character set to use, based on the page's META content-type.
package com.phdcc.findex;

import java.io.*;

public interface fisParser
{
    public int parse(InputStream is, ParseHandler callback, fisParserInformation info);
    public String GetPosition();
    public void SetCharset(int charset);
}
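As a rough illustration of the life cycle described above, the sketch below implements fisParser for plain text files, reporting each line through PlainText() and returning the byte count. It is not taken from the supplied source. The classes LineTextParser and LogHandler are hypothetical, the three interfaces are redeclared as stand-ins so the example compiles on its own, the UTF-8 decoding is an assumption, and the meaning of the int values returned by ParseHandler is not known, so the handler simply returns 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Stand-ins for the com.phdcc.findex interfaces so the sketch compiles alone;
// a real plug-in would import the supplied interfaces instead.
interface fisParserInformation {
    String GetParam(String name);
    boolean GetBooleanParam(String name);
}

interface ParseHandler {
    void SetPage(String URL, String target);
    int Tag(String tag);
    void TagEnd();
    int Attribute(String name, String text);
    int TaggedText(String text);
    int PlainText(String text);
    void ReportError(String Msg);
    PrintStream getErrorStream();
    void ShowProgress();
    boolean AbortNow();
}

interface fisParser {
    int parse(InputStream is, ParseHandler callback, fisParserInformation info);
    String GetPosition();
    void SetCharset(int charset);
}

// Hypothetical plug-in: reports each line of a text file as plain text.
class LineTextParser implements fisParser {
    private int line;   // 0 while no parse() call is in progress

    public int parse(InputStream is, ParseHandler callback, fisParserInformation info) {
        int bytes = 0;
        line = 1;
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = is.read()) != -1) {
                bytes++;
                if (b == '\n') {
                    flush(current, callback);
                    line++;
                    if (callback.AbortNow()) break;
                } else if (b != '\r') {
                    current.write(b);
                }
            }
            flush(current, callback);   // the last line may lack a newline
        } catch (IOException e) {
            callback.ReportError(GetPosition() + ": " + e.getMessage());
        } finally {
            line = 0;
        }
        return bytes;
    }

    private static void flush(ByteArrayOutputStream buf, ParseHandler callback) {
        if (buf.size() > 0) {
            // ASSUMPTION: the file is UTF-8 text.
            callback.PlainText(new String(buf.toByteArray(), StandardCharsets.UTF_8));
            buf.reset();
        }
    }

    public String GetPosition() { return line > 0 ? "line " + line : null; }
    public void SetCharset(int charset) { }   // ignored by this sketch
}

// Minimal handler for trying the parser outside Findex. Returning 0 is a
// placeholder: the meaning of the int results is not documented here.
class LogHandler implements ParseHandler {
    final StringBuilder log = new StringBuilder();
    public void SetPage(String URL, String target) { }
    public int Tag(String tag) { return 0; }
    public void TagEnd() { }
    public int Attribute(String name, String text) { return 0; }
    public int TaggedText(String text) { return 0; }
    public int PlainText(String text) { log.append("text ").append(text).append('\n'); return 0; }
    public void ReportError(String Msg) { log.append("error ").append(Msg).append('\n'); }
    public PrintStream getErrorStream() { return System.err; }
    public void ShowProgress() { }
    public boolean AbortNow() { return false; }
    public static void main(String[] args) {
        byte[] data = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        LogHandler handler = new LogHandler();
        int n = new LineTextParser().parse(new ByteArrayInputStream(data), handler, null);
        System.out.print(handler.log + "" + n + " bytes\n");
    }
}
```

Because a single parser instance is reused for every file, the sketch resets its line counter at the start and end of each parse() call, so GetPosition() returns null between files.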
fisParserInformation interface
Use the com.phdcc.findex.fisParserInformation interface
to find any values set for your custom parameters. Any user-supplied
values will override the default values you gave in the
ParserNparams property.
Use either GetParam() or GetBooleanParam() to get
a named parameter. If the parameter has not been set by the user
and has no default, GetParam() returns null and
GetBooleanParam() returns false.
package com.phdcc.findex;

public interface fisParserInformation
{
    public String GetParam(String name);
    public boolean GetBooleanParam(String name);
}
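The following sketch shows one way a plug-in might read its custom parameters defensively, using the XYZ_Passwords and XYZ_ReportErrors names from the earlier example. XyzSettings and MapInfo are hypothetical classes; MapInfo is only a map-backed stand-in for trying the code outside Findex, and which strings it treats as true is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for com.phdcc.findex.fisParserInformation so the sketch compiles alone.
interface fisParserInformation {
    String GetParam(String name);
    boolean GetBooleanParam(String name);
}

// Hypothetical settings holder for the XYZ example parameters.
final class XyzSettings {
    final List<String> passwords;
    final boolean reportErrors;

    XyzSettings(fisParserInformation info) {
        // info may be null, and GetParam() returns null for an unset
        // parameter with no default, so guard against both cases.
        String raw = (info == null) ? null : info.GetParam("XYZ_Passwords");
        passwords = (raw == null || raw.isEmpty())
            ? Collections.<String>emptyList()
            : Arrays.asList(raw.split(","));
        reportErrors = info != null && info.GetBooleanParam("XYZ_ReportErrors");
    }
}

// Map-backed stand-in for testing. ASSUMPTION: "true" and "yes" count as true;
// the values Findex really accepts are not specified here.
final class MapInfo implements fisParserInformation {
    private final Map<String, String> values = new HashMap<>();

    MapInfo set(String name, String value) {
        values.put(name, value);
        return this;
    }

    public String GetParam(String name) {
        return values.get(name);
    }

    public boolean GetBooleanParam(String name) {
        String v = values.get(name);
        return v != null && (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("yes"));
    }

    public static void main(String[] args) {
        XyzSettings settings = new XyzSettings(
            new MapInfo().set("XYZ_Passwords", "secret,shhhh").set("XYZ_ReportErrors", "true"));
        System.out.println(settings.passwords + " " + settings.reportErrors);
    }
}
```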
ParseHandler interface
Your plug-in passes any found information back to Findex through the
com.phdcc.findex.ParseHandler interface, below.
This interface is designed to accept HTML-like information.
Full information on the interface will be prepared soon.
package com.phdcc.findex;

import java.io.*;

public interface ParseHandler
{
    public void SetPage(String URL, String target);
    public int Tag(String tag);
    public void TagEnd();
    public int Attribute(String name, String text);
    public int TaggedText(String text);
    public int PlainText(String text);
    public void ReportError(String Msg);
    public PrintStream getErrorStream();
    public void ShowProgress();
    public boolean AbortNow();
}
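Since the full contract is not documented here, the sketch below is only a guess at how HTML-like information might be reported: a title as a tag, its text and a tag end, followed by body text. The call order, the tag name and the classes RecordingHandler and HandlerCallOrderDemo are all assumptions made for illustration. The recording handler returns 0 from the int methods as a placeholder.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Stand-in for com.phdcc.findex.ParseHandler so the sketch compiles alone.
interface ParseHandler {
    void SetPage(String URL, String target);
    int Tag(String tag);
    void TagEnd();
    int Attribute(String name, String text);
    int TaggedText(String text);
    int PlainText(String text);
    void ReportError(String Msg);
    PrintStream getErrorStream();
    void ShowProgress();
    boolean AbortNow();
}

// Records every callback as a line of text so the call order can be inspected.
// The int return values are placeholders: their meaning is not documented here.
class RecordingHandler implements ParseHandler {
    final List<String> calls = new ArrayList<>();

    private int note(String call) {
        calls.add(call);
        return 0;
    }

    public void SetPage(String URL, String target) { note("SetPage " + URL); }
    public int Tag(String tag) { return note("Tag " + tag); }
    public void TagEnd() { note("TagEnd"); }
    public int Attribute(String name, String text) { return note("Attribute " + name + "=" + text); }
    public int TaggedText(String text) { return note("TaggedText " + text); }
    public int PlainText(String text) { return note("PlainText " + text); }
    public void ReportError(String Msg) { note("ReportError " + Msg); }
    public PrintStream getErrorStream() { return System.err; }
    public void ShowProgress() { }
    public boolean AbortNow() { return false; }
}

class HandlerCallOrderDemo {
    // ASSUMPTION: a title is reported as a tag, its text, then the tag end,
    // and body text goes through PlainText(). The real contract may differ.
    static void reportDocument(ParseHandler callback, String title, String body) {
        callback.Tag("title");
        callback.TaggedText(title);
        callback.TagEnd();
        callback.PlainText(body);
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        reportDocument(handler, "Example title", "Example body text");
        for (String call : handler.calls) {
            System.out.println(call);
        }
    }
}
```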
RDF Plug-in Example
The example plug-in phdccRDF consists of a single class, ParseRDF
(source file ParseRDF.java), in package com.phdcc.findex.rdf. It inspects RDF/XML to
find meta-data field information about other files.
As required, ParseRDF implements fisParser.
However it can also be run from the command line for testing purposes,
and so has a static main() method. You can pass the name of an
RDF file as the first command line parameter. For the testing to work,
ParseRDF also implements ParseHandler
so it can print out what it is reporting.
ParseRDF uses Java SAX technology to analyse XML files.
SAX is provided in the standard Sun Java VM distribution.
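For readers unfamiliar with SAX, here is a minimal sketch of the general approach: a DefaultHandler subclass collects the text of each leaf element in an RDF/XML document. It is not the ParseRDF source shipped with FindinSite-CD; the class RdfFieldCollector, the sample document and the "name=value" output format are assumptions made for illustration. A real plug-in would pass the values to ParseHandler rather than collecting them in a list.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects "field=value" pairs from RDF/XML using SAX.
// It is not the ParseRDF source shipped with FindinSite-CD.
final class RdfFieldCollector extends DefaultHandler {
    final List<String> fields = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);   // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.add(localName + "=" + value);
        }
        text.setLength(0);
    }

    // Parses the stream and returns the collected fields; checked exceptions
    // are wrapped so callers do not need a throws clause.
    static List<String> collect(InputStream in) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // so localName is populated
            RdfFieldCollector collector = new RdfFieldCollector();
            factory.newSAXParser().parse(in, collector);
            return collector.fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        String rdf = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
            + "<rdf:Description rdf:about=\"page.html\">"
            + "<dc:title>Example page</dc:title>"
            + "<dc:creator>A. Author</dc:creator>"
            + "</rdf:Description></rdf:RDF>";
        System.out.println(collect(new ByteArrayInputStream(rdf.getBytes(StandardCharsets.UTF_8))));
    }
}
```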