Using RDF/XML to set meta field descriptions
Introduction
The Resource Description Framework (RDF)
is a language for representing information about resources in the World Wide Web.
The FindinSite-CD Findex indexer
can index RDF/XML files to find meta field information.
Words defined in RDF/XML are only stored in the search database fields list;
they are not stored in the main word list.
The FindinSite-CD runtime can search the field information in the normal way -
as described here.
RDF support is provided as a parser Java plug-in
for Findex. The parser is called phdccRDF and is supplied
in file phdccRDF.jar.
There is NO support for RDF in the FindinSite-CD-Wizard Windows
indexer and set up tool.
phdccRDF will not run under the Microsoft VM.
RDF Support
phdccRDF only supports a limited subset of RDF/XML.
Please get in touch if you want this support extended.
The following example gives a concise summary of the supported RDF/XML.
Green text marks the constants that phdccRDF looks for (the rdf: names).
Blue text marks the page or field information that phdccRDF picks up (URI, ANY_TAG and CHARS).
<rdf:RDF>
  <rdf:Description rdf:about="URI">
    <ANY_TAG>CHARS</ANY_TAG>
    <ANY_TAG>
      <rdf:Bag> or <rdf:Seq> or <rdf:Alt>
        <rdf:li>CHARS</rdf:li>
        <rdf:li>CHARS</rdf:li>
      </rdf:Bag> or </rdf:Seq> or </rdf:Alt>
    </ANY_TAG>
    <ANY_TAG>
      <rdf:value>CHARS</rdf:value>
    </ANY_TAG>
  </rdf:Description>
</rdf:RDF>
Multiple rdf:Description and ANY_TAG tags
are supported (but not nested within themselves).
No other tags or attributes are recognised.
Not supported: rdf:ID, abbreviated types, collections, schemas, and attributes that set values.
phdccRDF stores the field information against the file URI,
setting each field ANY_TAG to the value CHARS.
If the URI starts with your indexing Base URL, then this is removed
from the start of the URI. Other absolute values for
URI are not accepted.
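The Base URL rule above can be sketched in a few lines of Java. This is only an illustration of the rule as described, not code taken from phdccRDF: the class name BaseUrlSketch and the method toIndexedPath are hypothetical, and the handling of URIs that are already relative is an assumption.

```java
// Hypothetical helper, NOT part of phdccRDF: illustrates the Base URL rule only.
final class BaseUrlSketch {
    /**
     * Returns the URI with the Base URL stripped from its start, or null when
     * the URI is absolute but does not start with the Base URL (not accepted).
     */
    static String toIndexedPath(String uri, String baseUrl) {
        if (uri.startsWith(baseUrl)) {
            return uri.substring(baseUrl.length());
        }
        // Anything that begins with a scheme such as "http:" is treated as absolute.
        if (uri.matches("^[A-Za-z][A-Za-z0-9+.-]*:.*")) {
            return null;
        }
        // Assumption: a URI that is already relative is kept unchanged.
        return uri;
    }

    public static void main(String[] args) {
        String base = "http://www.example.org/";
        System.out.println(toIndexedPath("http://www.example.org/crocs.html", base));
        System.out.println(toIndexedPath("http://other.example.com/a.html", base));
    }
}
```

In this sketch a rejected URI is signalled with null; how phdccRDF itself reports such a URI is not stated here.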
Example RDF using Dublin Core
Dublin Core is a minimal set of
descriptive elements that facilitate the description and the automated indexing of
document-like networked objects, in a manner similar to a library card catalog.
Dublin Core is ideally suited for describing web resources in RDF/XML format.
(Dublin Core fields can also be defined in HTML META fields -
see here.)
Here is an example Dublin Core RDF/XML file:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.org/crocs.html">
    <dc:title>Research into Crocodile eating habits</dc:title>
    <dc:creator>Chris Cross</dc:creator>
    <dc:description>A study of how and when crocodiles eat</dc:description>
    <dc:language>en</dc:language>
    <dc:format>text/html</dc:format>
    <dc:subject>
      <rdf:Bag>
        <rdf:li>Crocodiles</rdf:li>
        <rdf:li>carnivore habits</rdf:li>
        <rdf:li>Research summary</rdf:li>
      </rdf:Bag>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>
Here is the information found by phdccRDF:
| File http://www.example.org/crocs.html |
| Field name | Field value |
| dc:title | Research into Crocodile eating habits |
| dc:creator | Chris Cross |
| dc:description | A study of how and when crocodiles eat |
| dc:language | en |
| dc:format | text/html |
| dc:subject | Crocodiles |
| dc:subject | carnivore habits |
| dc:subject | Research summary |
How to use phdccRDF
phdccRDF uses the standard Java SAX API to inspect the RDF/XML files.
The standard Sun Java VM includes a SAX implementation (eg Crimson).
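To show how a SAX handler can pick field values out of the supported RDF/XML subset, here is a minimal sketch. It is not the phdccRDF source: the class RdfFieldSketch and its parse method are hypothetical names, and the sketch assumes the conventional rdf: prefix and matches on prefixed tag names rather than namespace URIs, which is a simplification.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, NOT the phdccRDF source: collects fields from the supported subset.
final class RdfFieldSketch extends DefaultHandler {
    /** Each entry is {file URI, field name, field value}. */
    final List<String[]> fields = new ArrayList<>();
    private String about;      // rdf:about of the enclosing rdf:Description, if any
    private String fieldName;  // the ANY_TAG currently open, if any
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // character data belongs to the innermost element
        if (qName.equals("rdf:Description")) {
            about = atts.getValue("rdf:about");
        } else if (about != null && fieldName == null) {
            fieldName = qName; // a direct child of rdf:Description names the field
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        text.setLength(0);
        if (qName.equals("rdf:Description")) {
            about = null;
        } else if (fieldName != null) {
            // Values come from the field tag itself, from rdf:li or from rdf:value.
            boolean carriesValue = qName.equals(fieldName)
                    || qName.equals("rdf:li") || qName.equals("rdf:value");
            if (carriesValue && !value.isEmpty()) {
                fields.add(new String[] {about, fieldName, value});
            }
            if (qName.equals(fieldName)) {
                fieldName = null; // the field tag is closed
            }
        }
    }

    /** Parses RDF/XML text with the JDK's default (non-namespace-aware) SAX parser. */
    static List<String[]> parse(String xml) {
        try {
            RdfFieldSketch handler = new RdfFieldSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rdf:RDF><rdf:Description rdf:about=\"crocs.html\">"
                + "<dc:title>Research into Crocodile eating habits</dc:title>"
                + "<dc:subject><rdf:Bag><rdf:li>Crocodiles</rdf:li></rdf:Bag></dc:subject>"
                + "</rdf:Description></rdf:RDF>";
        for (String[] f : parse(xml)) {
            System.out.println(f[0] + " | " + f[1] + " | " + f[2]);
        }
    }
}
```

Each rdf:li inside a container yields its own entry under the enclosing field name, in the same way that the Dublin Core example lists dc:subject three times.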
Run Findex and phdccRDF together as follows (the ; classpath separator shown is for Windows; on Unix-like systems Java uses : instead):
java -cp Findex.jar;phdccRDF.jar com.phdcc.findex.Findex @index.properties
The index instructions file index.properties should contain the usual
Findex properties, e.g. SaveAsPathname, ScanType, ScanDirectory and ScanDirLevels.
Add a ParserN property
to load the phdccRDF plug-in into Findex. For example, use the following:
parser1=RDF;*.rdf;com.phdcc.findex.rdf.ParseRDF;false;false
ParseRDF=yes
RDF_Files=*.rdf