FindinSite-CD: Search engine for CD/DVD
 

 

Using RDF/XML to set meta field descriptions


Introduction

The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web.

The FindinSite-CD Findex indexer can index RDF/XML files to find meta field information.  Words defined in RDF/XML are stored only in the search database's fields list; they are not stored in the main word list.  The FindinSite-CD runtime can search the field information in the normal way.

RDF support is provided as a parser Java plug-in for Findex.  The parser is called phdccRDF and is supplied in file phdccRDF.jar.  There is NO support for RDF in the FindinSite-CD-Wizard Windows indexer and set up tool.  phdccRDF will not run under the Microsoft VM.

RDF Support

phdccRDF supports only a limited subset of RDF/XML.  Please get in touch if you want this support extended.

The following example gives a concise summary of the supported RDF/XML.  The rdf: names are constants that phdccRDF looks for.  URI, ANY_TAG and CHARS stand for the page or field information picked up by phdccRDF.

<rdf:RDF>
    <rdf:Description rdf:about=URI>
        <ANY_TAG>CHARS</ANY_TAG>
        <ANY_TAG>
            <rdf:Bag> or <rdf:Seq> or <rdf:Alt>
                <rdf:li>CHARS</rdf:li>
                <rdf:li>CHARS</rdf:li>
            </rdf:Bag> or </rdf:Seq> or </rdf:Alt>
        </ANY_TAG>
        <ANY_TAG>
            <rdf:value>CHARS</rdf:value>
        </ANY_TAG>
    </rdf:Description>
</rdf:RDF>

  • Multiple rdf:Description and ANY_TAG tags are supported (but not nested within themselves).
  • No other tags or attributes are recognised.
  • Not supported:  rdf:ID, abbreviated types, collections, schemas, attributes that set a value

phdccRDF stores the field information against the file URI, setting each field ANY_TAG to the value CHARS.  If the URI starts with your indexing Base URL, then this prefix is removed from the start of the URI.  Any other absolute URI is not accepted.
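
The URI rule above can be sketched in a few lines of Java.  This is only an illustration: the class RdfAboutUri, the method toIndexedName and the Base URL http://www.example.org/ are hypothetical names chosen for the example, not part of phdccRDF.  The sketch also assumes that a relative URI is kept unchanged, which the text implies but does not state.

```java
/** Hypothetical helper illustrating the URI rule; not part of phdccRDF. */
final class RdfAboutUri {
    private RdfAboutUri() {}

    /**
     * Returns the name the fields would be stored against,
     * or null if the URI would not be accepted.
     */
    static String toIndexedName(String about, String baseUrl) {
        if (about.startsWith(baseUrl)) {
            // Base URL prefix is removed from the start of the URI.
            return about.substring(baseUrl.length());
        }
        // Anything else that starts with a scheme ("http:", "ftp:", ...) is
        // treated as an absolute URI and rejected.
        if (about.matches("^[A-Za-z][A-Za-z0-9+.-]*:.*")) {
            return null;
        }
        // Assumption: a relative URI is kept as it is.
        return about;
    }

    public static void main(String[] args) {
        String base = "http://www.example.org/"; // hypothetical Base URL
        System.out.println(toIndexedName("http://www.example.org/crocs.html", base));
        System.out.println(toIndexedName("http://other.example.com/a.html", base));
        System.out.println(toIndexedName("docs/a.html", base));
    }
}
```

With the hypothetical Base URL shown, the first call yields crocs.html, the second yields null (a rejected absolute URI) and the third leaves docs/a.html unchanged.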

Example RDF using Dublin Core

Dublin Core is a minimal set of descriptive elements that facilitate the description and the automated indexing of document-like networked objects, in a manner similar to a library card catalog.

Dublin Core is ideally suited for describing web resources in RDF/XML format.  (Dublin Core fields can also be defined in HTML META fields.)  Here is an example Dublin Core RDF/XML file:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">

<rdf:Description rdf:about="http://www.example.org/crocs.html">
    <dc:title>Research into Crocodile eating habits</dc:title>
    <dc:creator>Chris Cross</dc:creator>
    <dc:description>A study of how and when crocodiles eat</dc:description>
    <dc:language>en</dc:language>
    <dc:format>text/html</dc:format>
    <dc:subject>
        <rdf:Bag>
            <rdf:li>Crocodiles</rdf:li>
            <rdf:li>carnivore habits</rdf:li>
            <rdf:li>Research summary</rdf:li>
        </rdf:Bag>
    </dc:subject>
</rdf:Description>
</rdf:RDF>

Here is the information found by phdccRDF:

File: http://www.example.org/crocs.html

Field name        Field value
dc:title          Research into Crocodile eating habits
dc:creator        Chris Cross
dc:description    A study of how and when crocodiles eat
dc:language       en
dc:format         text/html
dc:subject        Crocodiles
dc:subject        carnivore habits
dc:subject        Research summary
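
If the descriptions come from a database or spreadsheet, files in this shape can be generated rather than written by hand.  The following is a minimal sketch, not a FindinSite-CD tool: RdfSubsetWriter, write and escape are hypothetical names, and the sketch assumes every field name uses the dc: prefix because that is the only field namespace it declares.  A single value is written as plain element text and several values as an rdf:Bag, which are both forms listed in the supported subset.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical generator for the supported RDF/XML subset; not part of FindinSite-CD. */
final class RdfSubsetWriter {
    private RdfSubsetWriter() {}

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Writes one rdf:Description; field names are assumed to use the dc: prefix. */
    static String write(String aboutUri, Map<String, List<String>> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n")
          .append("         xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n")
          .append("<rdf:Description rdf:about=\"").append(escape(aboutUri)).append("\">\n");
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            String tag = e.getKey();
            List<String> values = e.getValue();
            if (values.isEmpty()) {
                continue; // nothing to describe for this field
            }
            if (values.size() == 1) {
                // Single value: <ANY_TAG>CHARS</ANY_TAG>
                sb.append("    <").append(tag).append('>').append(escape(values.get(0)))
                  .append("</").append(tag).append(">\n");
            } else {
                // Several values: one rdf:li per value inside an rdf:Bag
                sb.append("    <").append(tag).append(">\n        <rdf:Bag>\n");
                for (String v : values) {
                    sb.append("            <rdf:li>").append(escape(v)).append("</rdf:li>\n");
                }
                sb.append("        </rdf:Bag>\n    </").append(tag).append(">\n");
            }
        }
        return sb.append("</rdf:Description>\n</rdf:RDF>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("dc:title", Arrays.asList("Research into Crocodile eating habits"));
        fields.put("dc:subject", Arrays.asList("Crocodiles", "carnivore habits"));
        System.out.print(write("http://www.example.org/crocs.html", fields));
    }
}
```

Escaping matters here because field values such as titles often contain & or <, which would otherwise make the file unparseable.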

How to use phdccRDF

phdccRDF uses the standard Java SAX API to inspect the RDF/XML files.  The standard Sun Java VM includes a SAX implementation (eg Crimson).
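
To show what SAX-based extraction of this subset might look like, here is a minimal sketch.  It is not the phdccRDF source: the class RdfFieldSketch and its members are hypothetical, and it assumes the rdf prefix is used literally and matched case-sensitively on the qualified tag name.  It needs Java 7 or later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler for the supported subset; not the phdccRDF implementation. */
final class RdfFieldSketch extends DefaultHandler {
    /** Each entry is {uri, field name, field value}. */
    final List<String[]> found = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String uri;        // rdf:about of the current rdf:Description
    private String field;      // current ANY_TAG name
    private boolean capturing; // true while character data belongs to a value

    @Override
    public void startElement(String ns, String local, String qName, Attributes atts) {
        switch (qName) {
            case "rdf:RDF": case "rdf:Bag": case "rdf:Seq": case "rdf:Alt":
                break; // structural tags carry no value themselves
            case "rdf:Description":
                uri = atts.getValue("rdf:about");
                break;
            case "rdf:li": case "rdf:value":
                startCapture();
                break;
            default: // ANY_TAG inside a description names the field
                if (uri != null) { field = qName; startCapture(); }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) text.append(ch, start, length);
    }

    @Override
    public void endElement(String ns, String local, String qName) {
        if (qName.equals("rdf:li") || qName.equals("rdf:value")) {
            emit();
        } else if (qName.equals("rdf:Description")) {
            uri = null;
        } else if (qName.equals(field)) {
            emit(); // adds a value only if the field held direct CHARS
            field = null;
        }
    }

    private void startCapture() { text.setLength(0); capturing = true; }

    private void emit() {
        String value = text.toString().trim();
        if (capturing && uri != null && field != null && !value.isEmpty()) {
            found.add(new String[] {uri, field, value});
        }
        capturing = false;
    }

    static List<String[]> parse(String xml) throws Exception {
        RdfFieldSketch handler = new RdfFieldSketch();
        // The default factory is not namespace-aware, so qName is e.g. "dc:title".
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
            + "<rdf:Description rdf:about=\"http://www.example.org/crocs.html\">"
            + "<dc:title>Research into Crocodile eating habits</dc:title>"
            + "<dc:subject><rdf:Bag><rdf:li>Crocodiles</rdf:li>"
            + "<rdf:li>carnivore habits</rdf:li></rdf:Bag></dc:subject>"
            + "</rdf:Description></rdf:RDF>";
        for (String[] f : parse(xml)) {
            System.out.println(f[0] + "  " + f[1] + "  " + f[2]);
        }
    }
}
```

The handler emits one field entry per rdf:li, which matches the way the Dublin Core example lists dc:subject once for each item in the bag.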

Run Findex and phdccRDF together as follows.  The ; classpath separator shown is for Windows; on Unix-like systems use : instead.

java -cp Findex.jar;phdccRDF.jar com.phdcc.findex.Findex @index.properties

The index instructions file index.properties should contain the usual Findex properties, eg SaveAsPathname, ScanType, ScanDirectory and ScanDirLevels.  Add a ParserN property (parser1 in the example below) to plug phdccRDF into Findex.  For example, use the following:

parser1=RDF;*.rdf;com.phdcc.findex.rdf.ParseRDF;false;false
ParseRDF=yes
RDF_Files=*.rdf

All site Copyright © 1996-2011 PHD Computer Consultants Ltd, PHDCC

Last modified: 8 February 2006.
