FindinSite-CD: Search engine for CD/DVD

 

Using RDF/XML to set meta field descriptions


Introduction

The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web.

The FindinSite-CD Findex indexer can index RDF/XML files to find meta field information.  Words defined in RDF/XML are only stored in the search database fields list; they are not stored in the main word list.  The FindinSite-CD runtime can search the field information in the normal way - as described here.

RDF support is provided as a parser Java plug-in for Findex.  The parser is called phdccRDF and is supplied in file phdccRDF.jar.  There is NO support for RDF in the FindinSite-CD-Wizard Windows indexer and set up tool.  phdccRDF will not run under the Microsoft VM.

RDF Support

phdccRDF supports only a limited subset of RDF/XML.  Please get in touch if you want this support extended.

The following example gives a concise summary of the supported RDF/XML.  The rdf: tag and attribute names are constants that phdccRDF looks for.  URI, ANY_TAG and CHARS stand for the page or field information picked up by phdccRDF.

<rdf:rdf>
    <rdf:description rdf:about=URI>
        <ANY_TAG>CHARS</ANY_TAG>
        <ANY_TAG>
            <rdf:bag> or <rdf:seq> or <rdf:alt>
                <rdf:li>CHARS</rdf:li>
                <rdf:li>CHARS</rdf:li>
            </rdf:bag> or </rdf:seq> or </rdf:alt>
        </ANY_TAG>
        <ANY_TAG>
            <rdf:value>CHARS</rdf:value>
        </ANY_TAG>
    </rdf:description>
</rdf:rdf>

  • Multiple rdf:description and ANY_TAG tags are supported (but not nested within themselves).
  • No other tags or attributes are recognised.
  • Not supported: rdf:ID, abbreviated types, collections, schemas, and attributes that set values.

    phdccRDF stores field information for the file URI and sets each field ANY_TAG to CHARS.  If the URI starts with your indexing Base URL, that prefix is removed from the start of the URI.  Other absolute values for URI are not accepted.
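    The Base URL rule above can be sketched in Java as follows.  This is only an illustration of the rule as described, not phdccRDF source: the class AboutUriSketch and method toIndexedPath are hypothetical names, and passing relative values through unchanged is an assumption.

```java
import java.util.Optional;

// Hypothetical helper, not part of phdccRDF: mirrors the rule described above.
final class AboutUriSketch {

    /**
     * Returns the path under which field information would be stored, or
     * empty if the rdf:about value is an absolute URI outside the Base URL.
     */
    static Optional<String> toIndexedPath(String aboutUri, String baseUrl) {
        if (aboutUri.startsWith(baseUrl)) {
            // Strip the indexing Base URL from the start of the URI.
            return Optional.of(aboutUri.substring(baseUrl.length()));
        }
        // Anything else that starts with a scheme (e.g. "http:") is absolute.
        if (aboutUri.matches("[A-Za-z][A-Za-z0-9+.-]*:.*")) {
            return Optional.empty();
        }
        // Assumption: a relative value is kept as written.
        return Optional.of(aboutUri);
    }

    public static void main(String[] args) {
        String base = "http://www.example.org/";
        System.out.println(toIndexedPath("http://www.example.org/crocs.html", base));
        System.out.println(toIndexedPath("http://other.example.com/a.html", base));
        System.out.println(toIndexedPath("docs/crocs.html", base));
    }
}
```

    With a Base URL of http://www.example.org/, the first call yields crocs.html and the second is rejected.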

    Example RDF using Dublin Core

    Dublin Core is a minimal set of descriptive elements that facilitate the description and the automated indexing of document-like networked objects, in a manner similar to a library card catalog.

    Dublin Core is ideally suited for describing web resources in RDF/XML format.  (Dublin Core fields can also be defined in HTML META fields - see here.)  Here is an example Dublin Core RDF/XML file:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
    
    <rdf:Description rdf:about="http://www.example.org/crocs.html">
        <dc:title>Research into Crocodile eating habits</dc:title>
        <dc:creator>Chris Cross</dc:creator>
        <dc:description>A study of how and when crocodiles eat</dc:description>
        <dc:language>en</dc:language>
        <dc:format>text/html</dc:format>
        <dc:subject>
            <rdf:Bag>
                <rdf:li>Crocodiles</rdf:li>
                <rdf:li>carnivore habits</rdf:li>
                <rdf:li>Research summary</rdf:li>
            </rdf:Bag>
        </dc:subject>
    </rdf:Description>
    </rdf:RDF>

    Here is the information found by phdccRDF:

    File: http://www.example.org/crocs.html

    Field name        Field value
    dc:title          Research into Crocodile eating habits
    dc:creator        Chris Cross
    dc:description    A study of how and when crocodiles eat
    dc:language       en
    dc:format         text/html
    dc:subject        Crocodiles
    dc:subject        carnivore habits
    dc:subject        Research summary
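
    To show how a field list like this can be derived from the supported RDF/XML subset, here is a minimal SAX handler sketch.  It is not the phdccRDF implementation: the class RdfFieldSketch and its members are hypothetical, it matches the standard RDF element names exactly, and it relies on the parser reporting qualified names (as the JDK's built-in parser does).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a hypothetical class, not phdccRDF source.
final class RdfFieldSketch extends DefaultHandler {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Each entry is {file, field name, field value}.
    final List<String[]> fields = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String file;          // rdf:about of the current rdf:Description
    private String field;         // qualified name of the current ANY_TAG
    private boolean nestedValue;  // true once an rdf:li or rdf:value was seen

    private static boolean isRdf(String uri, String local, String name) {
        return RDF_NS.equals(uri) && name.equals(local);
    }

    private static boolean isItem(String uri, String local) {
        return isRdf(uri, local, "li") || isRdf(uri, local, "value");
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (isRdf(uri, local, "Description")) {
            file = atts.getValue(RDF_NS, "about");
        } else if (file != null && field == null) {
            field = qName;           // direct child of rdf:Description
            nestedValue = false;
            text.setLength(0);
        } else if (field != null && isItem(uri, local)) {
            text.setLength(0);       // start collecting one item's CHARS
        }
        // rdf:Bag, rdf:Seq and rdf:Alt are containers only, so nothing to do.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (field != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (isRdf(uri, local, "Description")) {
            file = null;
        } else if (field != null && isItem(uri, local)) {
            add();
            nestedValue = true;
        } else if (field != null && qName.equals(field)) {
            if (!nestedValue) {
                add();               // plain <ANY_TAG>CHARS</ANY_TAG> form
            }
            field = null;
        }
    }

    private void add() {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.add(new String[] {file, field, value});
        }
    }

    static List<String[]> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        RdfFieldSketch handler = new RdfFieldSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
            + "<rdf:Description rdf:about=\"http://www.example.org/crocs.html\">"
            + "<dc:title>Research into Crocodile eating habits</dc:title>"
            + "<dc:subject><rdf:Bag>"
            + "<rdf:li>Crocodiles</rdf:li><rdf:li>carnivore habits</rdf:li>"
            + "</rdf:Bag></dc:subject>"
            + "</rdf:Description></rdf:RDF>";
        for (String[] f : parse(xml)) {
            System.out.println(f[0] + "\t" + f[1] + "\t" + f[2]);
        }
    }
}
```

    Each rdf:li inside a container becomes a separate value for the enclosing field, which is why dc:subject appears once per list item.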

    How to use phdccRDF

    phdccRDF uses the standard Java SAX API to read the RDF/XML files.  The standard Sun Java VM includes a SAX implementation (e.g. Crimson).
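
    If you are unsure whether your Java VM provides a SAX implementation, a quick check along the following lines may help.  This is a generic JAXP sketch, not a Findex or phdccRDF tool, and SaxCheckSketch is a hypothetical class name.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical check: asks JAXP for a SAX parser and reports which one it got.
final class SaxCheckSketch {

    static SAXParser newParser() throws Exception {
        // Throws if no SAX implementation can be found or configured.
        return SAXParserFactory.newInstance().newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX parser: " + newParser().getClass().getName());
    }
}
```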

    Run Findex and phdccRDF together as follows:

    java -cp Findex.jar;phdccRDF.jar com.phdcc.findex.Findex @index.properties

    The ; class path separator shown is for Windows.  On Linux, Unix or macOS, separate the class path entries with : instead.

    The index instructions file index.properties should contain the usual Findex properties, e.g. SaveAsPathname, ScanType, ScanDirectory and ScanDirLevels.  Add a ParserN property to load the phdccRDF plug-in into Findex.  For example, use the following:

    parser1=RDF;*.rdf;com.phdcc.findex.rdf.ParseRDF;false;false
    ParseRDF=yes
    RDF_Files=*.rdf

    All site Copyright © 1996-2008 PHD Computer Consultants Ltd, PHDCC

    Last modified: 8 February 2006.