Meta field searching in findinsite-cd
Information about a file (or parts of a file) is called meta-data.
For example, the HTML TITLE tag defines meta-data for a web page -
a "title" field. Similarly, this HTML can be used to define
an "author" field for a page:
<META NAME=author CONTENT="Chris Cant">
FindinSite and Findex add this meta-data field information to the search database,
so a search for "Chris Cant" will find this page.
As well as ordinary searches, FindinSite-CD can also do field searches, where you search only specific fields.
For example, searching the "author" field for "Chris Cant" might give better results than an
ordinary search.
- Field searches are only possible if you build the search database
using the Findex tool; they are NOT available using FindinSite-CD-Wizard.
- Field searches do not work in all browsers -
check compatibility here.
For the example above, a FindinSite-CD "author" field search for
"Chris Cant" would find the web page. An ordinary search for
"Chris Cant" would also find the page because the words in each field are also stored in the main word list.
Field searches only work on search databases generated by Findex (or FindinSite-JS or FindinSite-MS).
FindinSite-CD-Wizard cannot generate field search databases.
In Windows, you (and your users) need Internet Explorer 4+, Navigator 4+, Opera 7+ or similar to
do field searches. On the Mac, currently only Netscape 7 on Mac OS 9 supports field searches.
Latest compatibility information.
See the field search example.
Field searches are carried out using a standard HTML form in conjunction with
the usual FindinSite-CD window (or an invisible FindinSite-CD with HTML results).
See below for full instructions on how to set up
a field search web page.
FindinSite-CD searches fields in exactly the same way as normal searches,
so parentheses, boolean operators, wild cards etc are available.
For example, to implement a multiple-selection list box,
like those on the field search example,
OR together the selected values, each surrounded by parentheses,
eg if "English" and "Chinese" languages are selected.
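The parenthesised OR described above can be sketched as a small JavaScript helper. This is only an illustration: orSelectedValues() is a hypothetical name, not a FindinSite-CD function, and the "en" and "zh" values are assumed language codes rather than values taken from the example page.

```javascript
// Hypothetical helper, not part of FindinSite-CD: build a field search
// string from the values selected in a multiple-selection list box.
function orSelectedValues(values) {
  if (values.length === 0) {
    return null; // nothing selected: no search text for this field
  }
  var parts = [];
  for (var i = 0; i < values.length; i++) {
    parts.push("( " + values[i] + " )"); // protect each value with parentheses
  }
  return "( " + parts.join(" OR ") + " )"; // wrap the whole OR expression
}

console.log(orSelectedValues(["en", "zh"])); // ( ( en ) OR ( zh ) )
```

Wrapping each value separately keeps a value that contains several words, or an operator, from combining with its neighbours in an unexpected way.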
As stated above, note that the words in each field are not automatically included in the main
word list. In the field search example,
the word "summary" is in the field "DC:Subject". However the word "summary" is not
used on the page itself, so an ordinary search for "summary" will find no results.
Searching field values
FindinSite-CD puts no interpretation on the field values;
they are just treated as a series of characters.
For example, a search for "1996" would not match a "date" field value of "10/10/96".
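The following sketch shows why a plain character match cannot treat "10/10/96" as the year 1996. It is not FindinSite-CD code: fieldWords() and plainMatch() are hypothetical helpers, and splitting the value into words at every non-alphanumeric character is an assumption made for the illustration.

```javascript
// Hypothetical helper: split a field value into lower-case words.
// Assumption: any character that is not a letter or digit separates words.
function fieldWords(value) {
  return value.toLowerCase().split(/[^a-z0-9]+/).filter(function (word) {
    return word !== "";
  });
}

// A plain match only asks whether the search word is one of the stored words.
// No date, number or other interpretation is applied to the value.
function plainMatch(fieldValue, searchWord) {
  return fieldWords(fieldValue).indexOf(searchWord.toLowerCase()) !== -1;
}

console.log(plainMatch("10/10/96", "1996")); // false
```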
However various bodies have gone to great lengths to define standard field names and values.
- For example,
RFC 1766 recommends that language "lang" field values start with a two-letter lower-case
language code (from ISO 639),
optionally followed by a hyphen and a two-letter upper-case country code (from ISO 3166).
- As another example, the Dublin Core initiative
primarily specifies standard field names for describing documents. Associated
initiatives go further by defining industry-specific schemas, ie a standard set of possible values for
certain fields. See below for more details.
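The RFC 1766 convention in the first item above can be checked with a short pattern. This is a minimal sketch with hypothetical function names; it only tests the shape of the value and does not check that the codes are actually listed in the ISO standards.

```javascript
// Two lower-case letters, optionally followed by a hyphen and two upper-case letters.
var LANG_PATTERN = /^[a-z]{2}(-[A-Z]{2})?$/;

// Hypothetical helper: does the value follow the recommended shape?
function looksLikeLangCode(value) {
  return LANG_PATTERN.test(value);
}

// Hypothetical helper: rewrite a value into the recommended letter case.
function toConventionalCase(value) {
  var parts = value.split("-");
  var lang = parts[0].toLowerCase();
  return parts.length > 1 ? lang + "-" + parts[1].toUpperCase() : lang;
}

console.log(looksLikeLangCode("en-GB"));   // true
console.log(looksLikeLangCode("English")); // false
console.log(toConventionalCase("EN-gb"));  // en-GB
```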
FindinSite and Findex index files to build a search database. They all extract meta-data
information and store it in the search database.
Findex, FindinSite-JS and FindinSite-MS also store the meta-data information in an extra part of the search database which is used
to support field searches.
Indexing HTML meta-data
For HTML files, the following meta-data information is found:
- <META name=nnn content=xxx>
- <META http-equiv=content-language content=xxx>
Note that some field names are hard-coded.
Fields can be repeated; all information is stored.
The letter case of field names is ignored, so field "Author" is the same as "AUTHOR".
For the IMG tag, the field is associated with the page not the image,
so a hit will display the page, not the image.
Some field names contain a period (.). It is modern practice to use colons (:) instead.
Therefore Findex converts periods to colons, eg field name "DC.Title" is stored as "dc:title".
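The naming rules above (letter case ignored, periods converted to colons, repeated fields all kept) can be sketched in JavaScript as follows. This is not how Findex is implemented: the function names are hypothetical, only META name/content pairs are handled, and the simple patterns assume that attribute values do not contain a > character or text that looks like another attribute.

```javascript
// Field names ignore letter case and use colons instead of periods.
function normalizeFieldName(name) {
  return name.toLowerCase().replace(/\./g, ":");
}

// Read one attribute from a tag; the value may be quoted or unquoted.
function readAttribute(tag, attr) {
  var pattern = new RegExp(
    "\\b" + attr + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))", "i");
  var m = pattern.exec(tag);
  if (m === null) return null;
  if (m[1] !== undefined) return m[1];
  if (m[2] !== undefined) return m[2];
  return m[3];
}

// Collect META name/content pairs; repeated fields keep every value.
function extractMetaFields(html) {
  var fields = {};
  var tagPattern = /<meta\b[^>]*>/gi;
  var tag;
  while ((tag = tagPattern.exec(html)) !== null) {
    var name = readAttribute(tag[0], "name");
    var content = readAttribute(tag[0], "content");
    if (name === null || content === null) continue; // not a name/content tag
    var key = normalizeFieldName(name);
    if (!Object.prototype.hasOwnProperty.call(fields, key)) fields[key] = [];
    fields[key].push(content);
  }
  return fields;
}

console.log(normalizeFieldName("DC.Title")); // dc:title
```

For example, extractMetaFields('<META NAME=author CONTENT="Chris Cant">') returns an object whose "author" entry is a one-element list containing "Chris Cant".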
Indexing meta-data from other file types
Meta-data is also found in some other file types. Typically the indexer finds the file
title, description and keywords. For example, the PDF file "subject" document property
is stored as a "description" field.
See the file types page for details
of the meta-data information stored in fields for each file type.
The RDF file type is special because it exists solely to define meta-data
for other files -
see here for details.
Standard HTML meta-data
As described above, the indexers find any META/name/content meta-data.
The following is a list of standard or commonly-used META field names:
Dublin Core meta-data
Dublin Core is a minimal set of
descriptive elements that facilitate the description and the automated indexing of
document-like networked objects, in a manner similar to a library card catalog.
Dublin Core fields can be defined in HTML META/name/content tags.
(They can also be put in RDF/XML format,
also indexable by the Findex phdccRDF parser.)
RFC 2731 says that Dublin Core elements should be given a DC. prefix when put in
HTML META fields, so the "Title" element should be given META name "DC.TITLE", eg:
<META NAME="DC.Title" CONTENT="Research into Crocodile eating habits">
The following is a list of standard Dublin Core HTML meta-data field names.
Related initiatives define industry-specific schemas,
ie a standard set of possible values for certain fields.
Findex finds Dublin Core meta-data from META/name/content tags in the normal way.
Note that, as described above, all periods (.) in field names are converted to
colons (:) to use the standard RDF terminology, eg field
DC.Title is renamed DC:Title.
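If you generate pages from a script, the RFC 2731 naming convention can be applied mechanically. The sketch below is a hypothetical helper, not part of Findex or FindinSite-CD; it assumes you supply Dublin Core element names (such as Title or Creator) as object keys, and the "Chris Cant" creator value is only an illustration.

```javascript
// Escape characters that would break a quoted HTML attribute value.
function escapeAttribute(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: write one META tag per value, adding the DC. prefix.
function dublinCoreMetaTags(elements) {
  var lines = [];
  for (var name in elements) {
    if (!Object.prototype.hasOwnProperty.call(elements, name)) continue;
    var values = [].concat(elements[name]); // accept one value or a list
    for (var i = 0; i < values.length; i++) {
      lines.push('<META NAME="DC.' + name + '" CONTENT="' +
        escapeAttribute(values[i]) + '">');
    }
  }
  return lines.join("\n");
}

console.log(dublinCoreMetaTags({
  Title: "Research into Crocodile eating habits",
  Creator: "Chris Cant"
}));
// <META NAME="DC.Title" CONTENT="Research into Crocodile eating habits">
// <META NAME="DC.Creator" CONTENT="Chris Cant">
```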
Making field search pages
Field searches are carried out using a standard HTML form in conjunction with
the usual FindinSite-CD window (or an invisible FindinSite-CD with HTML results).
There is no way to specify field search text within the FindinSite-CD window.
There are two aspects to setting up a field search:
- The HTML form and applet
- The JavaScript functions
HTML Form and Applet
Use an HTML form to specify the field options that you want the user to see.
The following example uses a TABLE to line up the field prompts and options.
The form is called fields. If the user presses Enter in the form,
the submitFieldsForm() JavaScript function is called
- we will show this function later.
Each row of the TABLE defines form field values. In this case there is one free text
INPUT field called creator, and one multiple selection list-box called language.
You could also have any other form field type, such as
radio buttons or check boxes.
The final section of the example defines the FindinSite-CD window in the standard way,
with the APPLET named fisCD as usual.
The only new code defines the setFieldsFn parameter. This tells FindinSite-CD to call
the JavaScript function setFields() when it needs to find what field search values
the user has specified.
<FORM NAME="fields" onSubmit="return submitFieldsForm()">
<TABLE>
<TR><TD>Creator:</TD>
<TD><INPUT TYPE=text NAME=creator MAXLENGTH=40></TD></TR>
<TR><TD>Language:</TD>
<TD><SELECT MULTIPLE NAME="language" SIZE="3">
<OPTION VALUE="any" SELECTED>Any
<OPTION VALUE="zh TW">Chinese Traditional
</SELECT></TD></TR>
</TABLE>
</FORM>
<APPLET CODE=fisCD NAME=fisCD WIDTH=350 HEIGHT=200 ARCHIVE="fiscd.jar" MAYSCRIPT>
<PARAM NAME=index1 VALUE="fields,en">
<PARAM NAME=rules VALUE="rulesen.txt">
<PARAM NAME=setFieldsFn VALUE="setFields">
<PARAM NAME=target VALUE="_blank">
Sorry, your browser is not set up to run Java applets.<BR>
<A TARGET=_blank HREF="http://www.phdcc.com/getjvm.htm">How to get a Java VM</A>.
</APPLET>
JavaScript
In the example below,
submitFieldsForm() is called when the user presses
Enter in the form.
submitFieldsForm() calls the FindinSite-CD Search function;
the null parameter means that the search text in the FindinSite-CD window
is not changed.
It returns false to ensure that the default form action
does not occur.
The setFields() function is called by FindinSite-CD whenever it needs to find out the
field search values. (Remember that we used the "setFieldsFn" parameter to tell
FindinSite-CD the name of this function.)
setFields() delegates its field setting jobs to two functions,
creatorChange() and langChange().
It returns true to tell FindinSite-CD that it has set the fields.
The creatorChange() function gets the "creator" form field value.
If it is empty, then it sets it to *. If not empty,
it is wrapped in protective parentheses. The creator is then passed to FindinSite-CD
using the FindinSite-CD SetField function.
The langChange() function does a similar job for the more complicated case.
If there is no selection or "Any" language is set then the FindinSite-CD "lang" field is set to
null so that the "lang" field is not searched.
Otherwise, langChange() goes through all possible
language options, and - if selected - adds the language value to the string to be passed
to FindinSite-CD. Note how each language value is wrapped in parentheses before being ORed together.
As before, the FindinSite-CD SetField function
is used to set the field value.
This code is complicated by the fact that not all browsers allow the
"lang" field settings to be accessed in the same way.
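function submitFieldsForm() {
	document.fisCD.Search(null);	// null: search text in the window is not changed
	return false;	// stop the default form action
}
function setFields() {
	creatorChange();
	langChange();
	return true;	// tell FindinSite-CD that the fields have been set
}
function creatorChange() {
	var creator = document.fields.creator.value;
	if( creator=="") creator = "*";
	else creator = "( "+creator+" )";
	// Assumed here: SetField(name,value) argument order and the "creator" field name
	document.fisCD.SetField("creator",creator);
}
function langChange() {
	var lang = document.fields.language.value;
	var langs = "";
	var options = document.fields.language.options;
	var langcount = options.length;
	if( lang!="" && lang!="any") {
		for( var langno=1; langno<langcount; langno++) {	// option 0 is "Any"
			// Not all browsers provide options.item(n)
			var option = options.item ? options.item(langno) : options[langno];
			if( !option.selected) continue;
			if( langs!="") langs += " OR ";
			langs += "( "+option.value+" )";
		}
	}
	document.fisCD.SetField("lang", langs=="" ? null : "( "+langs+" )");
}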
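To try the field-setting logic without a browser or the applet, a stand-in object can record the calls that would be made. Everything in this sketch is hypothetical: makeFakeApplet() and setFieldsFrom() are illustration names, the SetField(name, value) and Search(text) call forms and the "creator" field name are assumptions, and it is assumed that FindinSite-CD asks for the field values when a search starts.

```javascript
// Stand-in for the FindinSite-CD applet: it records calls instead of searching.
function makeFakeApplet(setFieldsFn) {
  var applet = {
    calls: [],
    SetField: function (name, value) {
      applet.calls.push("SetField " + name + " = " + value);
    },
    Search: function (text) {
      applet.calls.push("Search " + text);
      setFieldsFn(applet); // assumed: field values are requested when searching
    }
  };
  return applet;
}

// Same decisions as creatorChange() and langChange(), but reading a plain
// object { creator: "...", languages: [...] } instead of the HTML form.
function setFieldsFrom(form, applet) {
  applet.SetField("creator",
    form.creator === "" ? "*" : "( " + form.creator + " )");
  var any = form.languages.length === 0; // no selection means "Any"
  var parts = [];
  for (var i = 0; i < form.languages.length; i++) {
    if (form.languages[i] === "any") any = true;
    parts.push("( " + form.languages[i] + " )");
  }
  applet.SetField("lang", any ? null : "( " + parts.join(" OR ") + " )");
  return true;
}

var form = { creator: "Chris Cant", languages: ["zh TW"] };
var applet = makeFakeApplet(function (a) { return setFieldsFrom(form, a); });
applet.Search(null);
console.log(applet.calls.join("\n"));
// Search null
// SetField creator = ( Chris Cant )
// SetField lang = ( ( zh TW ) )
```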