Enter advanced options at stage 4 of the indexing wizard, when prompted to "Enter
any advanced options". In the box below the prompt, type the settings one per line,
with each line in the form name=value. For example, to enable indexing of text files with
file extensions .txt and .bat, enter this:
ParseTXT=true
TXT_Files=*.txt,*.bat
| Name | Description | Default |
|---|---|---|
| Description | The search database description. | Taken from the first page title found |
| ScanType | Indicates how findinsite-js finds files to index: dir, file or url (see ScanDirectory, ScanPathname and ScanURL). | url |
| ScanDirectory | The directory used to find files if ScanType is dir. | |
| ScanDirLevels | The number of directory levels to scan if ScanType is dir. Use a number in the range 0 to 255, or all. | all |
| ScanPathname | The initial file scanned if ScanType is file. | |
| ScanURL | The initial URL scanned if ScanType is url. | Set in wizard |
| ParseHTML | Specify true to scan HTML web pages, or false if not. | true |
| HTML_Files | The file specification for HTML files, using * and ? wildcards as needed. Separate individual specifiers with a comma. | *.htm,*.html |
| ParseTXT | Specify true to scan TXT text files, or false if not. | false |
| TXT_Files | The file specification for TXT files, using * and ? wildcards as needed. Separate individual specifiers with a comma. | *.txt |
| ParsePDF | Specify true to scan PDF files, or false if not. | false |
| PDF_Files | The file specification for PDF files, using * and ? wildcards as needed. Separate individual specifiers with a comma. | *.pdf |
| PDF_Passwords | A comma-separated list of passwords used to open PDF files. | |
| PDF_ReportCharacterDecodeProblems | Specify true to have any PDF character decode problems listed, or false if not. | false |
| ParseDOC | Specify true to scan DOC Word document files, or false if not. | false |
| DOC_Files | The file specification for DOC files, using * and ? wildcards as needed. Separate individual specifiers with a comma. | *.doc |
| ParseXLS | Specify true to scan XLS Excel spreadsheet files, or false if not. | false |
| XLS_Files | The file specification for XLS files, using * and ? wildcards as needed. Separate individual specifiers with a comma. | *.xls |
| ParsePPT | Specify true to scan PPT PowerPoint presentation files, or false if not. | false |
| PPT_Files | The file specification for PPT files, using * and ? wildcards as needed. Separate individual specifiers with a comma. | *.ppt |
| ParseImage | Specify true to scan JPEG images for meta-data, or false if not. | false |
| Image_Files | The file specification for JPEG files, using * and ? wildcards as needed. Separate individual specifiers with a comma. | *.jpg,*.jpeg |
| CaseSignificant | Applies when finding files by following links. If false, the case of filenames is ignored. If true, findinsite-js treats test.htm and Test.htm as separate files. Windows appears to always ignore filename case; on Unix, filename case must be correct. | Windows: false; non-Windows: true |
| StoreStopWords | If false, findinsite-js does not include the words listed in StopWordFile. | true |
| StopWordFile | The pathname of the file containing stop words, with one word per line in UTF-8 format. | |
| NoTitleIgnorePageLinks | If finding files by following links and this property is set to true, then links are not followed if a page has no title. | false |
| ParseUpHierarchy | If finding files by following links and this property is set to true, then links are followed to directories above the initial file. | false |
| StorePositions | If true, findinsite-js stores word positions so that "adjacent word" searches will work. | true |
| StoreLoneWords | If true, findinsite-js stores a word's position even if the two surrounding words are stop words. | true |
| UseNoBaseURLs | If true, no Base URL prefix is included for each page in the search database. | false |
| UseMetaDescriptionAsAbstract | If true, the page abstract is taken from the page META description tag. | true |
| UseMetaAbstractAsAbstract | If true, the page abstract is taken from the (new) page META abstract tag. | true |
| AbstractWords | If building the abstract from the words in a file, the number of words to use. | 30 |
| Include | A list of file specifications to include in the search database. See below. | All files will be included |
| Exclude | A list of file specifications to exclude from the search database. See below. | No files will be excluded |
| Credentials | A list of username/password credentials. See below. | No usernames/passwords |

Note that the Includes are processed first and the Excludes afterwards,
so an Exclude file-spec takes precedence.
An individual file-spec can include zero or more * or ? wildcard characters,
where ? matches exactly one character, and
* matches zero or more characters.
For example, file???.ht* would match
file001.htm, file101.html and file111.ht,
but not file1001.htm.
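The wildcard rules above can be sketched in a few lines of Python. This is only an illustration of the documented ? and * semantics, not the matcher that findinsite-js itself uses; the function names are hypothetical, and it assumes that a file-spec must match the whole name and that matching is case-sensitive.

```python
import re


def filespec_to_regex(spec: str) -> re.Pattern:
    """Translate a file-spec into a regex: ? is one character, * is zero or more."""
    parts = []
    for ch in spec:
        if ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "*":
            parts.append(".*")      # zero or more characters
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(spec: str, name: str) -> bool:
    """Return True if the whole of name matches spec (assumption: full match)."""
    return filespec_to_regex(spec).fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["file001.htm", "file101.html", "file111.ht", "file1001.htm"]:
        print(name, matches("file???.ht*", name))
```

Run against the four names from the example, the first three match and file1001.htm does not, because ??? must consume exactly three characters before the dot.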
A list of file-specs can be given directly in the property, or indirectly in a file.
Direct file-specs
Direct file-specs are separated by semicolons, e.g.:
Include=iso*;*12*
Exclude=file???.ht*
This specifies two Include file-specs and one Exclude file-spec.
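As a rough sketch of how such values could be interpreted, the Python below splits direct file-specs on semicolons and then applies the Includes first and the Excludes afterwards, so that an Exclude takes precedence. The function names are hypothetical. It uses the standard library's fnmatchcase for wildcard matching, which also gives [...] a special meaning that the file-spec syntax described here does not mention. Trimming spaces around each file-spec is likewise an assumption.

```python
from fnmatch import fnmatchcase


def parse_direct_specs(value: str) -> list[str]:
    """Split a semicolon-separated property value into individual file-specs."""
    return [spec.strip() for spec in value.split(";") if spec.strip()]


def is_indexed(name: str, include: list[str], exclude: list[str]) -> bool:
    """Apply Includes first, then Excludes; an Exclude match always wins."""
    # An empty Include list means all files are included (the documented default).
    included = not include or any(fnmatchcase(name, spec) for spec in include)
    excluded = any(fnmatchcase(name, spec) for spec in exclude)
    return included and not excluded


if __name__ == "__main__":
    include = parse_direct_specs("iso*;*12*")
    exclude = parse_direct_specs("file???.ht*")
    for name in ["iso9001.htm", "file123.htm", "index.htm"]:
        print(name, is_indexed(name, include, exclude))
```

With these settings iso9001.htm is indexed, file123.htm matches the Include *12* but is dropped by the Exclude, and index.htm matches no Include at all.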
Indirect file-specs in a file
An indirect value consists of @ followed by a file name,
where file-specs are specified one per line in plain text.
The above direct example may be expressed indirectly as follows:
Include=@includes.txt
Exclude=@excludes.txt
where includes.txt contains:
iso*
*12*
and excludes.txt contains:
file???.ht*
If an indirect file cannot be opened, an error message is reported.
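The direct and indirect forms can be handled by one small loader, sketched below in Python. The function name load_specs, the base_dir parameter, the UTF-8 encoding and the choice to skip blank lines are all assumptions made for illustration; the documentation does not say which directory a relative file name is resolved against.

```python
from pathlib import Path


def load_specs(value: str, base_dir: str = ".") -> list[str]:
    """Return the file-specs named by an Include or Exclude property value."""
    value = value.strip()
    if not value:
        return []
    if value.startswith("@"):
        # Indirect form: the rest of the value is a file name, one file-spec per line.
        path = Path(base_dir) / value[1:]
        try:
            text = path.read_text(encoding="utf-8")
        except OSError as exc:
            # Mirrors the documented behaviour of reporting an error.
            raise ValueError(f"cannot open file-spec list {path}: {exc}") from exc
        return [line.strip() for line in text.splitlines() if line.strip()]
    # Direct form: semicolon-separated file-specs.
    return [spec.strip() for spec in value.split(";") if spec.strip()]
```

Calling load_specs("@includes.txt") with the includes.txt shown above would give the same list as load_specs("iso*;*12*").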
Username/password credentials
If the web site being indexed requires one or more usernames/passwords for authentication, then pass this information
in the Credentials property.
The Credentials property must consist of a list of credentials separated by semicolons.
Each credential contains two comma-separated fields: a username and a password.
Spaces are trimmed from both ends of every field.
For example, for a single username (uname) and password (pwd), use this:
Credentials=uname,pwd
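The format can be illustrated with a short Python parser. This is a sketch under stated assumptions rather than the product's own code: parse_credentials is a hypothetical name, each credential is assumed to have exactly two fields, and empty entries (for example from a trailing semicolon) are assumed to be ignored.

```python
def parse_credentials(value: str) -> list[tuple[str, str]]:
    """Parse 'user1,pwd1;user2,pwd2' into a list of (username, password) pairs."""
    credentials = []
    for item in value.split(";"):
        if not item.strip():
            continue  # assumption: skip empty entries
        # Spaces are trimmed at the ends of all fields, as described above.
        fields = [field.strip() for field in item.split(",")]
        if len(fields) != 2:
            raise ValueError(f"expected 'username,password', got {item!r}")
        credentials.append((fields[0], fields[1]))
    return credentials


if __name__ == "__main__":
    print(parse_credentials("uname,pwd"))
```

One consequence of this simple splitting is that a username or password containing a comma or semicolon cannot be expressed; whether findinsite-js offers an escape for that is not covered here.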
Example
Description=My web site
ScanType=url
ScanURL=http://www.mycompany.com/
ParseHTML=true
HTML_Files=*.htm,*.html,*.asp
ParseTXT=false
ParsePDF=true
PDF_Files=*.pdf
CaseSignificant=false
StoreStopWords=true
StopWordFile=
NoTitleIgnorePageLinks=true
ParseUpHierarchy=false
StorePositions=true
StoreLoneWords=true
UseMetaDescriptionAsAbstract=true
UseMetaAbstractAsAbstract=true
AbstractWords=30
Include=
Exclude=
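For readers who generate or check such settings with a script, the name=value layout can be read as sketched below. The function parse_options is hypothetical and is not part of findinsite-js. It assumes that blank lines are ignored, that a line is split at its first = sign, and that an empty value (as in StopWordFile= above) is kept as an empty string.

```python
def parse_options(text: str) -> dict[str, str]:
    """Parse advanced options given one per line in name=value form."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split at the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"not a name=value line: {raw_line!r}")
        options[name.strip()] = value.strip()
    return options


if __name__ == "__main__":
    sample = "ScanType=url\nScanURL=http://www.mycompany.com/\nStopWordFile=\n"
    print(parse_options(sample))
```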