findinsite-cd Language support
Overview
FindinSite-CD is supplied fully internationalised for many languages:
- FindinSite-CD detects the user's preferred language
and uses the most appropriate language for its display prompts, etc.
See screen shots of FindinSite-CD running in different languages.
- If you specify an index for the user's preferred language, then
this will be chosen by FindinSite-CD at startup.
- By default, FindinSite-CD will show a "Languages" button that lets you switch
between the available languages.
You can configure the FindinSite-CD language handling in many ways. You can
support whole new languages or
alter existing languages.
In addition, you can configure what languages are made available,
and whether to show the Languages button.
FindinSite-CD-Wizard and Findex
The indexing tools, FindinSite-CD-Wizard and Findex, read files in many languages.
For most non-English languages, you need to write your web pages in the appropriate
character set. FindinSite-CD-Wizard and Findex support most common character sets used in the world.
Make sure that you specify the correct META Content-Type tag.
See the HTML Character sets,
PDF Support and File types
pages for more details.
Characters with accents, etc.
If your files have characters with
accents, umlauts, etc., then FindinSite-CD-Wizard and Findex will find all these characters correctly.
In addition the FindinSite-CD search page will let you search for words with these characters correctly.
Note carefully that searching for donnee, for example, will find words with accents, such as
donnée and données, and vice versa. While this is useful most of the time,
a search for thé will also find the, which may be slightly confusing.
In general we believe it is better to find more words than fewer.
You can restrict the search by putting in single quotes, eg searching 'thé'
will not find the.
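The accent-insensitive matching described above can be pictured with a short Python sketch. It only illustrates the idea and is not FindinSite-CD's actual implementation: the function names fold_accents and matches are hypothetical, and the treatment of single quotes is a simplified assumption based on the behaviour described in this section.

```python
import unicodedata


def fold_accents(word):
    """Return word with accents, umlauts and other combining marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, word):
    """Hypothetical matcher: quoted queries are exact, others ignore accents."""
    if len(query) >= 2 and query.startswith("'") and query.endswith("'"):
        # Single quotes ask for an exact match, so 'thé' does not find "the".
        return query[1:-1] == word
    # Unquoted queries compare the accent-free, lower-case forms.
    return fold_accents(query).lower() == fold_accents(word).lower()
```

With this sketch, matches("donnee", "donnée") is true in both directions, while the quoted query 'thé' matches only the accented word.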
Word highlighting
Word highlighting in Navigator does not usually work for characters with accents, because
the word highlighter does not recognise characters such as é.
Locales
Example locale strings
| en | English |
| enGB | English, United Kingdom |
| fr | French |
| frFR | French, France |
| frCA | French, Canada |
| de | German |
| it | Italian |
| ja | Japanese |
| zh | Chinese |
| zhTW | Chinese (Taiwan) |
A language is defined by its locale string. In fact, a locale specifies
a language and optionally a country. For example, the locale "en" refers to English
while "enGB" refers to English as used in the United Kingdom.
The locale string is used in various places:
- When specifying index parameters
- When writing a language file
- When writing a rules file
The first two characters of the locale string
give the ISO Language Code.
These codes are the lower-case two-letter codes as defined by ISO-639.
Here is a full list of language codes taken from
http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt
If supplied, the next two characters of the locale string
give the ISO Country Code.
These codes are the upper-case two-letter codes as defined by ISO-3166.
Here is a full list of country codes taken from
http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html
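As a rough illustration of the locale string layout described above, the following Python sketch splits a locale string into its language and country parts. The function name parse_locale is hypothetical and the strict validation is an assumption; FindinSite-CD itself may be more lenient about what it accepts.

```python
def parse_locale(locale):
    """Split a locale string such as "enGB" into ("en", "GB")."""
    if len(locale) not in (2, 4) or not locale.isascii() or not locale.isalpha():
        raise ValueError("expected 2 or 4 ASCII letters: %r" % locale)
    language, country = locale[:2], locale[2:]
    if not language.islower():
        raise ValueError("language code must be lower case: %r" % language)
    if country and not country.isupper():
        raise ValueError("country code must be upper case: %r" % country)
    # The country part is an empty string when no country was supplied.
    return language, country
```

For example, parse_locale("enGB") gives the pair ("en", "GB"), and parse_locale("fr") gives ("fr", "") because the country is optional.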
DefaultLocale
When run in most recent browsers, FindinSite-CD can determine the user's preferred
locale. However, older browsers such as Navigator 3 or Internet Explorer 3
assume that the user's preferred language is English.
You can change the default locale for these older browsers by setting the
defaultLocale applet parameter.
For example, to set the default locale to French, add the following
parameter to your search page:
<PARAM NAME=defaultLocale VALUE="fr">
Startup and Usage
At startup, FindinSite-CD gets the user's preferred locale (from the operating system,
not the browser) and sets the language and index:
- The best matching available language is chosen.
- If more than one index parameter is specified with locales, then the
most appropriate index is chosen -
see the Indexes page for full details.
After startup:
- The user can switch language using the "Languages" button (if it is available).
- If present, the user can change index using the "Indexes" button.
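This page does not spell out how the "best matching" language is chosen at startup. The Python sketch below shows one plausible fallback order, given purely as an assumption: try the exact locale, then the language code alone, then English. The function name best_language and the fallback parameter are hypothetical.

```python
def best_language(user_locale, available, fallback="en"):
    """Pick a locale string from available for the given user locale."""
    # 1. An exact match such as "frCA" wins if it is available.
    if user_locale in available:
        return user_locale
    # 2. Otherwise fall back to the two-letter language code, eg "fr".
    language = user_locale[:2]
    if language in available:
        return language
    # 3. Assumed last resort: the fallback language.
    return fallback
```

Under these assumptions a user locale of frCA selects fr when only en, fr and de are available, and a user locale of ja falls back to en.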
Adding languages
- Arabic (العربية) translation provided by Lubna Sorour.
- Chinese (简体中文) translations provided by Nan Chem and Mary Rack.
- Croatian (Hrvatski) translation provided by Zvonimir Bulaja at www.bulaja.com.
- Czech (Česky) translation provided by Milan Hampl.
- Dutch (Nederlands) translation provided by Hans Schipper.
- French (Français) translation done in-house.
- German (Deutsch) translation provided by Renate Heath, and Julian Calvert of Software AG.
- Italian (Italiano) translation provided by Carmelo Cutuli of Global Communication, and Dr Stefania Goffredo of Reggiani S.p.A.
- Japanese (日本語) translation provided by Yuichi Tokunaga of Cybernet System Co Ltd.
- Norwegian (Norsk) translation provided by Anderson F. R. dos Santos, Norway.
- Portuguese (Português) translation provided by Fernando Nunes, Macau.
- Slovenian (Slovenščina) translation provided by Luka Malenšek, Slovenia.
- Spanish (Español) translation mainly provided by Eduardo Zamora of the Instituto Latinoamericano de la Comunicación Educativa (ILCE), Mexico.
To add support for a new locale, you must complete two steps:
- Write a language file
- Tell FindinSite-CD to use it using the Languages parameter.
Once your language is recognised by FindinSite-CD,
double-check that all the strings display correctly.
Note that not all browsers can display all strings at the same time,
eg Asian text may not display correctly on a Western system.
Writing a language file
The display button labels, text and messages for a language
are defined in a language file. FindinSite-CD has built-in support for English and
many other languages. The language files are in the COM/phdcc/lang
subdirectory of the FindinSite-CD runtime. By convention, a language file usually has a
file extension of .hil.
There are three different language file formats. The recommended format is a plain
text file, starting with a line containing the number 3. Then specify more
lines with name=value string definitions. See the string
name definitions section below for details of all strings.
A language file must include Language, Country, Localname
and Englishname definitions, as can be seen in this excerpt from
the German language file:
3
Language=de
Country=
Localname=Deutsch
Englishname=German
L_AND=UND
L_OR=ODER
L_NOT=NICHT
L_PAGES_AND=\ Seiten und
L_WORDS=\ W\u00F6rter.
The last two definitions illustrate these points:
- If you want a space
at the beginning or end of a string value then you must write it as
\ .
- Any non-US-ASCII characters should be expressed in Unicode format, ie
\uHHHH where HHHH is the hexadecimal for the Unicode character.
For example:
- \u00F6 for the small letter o with a diaeresis: ö, ie U+00F6
- \u20AC for the Euro symbol: €, ie U+20AC.
- The backslash character must be represented as \\.
- Other characters may also be preceded by \,
as in the space example above
\ .
Alternatively, you can store the language file in UTF-8 format, with UTF-8 prefix bytes 0xEF 0xBB 0xBF.
Windows Notepad normally stores these prefix bytes automatically if you store in the UTF-8 Encoding.
Note that you must use \ for a space at the end of a string.
If you do not specify definitions for any strings, then the English version will be used.
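The escaping rules above can be applied by hand, but a small helper may be convenient when preparing many strings. The Python sketch below is an illustration only: the function name escape_value is hypothetical, it covers just the rules listed in this section (backslash, non-ASCII characters, leading and trailing spaces), and it passes any other characters through unchanged.

```python
def escape_value(text):
    """Escape a string value for a plain-text (format 3) language file."""
    pieces = []
    for ch in text:
        if ch == "\\":
            pieces.append("\\\\")            # backslash is written as \\
        elif ord(ch) < 0x80:
            pieces.append(ch)                # US-ASCII is kept as-is
        else:
            # Emit one \uHHHH per UTF-16 code unit, so characters above
            # U+FFFF become a surrogate pair.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "big")
                pieces.append("\\u%04X" % code)
    body = "".join(pieces)
    if body.endswith(" "):
        body = body[:-1] + "\\ "             # trailing space becomes "\ "
    if body.startswith(" "):
        body = "\\" + body                   # leading space becomes "\ "
    return body
```

Applied to the German word string from the excerpt above, the helper reproduces the value shown there: a leading "\ " followed by W\u00F6rter.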
Languages parameter
Once you have written a language file, you must tell FindinSite-CD to use it.
Put the language file in your FindinSite-CD directory in your CD image and add
a Languages parameter.
The optional Languages parameter should contain a
comma-separated list of language file URLs. If you supply a language file
for a language that is built-in to FindinSite-CD, then your language definition will
take precedence.
For example, to add Greek
language support and replace the English language file, you could specify
the following, where FindEl3.hil has the Greek locale language code of el,
and FindMyEn3.hil has the English locale language code of en.
<PARAM NAME=Languages VALUE="FindEl3.hil,FindMyEn3.hil">
For older language file formats, you need to specify language and country codes in addition
to the language file URL. Put these after semi-colons after the URL, eg:
<PARAM NAME=Languages VALUE="FindEl3.hil,FindEnUK.hil;en;gb">
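To make the layout of the Languages value concrete, here is a Python sketch that splits such a value into its entries. It is an assumption-laden reading of the format described above (commas between entries, optional semicolon-separated language and country codes), not code taken from FindinSite-CD; the function name parse_languages_param is hypothetical.

```python
def parse_languages_param(value):
    """Split a Languages value into (url, language, country) tuples."""
    entries = []
    for item in value.split(","):
        parts = [part.strip() for part in item.split(";")]
        url = parts[0]
        # Codes are only present for the older language file formats.
        language = parts[1] if len(parts) > 1 else ""
        country = parts[2] if len(parts) > 2 else ""
        entries.append((url, language, country))
    return entries
```

For the value in the example above, the sketch yields one entry for FindEl3.hil with no codes and one for FindEnUK.hil with language en and country gb.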
Changing languages
You can change individual language strings using applet parameters.
This may be easier to use than providing a whole new language file.
It also lets you change the default English strings without having to provide a whole English
language file.
This method will change any language, including the ones you supply using
the Languages parameter.
You can provide one or more parameters for each built-in language.
The parameter name must be in this format:
Lang_<language code><country code>_<index>
where
<language code> is the language code
<country code> is the optional language country code
<index> is a number incrementing from 1 for each language
The parameter value is in this format:
<string name>,<new string>
where
<string name> is the name of the string that you want to replace in upper case
<new string> is the new string
The parameter name is case insensitive, but the <string name>
in the parameter value must be in upper case. The <new string>
must use the backslash escape sequences described above if necessary.
The string name definitions are given below. Note that you cannot
change the Language, Country, Localname and Englishname definitions.
This example replaces two English strings and one string each for French and Traditional Chinese:
<PARAM NAME="Lang_en_1" VALUE="L_SUBSETS,Sets">
<PARAM NAME="Lang_en_2" VALUE="L_SELECT_SUBSETS, Choose which data set to use\:">
<PARAM NAME="Lang_fr_1" VALUE="L_DATABASES,bases de donn\u00E9es">
<PARAM NAME="Lang_zhTW_1" VALUE="L_SUBSETS,\u7528\u6237\u7535\u8BDD\u673A">
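If you have many strings to change, the Lang_ parameters can be generated rather than typed. The Python sketch below is one possible way to do this, assuming the naming and value format described above; the function name lang_params is hypothetical, and the new strings are expected to already use the backslash escape sequences where needed.

```python
from html import escape


def lang_params(locale, replacements):
    """Build Lang_<locale>_<index> PARAM tags from (string name, new string) pairs."""
    tags = []
    # The index counts up from 1 separately for each locale.
    for index, (name, new_string) in enumerate(replacements, start=1):
        # The string name must be in upper case in the parameter value.
        value = "%s,%s" % (name.upper(), new_string)
        tags.append('<PARAM NAME="Lang_%s_%d" VALUE="%s">'
                    % (locale, index, escape(value, quote=True)))
    return tags
```

Calling lang_params("en", [("L_SUBSETS", "Sets")]) produces the first tag of the example above. The call to html.escape only matters if a new string contains characters such as double quotes or ampersands.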
Languages to use
If - for some reason - you do not want FindinSite-CD to make all its languages
available, then you can specify the list of available languages
in the UseLanguages parameter.
Specify a comma-separated list of locale strings, ie language and optional country codes.
For example, to only make English, French and Traditional Chinese available:
<PARAM NAME=UseLanguages VALUE="en,fr,zhTW">
If you only ever want to use English, then use the following:
<PARAM NAME=UseLanguages VALUE="en">
If only one language is available, then the "Languages" button will not be shown.
Do not specify a UseLanguages parameter at all if you want
all the supplied FindinSite-CD languages available.
Languages button
If there is more than one available language, then FindinSite-CD shows the "Languages" button by default.
If you never want to show the "Languages" button, then set the
ShowLanguages parameter to no, eg:
<PARAM NAME=ShowLanguages VALUE="no">
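The rules for when the "Languages" button appears can be summarised in a few lines of Python. This is a sketch of the behaviour described on this page, not FindinSite-CD code; the function name languages_button_shown is hypothetical, and treating the value no as case insensitive is an assumption.

```python
def languages_button_shown(available_languages, show_languages="yes"):
    """Return True if the "Languages" button would be displayed."""
    # ShowLanguages set to no always hides the button (case handling assumed).
    if show_languages.strip().lower() == "no":
        return False
    # With only one available language there is nothing to switch to.
    return len(available_languages) > 1
```

So a UseLanguages list of en alone hides the button, a list of en, fr and zhTW shows it, and ShowLanguages set to no hides it in either case.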
Language file string name definitions
The following table lists all the localisable strings supported by FindinSite-CD and FindinSite-JS,
together with the English default value.
The Language, Country, Localname and Englishname strings
must be provided in a language file, but cannot be changed using
Lang_XXX parameters.
In most cases the string name is self explanatory.
However, a special description is added where necessary.
Note carefully that some strings require spaces at the beginning and end.
As can be seen, you can use basic HTML in some strings, as described in the
Screen layout - Results layout section.
| String Name | Special Description | English |
| Header |
| Language | The language code: lower-case two-letter | en |
| Country | The country code: upper-case two-letter | |
| Localname | Language name in the language | English |
| Englishname | Language name in English | English |
| Logical operators |
| L_AND | Must be a single word | AND |
| L_OR | Must be a single word | OR |
| L_NOT | Must be a single word | NOT |
| L_NEAR | Must be a single word (Not used yet.) | NEAR |
| Button labels |
| L_SEARCH | Pad with spaces at either end if L_STOP is longer | Search |
| L_STOP | | Stop |
| L_HELP | | Help |
| L_SUBSETS | | Subsets |
| L_INDEXES | | Indexes |
| L_LANGUAGES | | Languages |
| Status messages |
| L_READING | Followed by search database URL | Reading |
| L_HELP_UNREADABLE | | The search database files could not be read. |
| L_SHOWING | Followed by page filename; shown in browser status bar | Showing |
| L_POST_SHOWING | Preceded by page filename; shown in browser status bar | |
| Main help text |
| L_ENTER_TEXT | | Enter your search text in the box above and click on Search. |
| L_HELP_SEL_PAGE | | To view a page in the results list, click on its title. |
| L_HELP_TOP | | <B>FindinSite-CD</B> finds pages that contain all the words in your search text anywhere on the page. |
| L_HELP_MATCH | | Use single quotes <B>' '</B> to find matching capital letters. |
| L_HELP_ADJACENT | | Use double quotes <B>" "</B> to find adjacent words. |
| L_HELP_WILD | | Use <B>?</B> to match exactly one character and <B>*</B> to match any number of characters. |
| L_HELP_LOGICAL_OPERATORS1 | "AND, OR, NOT" put after this phrase and before L_HELP_LOGICAL_OPERATORS2 | Use <B> |
| L_HELP_LOGICAL_OPERATORS2 | | </B> and parentheses <B>(</B> <B>)</B> to do logical searches. |
| Help text: information display |
| L_INDEX | | Index: |
| L_DESCRIPTION | | Description: |
| L_CONTAINS | | Contains: |
| L_PAGES_AND | | pages and |
| L_WORDS | | words. |
| L_CREATED | | Created: |
| L_FILE | | File: |
| L_SITE | | Site: |
| L_LANGUAGE | | Language: |
| L_USER_LOCALE | | User locale: |
| L_RULESET | | Rules: |
| Search text error reporting |
| L_NO_SEARCH | | Nothing to search for |
| L_FOUND_UNORDERED | | These words were found, but not in this order: |
| L_WORDS_NOT_FOUND | | These words were not found in any pages: |
| L_ABORTED | | Search aborted |
| L_NO_CONTIG | | Sorry, adjacent word searches are not supported by this search database |
| L_2DQ_NEEDED | | Sorry, your search text has mismatched double-quotes |
| L_NO_EXACT | | Sorry, this search database does not store different letter cases |
| L_MISMATCHED_PARENTHESES | | Sorry, your search text has mismatched parentheses ( and ) |
| L_BAD_BRACKETS | | Parentheses not allowed within "double quotes" |
| L_INCORRECT_PLACE | Preceded by AND, OR or NOT | in incorrect place |
| L_BAD_WILD | | * and ? not allowed within 'single quotes' |
| L_BAD_ASTERISK_DQ | | * not allowed within "double quotes" |
| Results reporting |
| L_PAGE | Used when reporting "1 page found" | page |
| L_FOUND | Used when reporting number of pages found, except if overridden by L_FOUND_ZERO or L_FOUND_PLURAL. | found |
| L_PAGES | | pages |
| L_PRE_FOUND | Used in languages where found appears before number when reporting "10 pages found" | |
| L_FOUND_PLURAL | If specified, then plural of "found" | |
| L_FOUND_ZERO | If specified, then use to report "0 pages found" | |
| L_PAGES_ZERO | If specified, then use to report "0 pages found" | |
| L_STOP_WORDS | | Ignored common words: |
| Subsets messages |
| L_SELECT_SUBSETS | | Choose subsets to include in search: |
| L_DATABASES | | databases |
| L_NO_SUBSETS_SELECTED | | No subsets are selected. |
| Indexes messages |
| L_SELECT_INDEX | | Select index: |
| Languages messages |
| L_SELECT_LANG | | Select language: |
| SearchAndGetResults() messages |
| L_SEARCH_WAIT | The message returned if the database is still being loaded | Search database loading... please try again soon. |
| L_SEARCH_TIMEDOUT | The message returned if a search has timed out | Search timed out. |
| FindinSite-JS and FindinSite-MS |
| L_APPNAME | | FindinSite-JS or FindinSite-MS |
| L_APPNAME_HTML | | findinsite-js or findinsite-ms |
| L_SEARCH_ENGINE | | search engine |
| L_HELP_TOP_SERVER | | <B>FindinSite-JS</B> finds pages that contain all the words in your search text anywhere on the page. |
| L_SEARCH_FOR | | Search for: |
| L_SEARCH_RESULTS_FOR | | Search results for: |
| L_PREVIOUS | | Previous |
| L_NEXT | | Next |
| L_FIRST | | First |
| L_LAST | | Last |
| L_OF | | of |
| L_RESULTS | | Results |
| L_SECONDS | | seconds |
| L_HTML_TAG | Set to DIR=rtl for languages that read from Right To Left | |
| L_BODY_TAG | Set to DIR=rtl for languages that read from Right To Left | |
| L_ALIGN_TAG | Set to ALIGN=right for languages that read from Right To Left | |