FindinSite-CD 日本語 Japanese support
Screenshot of FindinSite-CD running in Japanese
FindinSite and Findex support Japanese characters, as well as Simplified and Traditional Chinese characters:
- FindinSite and Findex can scan web pages in the Shift JIS, JIS and EUC character sets (shift_jis, x-sjis, iso-2022-jp, euc-jp).
- FindinSite-CD-Wizard and Findex can scan MS-Word, MS-Excel and MS-PowerPoint files containing Japanese characters.
- However, FindinSite-CD-Wizard may not be able to scan PDF files containing Japanese characters.
- FindinSite has a Japanese user interface, provided by a Japanese language file.
To see this in action, your computer and browser must support Japanese character sets.
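As a rough illustration of the character set names listed above, the following Java sketch asks the standard java.nio.charset.Charset class which encoding each name refers to. It is not FindinSite code; it assumes that euc-jp is the intended EUC name and that the Java runtime includes the extended Japanese charsets. It shows that shift_jis and x-sjis are two names for the same encoding, and that iso-2022-jp is the encoding usually called JIS.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class JapaneseCharsets {
    // Charset labels from the list above; "euc-jp" is assumed to be the EUC label meant.
    static final String[] LABELS = {"shift_jis", "x-sjis", "iso-2022-jp", "euc-jp"};

    /** Maps each label to the canonical Java charset name, or to null if this JVM lacks it. */
    static Map<String, String> resolve(String... labels) {
        Map<String, String> canonical = new LinkedHashMap<>();
        for (String label : labels) {
            // Lookup is case-insensitive and also matches aliases such as "x-sjis".
            canonical.put(label, Charset.isSupported(label) ? Charset.forName(label).name() : null);
        }
        return canonical;
    }

    public static void main(String[] args) {
        resolve(LABELS).forEach((label, name) -> System.out.println(label + " -> " + name));
    }
}
```

On a typical desktop JDK it prints shift_jis -> Shift_JIS, x-sjis -> Shift_JIS, iso-2022-jp -> ISO-2022-JP and euc-jp -> EUC-JP, one per line.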
FindinSite-CD-Wizard Windows setup tool
FindinSite-CD-Wizard and Findex can scan Japanese character set web pages even if your computer does not have Japanese character sets installed.
If you are running on a Japanese PC, you will be able to view and edit in Japanese in FindinSite-CD-Wizard.
If not, you can still edit the search database, provided you take care. See the Character sets page for full details
of viewing and editing.
Read the Character sets page for details of how to set up Windows 2000 and XP to view and edit in Japanese.
FindinSite-CD Java applet
FindinSite-CD is the Java applet that you distribute to your customers on CD-ROM.
FindinSite-CD has a Japanese user interface language file and will work with Japanese characters.
Your customers must have a computer with Japanese character set support to see the Japanese characters.
They must also have a browser Java implementation that supports Japanese.
See the Character sets page for details of how to set up Internet Explorer and Netscape Communicator to display Japanese characters.
See the Character sets page for full details of the supported Japanese character sets.
Japanese characters are translated from the supported Japanese web character sets (e.g. Shift JIS) into Unicode.
These Unicode characters are stored in the FindinSite search database in UTF-8 format.
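The sketch below illustrates that conversion in Java using only the standard library: bytes in a page's declared character set are decoded into a Unicode String and then re-encoded as UTF-8. The class and method names are hypothetical, and this is not FindinSite's own storage code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ToUtf8 {
    /** Decodes page bytes from their declared charset into Unicode, then re-encodes them as UTF-8. */
    static byte[] toUtf8(byte[] pageBytes, String declaredCharset) {
        String unicode = new String(pageBytes, Charset.forName(declaredCharset));
        return unicode.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // U+65E5 U+672C U+8A9E, the word "Japanese"
        // Simulate a page saved in Shift JIS, which uses two bytes per kanji.
        byte[] sjis = japanese.getBytes(Charset.forName("Shift_JIS"));
        byte[] utf8 = toUtf8(sjis, "Shift_JIS");
        System.out.println(sjis.length + " Shift JIS bytes -> " + utf8.length + " UTF-8 bytes");
        // The UTF-8 bytes decode back to the same three characters.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(japanese));
    }
}
```

Running it prints "6 Shift JIS bytes -> 9 UTF-8 bytes" and then true: each of the three kanji in 日本語 takes two bytes in Shift JIS and three in UTF-8.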
Full-width Western characters in Japanese text are translated into their standard Western character codes; for example, the full-width Ａ (U+FF21) becomes A (U+0041).
Similarly, all half-width Katakana and Hangul characters are translated into their standard-width character codes.
Some other useful character code translations are also performed.
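One way to approximate the full-width and half-width translations in Java is Unicode NFKC normalization through java.text.Normalizer. This is a swapped-in technique, not necessarily what FindinSite does: NFKC also folds other compatibility characters (for example the ligature ﬁ becomes fi and the circled digit ① becomes 1), so an indexer may prefer a narrower mapping table. In the sketch, full-width ＡＢＣ１２３ becomes ABC123 and half-width ｶﾀｶﾅ becomes カタカナ.

```java
import java.text.Normalizer;

class WidthFolding {
    /**
     * Folds full-width Western characters and half-width Katakana to their standard
     * forms using Unicode NFKC (an assumed stand-in for FindinSite's own tables).
     */
    static String foldWidths(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width A B C 1 2 3 (U+FF21..U+FF23, U+FF11..U+FF13) become ASCII "ABC123".
        System.out.println(foldWidths("\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13"));
        // Half-width katakana KA TA KA NA become standard-width U+30AB U+30BF U+30AB U+30CA.
        System.out.println(foldWidths("\uFF76\uFF80\uFF76\uFF85").equals("\u30AB\u30BF\u30AB\u30CA"));
        // Half-width KA plus half-width voiced mark compose to the single character GA (U+30AC).
        System.out.println(foldWidths("\uFF76\uFF9E").equals("\u30AC"));
    }
}
```

The last line shows a side effect worth knowing: NFKC combines a half-width Katakana letter and a following half-width voiced mark (ｶﾞ) into one standard character (ガ), rather than leaving two separate characters.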
Japanese characters are each treated as a single word by FindinSite. For example, the three characters
in the word "Japanese" (日本語) are three separate words: 日, 本 and 語.
However, if you search for 日本語, FindinSite effectively puts double quotes around these characters, so that only instances
of the three characters together are found. If you want to find all instances of 日, 本 and 語
on a page, search for 日 本 語, i.e. with spaces in between.
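The following Java sketch models the word and search behaviour described above under stated assumptions: it treats every Han, Hiragana or Katakana character as its own word, treats an unspaced query term as a phrase whose words must be adjacent, and requires every space-separated term to match. The class name and the "all terms must match" rule are hypothetical; FindinSite's real query rules may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CjkSearchSketch {
    /** Assumed set of "Japanese" characters: Han ideographs, Hiragana and Katakana. */
    static boolean isJapanese(int cp) {
        Character.UnicodeScript script = Character.UnicodeScript.of(cp);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA;
    }

    /** Each Japanese character becomes its own word; other words are runs of letters or digits. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder western = new StringBuilder();
        for (int cp : text.codePoints().toArray()) {
            boolean japanese = isJapanese(cp);
            if (!japanese && Character.isLetterOrDigit(cp)) {
                western.appendCodePoint(cp); // extend the current Western word
                continue;
            }
            if (western.length() > 0) { // a Japanese character or separator ends a Western word
                words.add(western.toString());
                western.setLength(0);
            }
            if (japanese) {
                words.add(new String(Character.toChars(cp)));
            }
        }
        if (western.length() > 0) {
            words.add(western.toString());
        }
        return words;
    }

    /**
     * Hypothetical query rule: each space-separated term is tokenized and its words must
     * appear consecutively on the page, so an unspaced run of Japanese characters acts
     * like a quoted phrase. Every term must match (AND semantics assumed).
     */
    static boolean matches(String pageText, String query) {
        List<String> page = tokenize(pageText);
        for (String term : query.trim().split("\\s+")) {
            List<String> phrase = tokenize(term);
            if (phrase.isEmpty() || Collections.indexOfSubList(page, phrase) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String nihongo = "\u65E5\u672C\u8A9E";         // the word "Japanese"
        String scattered = "\u8A9E \u672C \u65E5";     // same characters, apart and reversed
        System.out.println(tokenize(nihongo).size());
        System.out.println(matches(nihongo + " support", nihongo));
        System.out.println(matches(scattered, nihongo));
        System.out.println(matches(scattered, "\u65E5 \u672C \u8A9E"));
    }
}
```

Its main method prints 3, true, false and true: 日本語 is three words, the unspaced query matches a page where the characters are together but not one where they are apart, and the spaced query 日 本 語 matches the page where they are apart.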
Note that all HTML tag names and HTML tag attribute names must be in Western characters, i.e. in the Unicode range \u0000 to
\u00FF inclusive. All web page names and target frame names must also be in English.
For example, the following line is accepted by FindinSite and Findex:
<META NAME="description" CONTENT="日本語">
In this example, META is a tag name, and NAME and CONTENT are tag attribute names.
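A check for this rule could look like the following Java sketch. It pulls the tag name and attribute names out of a single tag with rough regular expressions (not a full HTML parser, and hypothetical rather than Findex's own code) and reports any name containing a character above U+00FF. Japanese text in attribute values, as in the META example, is left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagNameCheck {
    // Rough patterns for a single tag; quoted values containing " x=" would confuse them.
    private static final Pattern TAG_NAME = Pattern.compile("<\\s*/?\\s*([^\\s>/]+)");
    private static final Pattern ATTR_NAME = Pattern.compile("\\s([^\\s=>\"']+)\\s*=");

    /** True if every character lies in the range U+0000 to U+00FF inclusive. */
    static boolean isWesternName(String name) {
        return name.chars().allMatch(c -> c <= 0xFF);
    }

    /** Returns the tag name and attribute names in one tag that fall outside that range. */
    static List<String> nonWesternNames(String tag) {
        List<String> bad = new ArrayList<>();
        Matcher tagName = TAG_NAME.matcher(tag);
        if (tagName.find() && !isWesternName(tagName.group(1))) {
            bad.add(tagName.group(1));
        }
        Matcher attrName = ATTR_NAME.matcher(tag);
        while (attrName.find()) {
            if (!isWesternName(attrName.group(1))) {
                bad.add(attrName.group(1));
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Accepted: Japanese appears only in the CONTENT attribute value.
        System.out.println(nonWesternNames("<META NAME=\"description\" CONTENT=\"\u65E5\u672C\u8A9E\">").isEmpty());
        // Rejected: a hypothetical attribute name written in kanji (U+540D U+524D).
        System.out.println(nonWesternNames("<META \u540D\u524D=\"x\">").isEmpty());
    }
}
```

It prints true for the META line above and false for the tag whose attribute name is written in kanji.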
Currently there is no Japanese stop word file.