|
findinsite-ms version details
findinsite-ms ASP.NET application
| 1.61 |
April 17, 2008 |
Indexing: Credentials now supports Integrated Windows Authentication
Indexing: Fixed bug when removing indexing from completed list
|
| 1.60 |
November 22, 2007 |
Compiled to run in ASP.NET 2.0+ web site
Search: dynamic database searching supported
Search: Highlighting of search words in results fixed for multiple subsets
Search: Field searches fixed for multiple subsets
Look and Feel: %DYNAMIC_DB% supported in header and footer
Config: New option "Dynamic database searching regular expression" added to Searching section
Indexing: Cope with unusual BASE tag values
Indexing: Cope with Moved Location even better
Indexing: HardExclude advanced option added
Startup: reallySetLanguages exception handled
Indexing: PDF: Cope with format variant
Emails: sent using ASP.NET 2.0+ method
Config: Indexing From Email Password is a 'password' type input field
Search: cope with bad URL parameters better
|
| 1.51 |
May 8, 2007 |
Indexing: XLS and PPT: TextExtractor call bug fixed
Indexing: PDF and XLS: Floating point numbers identified correctly on non-English computers
|
| 1.50 |
December 18, 2006 |
Indexing: Algorithm changed to reduce memory requirement
Indexing: HTML: Cope with just 'text/html' and 'text-html' charsets
Indexing: PDF: indexing speed-ups
Indexing: PDF: Only report unrecognised encoding /Identity-H if PDF_ReportCharacterDecodeProblems set
Indexing: PDF: UnicodeEncoding bug fix
Indexing: Image: Find XMP (Extensible Metadata Platform) meta-data, eg Vista Tags
Indexing: Cope with (ie ignore) read errors
Indexing: Cope with include/exclude/robots after HTTP redirect
Indexing: Robots not case significant
Indexing: Pause every 100 files for 0.1 second
Indexing: Don't write fields or anchors if file not being indexed
Control Panel: Memory usage and counts of searches and indexings since restart listed on About page
|
| 1.21 |
November 22, 2006 |
Indexing: Word 2007 DOCX/DOCM files supported
Indexing: Excel 2007 XLSX/XLSM files supported
Indexing: PowerPoint 2007 PPTX/PPTM files supported
|
| 1.20 |
September 21, 2006 |
Indexing: Ignore <?xml...> in web pages
Indexing: BASE tag supported
Config: Load template files in UTF-8
Highlight: Find charset more flexibly
Highlight: Fix bug if search word found in header
Language: Thai language supported
Search: Fix bug if space searched for
|
| 1.19 |
July 4, 2006 |
Indexing: Excel XLS file indexing - minor improvements
Indexing: Sections of web pages can be excluded using GoogleOn/Off and FindinSiteOn/Off comments
Indexing: URL recursion stopped using MaxURLLength, with default of 1024.
Look and Feel: displayError template supported in finderror.htt
General: FindinSite image returned accurately
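The section exclusion noted above can be pictured with a minimal Python sketch. It assumes the Google Search Appliance comment form `<!--googleoff: index-->` ... `<!--googleon: index-->`; the exact FindinSiteOn/Off comment syntax is not given here, so it is not handled. The name `strip_excluded` is hypothetical and this is not the findinsite-ms implementation.

```python
import re

# Assumption: markers look like <!--googleoff: index--> and <!--googleon: index-->.
# An unterminated "off" marker excludes everything to the end of the page.
_EXCLUDED = re.compile(
    r"<!--\s*googleoff:\s*\w+\s*-->.*?(?:<!--\s*googleon:\s*\w+\s*-->|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def strip_excluded(html: str) -> str:
    """Return the page text with the marked sections removed before indexing."""
    return _EXCLUDED.sub("", html)


page = "Intro <!--googleoff: index-->menu links<!--googleon: index--> Body"
print(strip_excluded(page))
```

A regular expression is enough for a sketch because the markers are plain comments; a real indexer would more likely apply the exclusion inside its HTML parser.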
|
| 1.18 |
March 23, 2006 |
Indexing: Excel XLS file indexing and searching supported
|
| 1.17 |
February 13, 2006 |
Language: "Languages to Use" option added to Look and feel Control Panel
Language: Language and text direction forced to English for config page heading
Email: SMTP Mail Host option provided on Indexing config page
Email: SMTP send basic authentication password support (can be stored in Web.Config appSettings)
|
| 1.16 |
October 28, 2005 |
Indexing: Publisher PUB file indexing now supported
Language: Norwegian language file added
General: Logo and web site changed; product renamed to findinsite-ms
General: Bug fix: Disregard include in template variable substitutions
General: Improved results sorting
|
| 1.15 |
July 29, 2005 |
Language: Bug fix: non-Western characters identified correctly
|
| 1.14 |
July 28, 2005 |
Language: Arabic (العربية) user interface added (thanks to Lubna Sorour)
Language: Arabic words now delimited by spaces etc
Language: Arabic character versions handled better (ا ى ه و)
Language: Arabic 'the' (ال) at start of word handled correctly
Language: Arabic search for 'the' by itself ignored
Language: Language files now assumed to be in UTF-8
Language: Right-to-left (RTL) languages supported using %L_HTML_TAG%, %L_BODY_TAG% and %L_ALIGN_TAG% strings in templates
Language: findinsite-ms version date localised
Language: Slovenian (Slovenščina) user interface added (thanks to Luka Malenšek)
Indexing: If Content-Type HTTP header specifies HTML charset, use this and ignore META charset.
Indexing: Try to determine HTML charset from META charset before main parse.
Highlighting: Bug fix: pages starting with UTF-8 marker bytes incorrectly recognised
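The charset precedence described in the two Indexing items above can be sketched in Python as follows. This is an illustration under assumptions, not the findinsite-ms code: the function name `pick_charset`, the 4096-byte sniffing window and the ISO 8859-1 fallback are all chosen for the example.

```python
import re

# charset=... inside a Content-Type header value
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([\w.:-]+)', re.IGNORECASE)
# charset=... inside a META tag, searched in the raw bytes before decoding
_META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def pick_charset(content_type, body, default="iso-8859-1"):
    """Choose a charset: HTTP header first, then META tag, then the default."""
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()  # header wins; META is ignored
    match = _META_CHARSET.search(body[:4096])  # assumed sniffing window
    if match:
        return match.group(1).decode("ascii").lower()
    return default


html = b'<html><head><meta charset="windows-1256"></head></html>'
print(pick_charset("text/html; charset=UTF-8", html))
print(pick_charset("text/html", html))
```

Sniffing the META charset from the undecoded bytes lets the main parse start with the right decoder instead of restarting part-way through the page.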
|
| 1.13 |
June 29, 2005 |
Output: Extra linefeeds removed from around Included file content
Output: Included files only sent form data if included file is an .aspx
Highlight: "highlighted by" footer removed because it was not shown in the correct position by Firefox on some sites
Installation: bin dll library files renamed with phdcc.fis. prefix - be careful to delete old DLLs before installing new ones
Search API: Remaining result line variables made available
|
| 1.12 |
May 27, 2005 |
Search API: Highlight URL returned now works with Firefox
Indexing: First suggested filename doesn't have 1 appended
Indexing: Results email includes URL, File or Directory
Indexing: Search db description not saved if indexing run edited
Indexing: Report better error if image file has zero length
Search: Bug fix: crash if search db not loaded successfully
Search: Remove ? from end of search if question asked, ie if more than 1 word
Config: cope better if existing search db corrupted
Config: better on-page JavaScript handling for create new indexing
Output: Site(s) being searched added to default template using %L_SITE% and %SITES%
|
| 1.11 |
May 19, 2005 |
Config: Very first control panel has easy option to make index and search
Indexing: For charset "text/html;" assume ISO 8859-1
Indexing: Unrecognised robots tags ignored
Indexing: redirect out of directory handled better
Indexing: .php added to default HTML file types
Highlight: content-type checked better, so aspx pages work
Highlight: works for sites that use Transfer-Encoding in response header
Search: cope with apostrophes better
|
| 1.10 |
April 15, 2005 |
Indexing: Username/password supported using new Credentials advanced option (basic/digest credentials supported)
Output: Various speed ups
Output: %L_APPNAME% not made HTML-safe
|
| 1.9 |
April 14, 2005 |
Indexing: PDF and TXT indexing speed increased
Indexing: Abort mid-file implemented
Indexing: Bug fixed: slowness if AbstractWords set to 0
Indexing: Redirections off-site not reported as errors
Indexing: Minor DOC parsing fixes
|
| 1.8 |
April 2, 2005 |
Indexing: Bug fixed: page redirection timeout
|
| 1.7 |
April 1, 2005 |
Indexing: Bug fixed: page redirection
|
| 1.6 |
April 1, 2005 |
Output: Results list has snippet excerpts from each page, with search words highlighted
Output: Default template redesign
Output: Styles used in many generated HTML elements
Output: New results variables supported: file size, date, date-indexed, word-count, etc
Output: More output dates localised
Output: New language file strings supported
Indexing: More information stored for each indexed file
Indexing: If file fails Include or Exclude then it is still spidered and links followed
Indexing: UserAgent and ObeyRobots advanced options added
Indexing: <br> not added to abstract at line breaks
Indexing: web errors made more concise: no stack trace
Indexing: AbstractWords now defaults to 0, ie abstract not obtained from first words of file
Highlight: Bug fixed: highlight fails for search of *
Highlight: Copes with bad HTML better
Config: Bug fixed: Pages now counted correctly when db removed
|
| 1.5 (5.4) |
March 3, 2005 |
Indexing: Crawl-Delay throttle implemented for robots.txt
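As an illustration of a Crawl-Delay throttle, the sketch below reads the delay from robots.txt text with Python's standard `urllib.robotparser`. It is not the findinsite-ms indexer: the helper name `crawl_delay_for` and the zero default are assumptions, and the standard parser only accepts whole-second values.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_txt, user_agent, default=0.0):
    """Return the Crawl-delay (in seconds) that applies to user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None if no delay is specified
    return float(delay) if delay is not None else default


robots = "User-agent: *\nCrawl-delay: 2\nDisallow: /private/\n"
print(crawl_delay_for(robots, "FindInSiteBot"))
```

A crawler would then sleep for the returned number of seconds between requests to the same host.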
|
| 1.4 (5.4) |
February 22, 2005 |
Indexing: Page redirect works better
Highlight: Bug fixed: does not pass on "accept-encoding" header
Output: Last run output for indexing in progress has better message
Output: Default result logo updated
Licensing: All starts logged at phdcc.com
|
| 1.3 (5.4) |
February 10, 2005 |
Indexing: robots.txt supported
Indexing: Cookies maintained throughout each indexing run, saving session state
System: Fix initialise security exception on some shared hosts
|
| 1.2 (5.4) |
February 2, 2005 |
Highlight: Highlight of hits in HTML pages; highlight configuration options added
API: Search API updated to add HighlightURL to each returned result
API: Search API bug fixed: GetFieldNames() causes exception if no fields available
Indexing: FindInSiteBot user-agent HTTP header added to indexer, referring to robots bot page
Indexing: Various PDF indexing fixes
Indexing: REL="nofollow" supported in A tags
|
| 1.1 (5.4) |
January 4, 2005 |
Release
|
Possible problems
- Email: Note that not all hosts support email from ASP.NET programs.
- Highlight: In a very small number of cases, pages do not show correctly when
findinsite-ms highlights across domains.
- Indexing: Running Visual Studio .NET may cause findinsite-ms
to hang while indexing. Technical: when a page has been redirected, a hang sometimes occurs
when the HttpWebResponse Close() method is called.
Known bugs
- Default target not used
- After log in you may see error message "makeschange=login"
- If you edit an indexing run that has not been run, the indexing information is lost
- https: access may go wrong
Possible improvements
- Search: Display specific pages for specific searches
- Install: .msi installer
- Indexing: Handle larger sites
- Indexing: Indexing depth option
- Indexing: Split include into include_filter and include_index. The same for exclude.
- Indexing: Configurable start and end abstract indicators
- Indexing: Provide site report with suggestions, eg add META DESCRIPTION, META DESCRIPTION the same on all pages, etc
- Indexing: Show indexing problem count as indexing is running
- Indexing: support robots noarchive
- Indexing: Last run output in more helpful format, eg CSV or XML
- Log: Option to log hits shown using highlighter
- Output: Provide better information when cross-site scripting parameter attacks stopped,
ie provide better response for < > etc
- Output: Indicate file type for each result eg [PDF]
- Output: Provide user with option to show more details for each link, eg %ALLTEXT%
- Output: Somehow make HTML optional, eg do not include following <br> if abstract empty
- Output: Provide parameters for %SNIPPET%
- Output: Make complete set of template variables available as Include file variables
- Highlight: header of highlighting page: could add found X words on page, to refine search, etc
- Highlight: provide separate colours for each word
- User: Results per page option for user
- User: Advanced search: results design option for each user, ie choose which elements to show - stored in cookie
- General: Image db containing thumbnails
- General: Support for browsers with cookies turned off
- General: Provide a DNN3 module to interface to an external instance of findinsite-ms
|
|