FindinSite-MS: Search engine for an ASP.NET website

 

findinsite-ms version details


Known problems/bugs
Possible improvements

findinsite-ms ASP.NET application

Bugs: Wild card in field search fails

1.74 April 30, 2018 Fix: can now access TLS 1.2 secure sites
Rebuilt in .NET 4
1.73 July 7, 2015 PDF: read xref correctly
1.72 February 26, 2014 Email: Set send email credentials better
1.71 April 10, 2012 Search API: Fields now getting through again
Indexing: Cope with repeated Location headers
1.70 July 21, 2011 Indexing: PDF: Cope with unexpected name format
1.69 May 15, 2011 Indexing: Fix minor bug indexing PDFs
1.68 March 2, 2010 Indexing: Parse DOCX better so no broken words
Web service: Correctly remove High and Low surrogate characters from returned XML
Search databases: check every hour for updates and reload if necessary - see web farm information.
Templates: nowrap removed from header template
Indexing: Remove port when creating an automatic search database
Indexing: For directory scans, cope if directory inaccessible
1.67 August 20, 2009 Highlighting: Remove various "If-" request headers that caused highlighting to fail because "304 Not Modified" was returned
Highlighting: "ShowCredentials" parameter supported in Web.Config
Indexing: Credentials now use "Negotiate" to support Kerberos
Dynamic database searching: don't show %DB_CREATION_DATE% if no main database loaded
Languages: 13 European languages added to user interface
Highlighting: works for Greek and Bulgarian text
Indexing: now uses latest version of ICSharpCode.SharpZipLib for unzipping new office files
1.66 April 3, 2009 Indexing: Cope with Content-Location HTTP header that refers to the current URL
1.65 March 9, 2009 Indexing: PDF: Output anchor "page=n" for pages 1 to 31.
Indexing: Check for cancel during directory find all files.
About: better statistics list
Log: log IP address and Robot name
Log: log pages highlighted
Log: XML encode message
About: Keep count of robot searches separately
Template: by default has meta robots nofollow and noindex
1.64 January 8, 2009 Indexing: PDF: Cope with unexpected too-large integer number
1.63 November 14, 2008 Indexing: PDF 1.5 format finally supported: cross reference streams and object streams
Indexing: PDF Flate DecodeParms Predictor 12/Up supported
Indexing: PDF \r\r line ends recognised
Indexing: Content-Type HTTP header used to override file type
Indexing: Use moved location of initial URL, eg follow "findinsite" to "findinsite/"
Indexing: Use "Content-Location" to reduce duplicate URLs indexed, eg "findinsite/" is the same as "findinsite/default.htm"
Indexing: Rules added
Highlighting: base tag and (changed) header added at better position in web page
Indexing email: server port added
Search: Load database files more efficiently
Templates: Consistently use %SEARCH_TEXT%, though %SEARCHTEXT% still supported
Output and templates: updated to use better XHTML
Output: default target supported
1.62 July 4, 2008 Indexing PDF: Finds endstream better
Highlighting: Fixed bug highlighting URL with non-standard characters
Search API: Snippet has search words highlighted using a SPAN with class hilite
Indexing: FieldsToExclude advanced option added
1.61 April 17, 2008 Indexing: Credentials now supports Integrated Windows Authentication
Indexing: Fixed bug when removing indexing from completed list
1.60 November 22, 2007 Compiled to run in ASP.NET 2.0+ web site
Search: dynamic database searching supported
Search: Highlighting of search words in results fixed for multiple subsets
Search: Field searches fixed for multiple subsets
Look and Feel: %DYNAMIC_DB% supported in header and footer
Config: Searching section new option added "Dynamic database searching regular expression"
Indexing: Cope with unusual BASE tag values
Indexing: Cope with Moved Location even better
Indexing: HardExclude advanced option added
Startup: reallySetLanguages exception handled
Indexing: PDF: Cope with format variant
Emails: sent using ASP.NET 2.0+ method
Config: Indexing From Email Password is a 'password' type input field
Search: cope with bad URL parameters better
1.51 May 8, 2007 Indexing: XLS and PPT: TextExtractor call bug fixed
Indexing: PDF and XLS: Floating point numbers identified correctly on non-English computers
1.50 December 18, 2006 Indexing: Algorithm changed to reduce memory requirement
Indexing: HTML: Cope with just 'text/html' and 'text-html' charsets
Indexing: PDF: indexing speed-ups
Indexing: PDF: Only report unrecognised encoding /Identity-H if PDF_ReportCharacterDecodeProblems set
Indexing: PDF: UnicodeEncoding bug fix
Indexing: Image: Find XMP (Extensible Metadata Platform) meta-data, eg Vista Tags
Indexing: Cope with (ie ignore) read errors
Indexing: Cope with include/exclude/robots after HTTP redirect
Indexing: Robots not case significant
Indexing: Pause every 100 files for 0.1 second
Indexing: Don't write fields or anchors if file not being indexed
Control Panel: Memory, searches and indexings counts since restart listed on About page
1.21 November 22, 2006 Indexing: Word 2007 DOCX/DOCM files supported
Indexing: Excel 2007 XLSX/XLSM files supported
Indexing: PowerPoint 2007 PPTX/PPTM files supported
1.20 September 21, 2006 Indexing: Ignore <?xml...> in web pages
Indexing: BASE tag supported
Config: Load template files in UTF-8
Highlight: Find charset more flexibly
Highlight: Fix bug if search word found in header
Language: Thai language supported
Search: Fix bug if space searched for
1.19 July 4, 2006 Indexing: Excel XLS file indexing - minor improvements
Indexing: Sections of web pages can be excluded using GoogleOn/Off and FindinSiteOn/Off comments
Indexing: URL recursion stopped using MaxURLLength, with default of 1024.
Look and Feel: displayError template supported in finderror.htt
General: FindinSite image returned accurately
1.18 March 23, 2006 Indexing: Excel XLS file indexing and searching supported
1.17 February 13, 2006 Language: "Languages to Use" option added to Look and feel Control Panel
Language: Language and text direction forced to English for config page heading
Email: SMTP Mail Host option provided on Indexing config page
Email: SMTP send basic authentication password support (can be stored in Web.Config appSettings)
1.16 October 28, 2005 Indexing: Publisher PUB file indexing now supported
Language: Norwegian language file added
General: Logo and web site change and rename to findinsite-ms
General: Bug fix: Disregard include in template variable substitutions
General: Improved results sorting
1.15 July 29, 2005 Language: Bug fix: non-Western characters identified correctly
1.14 July 28, 2005 Language: Arabic (العربية) user interface added (thanks to Lubna Sorour)
Language: Arabic words now delimited by spaces etc
Language: Arabic character versions handled better (ا ى ه و)
Language: Arabic 'the' (ال) at start of word handled correctly
Language: Arabic search for 'the' by itself ignored
Language: Language files now assumed to be in UTF-8
Language: Right-to-left (RTL) languages supported using %L_HTML_TAG%, %L_BODY_TAG% and %L_ALIGN_TAG% strings in templates
Language: findinsite-ms version date localised
Language: Slovenian (Slovenščina) user interface added (thanks to Luka Malenšek)
Indexing: If Content-Type HTTP header specifies HTML charset, use this and ignore META charset.
Indexing: Try to determine HTML charset from META charset before main parse.
Highlighting: Bug fix: pages starting with UTF-8 marker bytes incorrectly recognised
1.13 June 29, 2005 Output: Extra linefeeds removed from around Included file content
Output: Included files only sent form data if included file is an .aspx
Highlight: "highlighted by" footer removed because it was not shown in the correct position by Firefox on some sites
Installation: bin dll library files renamed with phdcc.fis. prefix - be careful to delete old DLLs before installing new ones
Search API: Remaining result line variables made available
1.12 May 27, 2005 Search API: Highlight URL returned now works with Firefox
Indexing: First suggested filename doesn't have 1 appended
Indexing: Results email includes URL, File or Directory
Indexing: Search db description not saved if indexing run edited
Indexing: Report better error if image file has zero length
Search: Bug fix: crash if search db not loaded successfully
Search: Remove ? from end of search if question asked, ie if more than 1 word
Config: cope better if existing search db corrupted
Config: better on-page JavaScript handling for create new indexing
Output: Site(s) being searched added to default template using %L_SITE% and %SITES%
1.11 May 19, 2005 Config: Very first control panel has easy option to make index and search
Indexing: For charset "text/html;" assume ISO 8859-1
Indexing: Unrecognised robots tags ignored
Indexing: redirect out of directory handled better
Indexing: .php added to default HTML file types
Highlight: content-type checked better, so aspx pages work
Highlight: works for sites that use Transfer-Encoding in response header
Search: cope with apostrophes better
1.10 April 15, 2005 Indexing: Username/password supported using new Credentials advanced option (basic/digest credentials supported)
Output: Various speed ups
Output: %L_APPNAME% not made HTML-safe
1.9 April 14, 2005 Indexing: PDF and TXT indexing speed increased
Indexing: Abort mid-file implemented
Indexing: Bug fixed: slowness if AbstractWords set to 0
Indexing: Redirections off-site not reported as errors
Indexing: Minor DOC parsing fixes
1.8 April 2, 2005 Indexing: Bug fixed: page redirection timeout
1.7 April 1, 2005 Indexing: Bug fixed: page redirection
1.6 April 1, 2005 Output: Results list has snippet excerpts from each page, with search words highlighted
Output: Default template redesign
Output: Styles used in many generated HTML elements
Output: New results variables supported: file size, date, date-indexed, word-count, etc
Output: More output dates localised
Output: New language file strings supported
Indexing: More information stored for each indexed file
Indexing: If file fails Include or Exclude then it is still spidered and links followed
Indexing: UserAgent and ObeyRobots advanced options added
Indexing: <br> not added to abstract at line breaks
Indexing: web errors made more concise: no stack trace
Indexing: AbstractWords now defaults to 0, ie abstract not obtained from first words of file
Highlight: Bug fixed: highlight fails for search of *
Highlight: Copes with bad HTML better
Config: Bug fixed: Pages now counted correctly when db removed
1.5 (5.4) March 3, 2005 Indexing: Crawl-Delay throttle implemented for robots.txt
1.4 (5.4) February 22, 2005 Indexing: Page redirect works better
Highlight: Bug fixed: does not pass on "accept-encoding" header
Output: Last run output for indexing in progress has better message
Output: Default result logo updated
Licensing: All starts logged at phdcc.com
1.3 (5.4) February 10, 2005 Indexing: robots.txt supported
Indexing: Cookies maintained throughout each indexing run, saving session state
System: Fix initialise security exception on some shared hosts
1.2 (5.4) February 2, 2005 Highlight: Highlight of hits in HTML pages; highlight configuration options added
API: Search API updated to add HighlightURL to each returned result
API: Search API bug fixed: GetFieldNames() causes exception if no fields available
Indexing: FindInSiteBot user-agent HTTP header added to indexer, referring to robots bot page
Indexing: Various PDF indexing fixes
Indexing: REL="nofollow" supported in A tags
1.1 (5.4) January 4, 2005 Release
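
The robots.txt entries in the list above (1.3: robots.txt supported; 1.5: Crawl-Delay throttle implemented) describe behaviour that can be illustrated with a small sketch. This is not findinsite-ms code, which is an ASP.NET application; it is a Python illustration using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs and the helper name fetch_plan are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: FindInSiteBot
Crawl-delay: 2
Disallow: /private/
"""


def fetch_plan(robots_txt, user_agent, urls):
    """Return (allowed_urls, delay_seconds) for a polite crawl.

    fetch_plan is a hypothetical helper, not part of findinsite-ms.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Keep only the URLs that robots.txt lets this user agent fetch.
    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    # crawl_delay() returns None when no Crawl-delay applies.
    delay = parser.crawl_delay(user_agent) or 0
    return allowed, delay


allowed, delay = fetch_plan(
    ROBOTS_TXT,
    "FindInSiteBot",
    [
        "http://www.example.com/index.htm",
        "http://www.example.com/private/a.htm",
    ],
)
```

An indexer following this plan would fetch only the URLs in allowed and pause for delay seconds between requests.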


Possible problems

  • Email: Note that not all hosts support email from ASP.NET programs.
  • Highlight: In a very small number of cases, pages do not show correctly when findinsite-ms highlights across domains.
  • Indexing: Running Visual Studio.NET may cause findinsite-ms to hang while indexing. Technical: when a page has been redirected, a hang sometimes occurs when the HttpWebResponse Close() method is called.

Known bugs

  • If you edit an indexing run that has not been run, then the indexing information is lost
  • https: access may go wrong

Possible improvements

  • Search: display specific pages for specific searches
  • Install: .msi installer
  • Indexing: Handle larger sites
  • Indexing: Indexing depth option
  • Indexing: Split include into include_filter and include_index. The same for exclude.
  • Indexing: Configurable start and end abstract indicators
  • Indexing: Provide site report, eg "add META DESCRIPTION", "META DESCRIPTION all the same", etc
  • Indexing: Show indexing problem count as indexing is running
  • Indexing: support robots noarchive
  • Indexing: Last run output in more helpful format, eg CSV or XML
  • Log: Option to log hits shown using highlighter
  • Output: Provide better information when cross-site scripting parameter attacks stopped, ie provide better response for < > etc
  • Output: Indicate file type for each result eg [PDF]
  • Output: Provide user with option to show more details for each link, eg %ALLTEXT%
  • Output: Somehow make HTML optional, eg do not include following <br> if abstract empty
  • Output: Provide parameters for %SNIPPET%
  • Output: Make complete set of template variables available as Include file variables
  • Highlight: header of highlighting page: could add found X words on page, to refine search, etc
  • Highlight: provide separate colours for each word
  • User: Results per page option for user
  • User: Advanced search: results design option for each user, ie choose which elements to show - stored in cookie
  • General: Image db containing thumbnails
  • General: Support for users with cookies turned off
  • General: Provide a DotNetNuke DNN module to interface to an external instance of findinsite-ms

All site Copyright © 1996-2018 PHD Computer Consultants Ltd, PHDCC

Last modified: 2 May 2018.