Web Connector Release Notes

Version 2.8.17

June 03, 2022, Supported Locator version(s): 2.11

Bugs Fixed

  • CFD-4637 Web connector - Pages with robots meta tag "noindex" are indexed
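
For readers unfamiliar with the robots meta tag named in this fix, the following is a minimal Python sketch of how a crawler might detect a "noindex" directive in a page. It is not the connector's implementation; the names RobotsMetaParser and is_indexable are hypothetical, and treating "none" as equivalent to "noindex" follows the common search-engine convention.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_indexable(html: str) -> bool:
    """Return False when the page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return not ({"noindex", "none"} & parser.directives)
```

A page carrying only "nofollow" stays indexable under this sketch; only "noindex" or "none" excludes it.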

Version 2.8.16

May 10, 2022, Supported Locator version(s): 2.11

Bugs Fixed

  • CFD-4615 Web connector - not supporting robots meta tag "noindex"

Version 2.8.15

March 18, 2022, Supported Locator version(s): 2.11

Bugs Fixed

  • CFD-4300 Web connector - too many threads are being created by fetch service
  • CFD-4521 Web connector - slow fetch


Version 2.8.14

September 03, 2021, Supported Locator version(s): 2.10, 2.11

Bugs Fixed

  • CFD-4301 Web connector - the crawler is crawling content that should be ignored by its settings

Tasks Completed

  • CFD-4271 Web connector - add support for TLS 1.2 in preview plugin
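
As a general illustration of what requiring TLS 1.2 on the client side looks like, here is a minimal Python sketch using the standard ssl module. It is not the preview plugin's code, and the function name make_tls12_context is hypothetical.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    # Connections negotiated below this version fail the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to any standard-library client that accepts an SSLContext, for example urllib.request.urlopen(url, context=make_tls12_context()).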


Version 2.8.13

August 09, 2021, Supported Locator version(s): 2.10, 2.11

Bugs Fixed

  • CFD-4264 Web connector - fetch job uses a different CrawlDecisionMaker than discovery
  • CFD-4257 Web connector - invalid redirect handling in fetch job
  • CFD-4250 Web Connector - connector is constantly deleting items per run


Version 2.8.12

July 16, 2021, Supported Locator version(s): 2.10, 2.11

Bugs Fixed

  • CFD-4238 Web Connector - custom plugins are loaded from the default directory path when an incorrect directory path is specified
  • CFD-4230 Web Connector 2.8.11 does not crawl documents
  • CFD-3956 Web Connector - Attempts to parse PDFs as HTML


Version 2.8.11

May 07, 2021, Supported Locator version(s): 2.10, 2.11

Bugs Fixed

  • CFD-4148 Web connector - Saving pages and links to the disk is not working
  • CFD-4147 Web Connector - Crucial objects such as WebCrawler do not release unmanaged resources and managed objects

Tasks Completed

  • CFD-4132 Web Connector Parser Plugin - optimize memory usage
  • CFD-4130 Web connector - update Abot libraries



Version 2.8.10

February 12, 2021, Supported Locator version(s): 2.10, 2.11, 3.0

Improvements

  • CFD-4096 Web Connector - Improve exception handling for custom plugins

Bugs Fixed

  • CFD-4095 Web Connector - Unassigned configuration property "CurrentSeedUrl"



Version 2.8.9

November 25, 2020, Supported Locator version(s): 2.10, 2.11, 3.0

Improvements

  • CFD-3751 Connectors - Add metadata text and document text fields merging rule

Bugs Fixed

  • CFD-3956 Web Connector - Attempts to parse PDFs as HTML



Version 2.8.8

June 23, 2020, Supported Locator version(s): 2.10, 2.11, 3.0

Bugs Fixed

  • CFD-3844 Web Connector - The sitemap configured for Web Connector appears in search results as a document
  • CFD-3830 Web Connector - Fetches some documents with an incorrect title
  • CFD-2259 Web Connector - Crawler not indexing links
  • CFD-1896 Web Connector - Connector not keeping crawler state



Version 2.8.7

April 29, 2020, Supported Locator version(s): 2.10, 2.11, 3.0

Bugs Fixed

  • CFD-3742 Web Connector - Should not access HTML properties for non-HTML items

Tasks Completed

  • CFD-3423 Move Web Connector to Rapid



Version 2.8.6

Bugs

  • CFD-2756 Web Connector - Not all pages removed from sitemap are removed from index



Version 2.8.5

Tasks

  • CFD-2750 Publish Connectors to Connectors 2.9 Feed based on SDK 1.5
  • SDK-280 Publish Connectors to Connectors 2.10 Feed based on SDK 1.6



Version 2.8.4

Improvements

  • CFD-2574 Web connector - Add Platform Date
  • CFD-2146 Web Connector - Add support for new hit fields in RestService version 6
  • CFD-2654 Universal Web connector - proxy support



Version 2.8.3

Bugs

  • CFD-2435 Web crawler - 301 redirect links that shouldn't be indexed are indexed anyway.
    (To avoid indexing redirects, set IsHttpRequestAutoRedirectsEnabled to False.)
  • CFD-2434 Web crawler does not handle robots with ending /.
    (Bug confirmed and reported to Abot, the third-party crawler. Needs a temporary workaround through configuration changes.)
  • CFD-2367 Web - The sign '?' working as designed in robots.txt Disallow.
    (Bug confirmed and reported to Abot, the third-party crawler. Needs a temporary workaround through configuration changes.)
  • CFD-2228 Web connector - Canonical URL not working
  • CFD-2078 Web connector - Only tries to crawl with protocol TLS 1.0

Tasks

  • CFD-2314 Web Connector - Release with new branding



Version 2.8.2

Improvements

  • CFD-1758 Web connector - Expose and add setting "IsRespectHttpXRobotsTagHeaderNoFollowEnabled" and 3 other missing config values

Bugs

  • CFD-1953 Web connector crashed after 5 days of discovery
  • CFD-1894 Web connector - crawler indexes "http://virtualworks.com" when specifying "http://virtualworks.com/contact" as seedurl
  • CFD-1817 Web connector - meta robots = "nofollow" not working
  • CFD-1771 Web connector - "The directory is not empty" when crawling

Tasks

  • CFD-1954 Web connector - Investigate high cpu/disk/memory usage
  • CFD-1932 Web connector - Release version 2.8.2
  • CFD-1914 Web connector - Crawl of single page not checking for canonical



Version 2.8.0

Summary

  • Hidden settings now added to the Admin Wizard and database
  • New custom setting for building only pages whose rel canonical link equals the page URL (turned off by default)
  • Bug fixes and an update of the third-party crawler API
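
The canonical-only setting described above can be pictured with the following minimal Python sketch. It is not the connector's code; the names CanonicalLinkParser and is_canonical_page are hypothetical, and keeping pages that declare no canonical link at all is an assumption made here for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical_href is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens.
        if "canonical" in attributes.get("rel", "").lower().split():
            self.canonical_href = attributes.get("href") or None


def is_canonical_page(page_url: str, html: str) -> bool:
    """Return True when the page's canonical link points at the page itself."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical_href is None:
        return True  # Assumption: pages without a canonical link are kept.
    # Resolve relative hrefs against the page URL before comparing.
    return urljoin(page_url, parser.canonical_href) == page_url
```

The comparison here is an exact string match after resolving relative links; a real crawler may also normalize trailing slashes, case, or query strings.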

Bugs

  • CFD-1813 Web connector - Bugs in admin customs settings.
  • CFD-1812 Web connector - Won't remove page after e.g. reducing crawl depth
  • CFD-1366 Web connector - Unable to add new settings in wizard

Tasks

  • CFD-1884 Web connector - Update the Abot Crawler
  • CFD-1876 Web connector - Add "tracking" to detect deleted/excluded pages
  • CFD-1810 Web connector - Publish Data Sheet
  • CFD-1759 Web connector - Add feature to only crawl canonical URLs
  • CFD-1797 Web connector - Release next version



Version 2.7.8

To enable preview for web pages, change the REST service's web config: add "html" to the extension list (ExtList) for the document previewer, as follows:

<add AppName="document" Action="preview" Script="DisplayLink" ExtList="txt;doc;docx;dotx;docm;docxm;dot;pdf;cs;css;js;fax;xml;xls;xlsm;xlsx;xlsxm;xlt;xltm;xltx;xps;msg;html" DocTypeList="" SkipRootExtList="" SkipRootDocTypeList="" Priority="20"></add>
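
If the change needs to be scripted rather than made by hand, a minimal Python sketch along these lines could add the extension to an existing element. The helper name ensure_extension is hypothetical, and the sketch assumes the <add> element is available as a standalone string; locating it inside the full config file is left out.

```python
import xml.etree.ElementTree as ET


def ensure_extension(add_element_xml: str, extension: str = "html") -> str:
    """Return the <add> element with `extension` present in its ExtList."""
    element = ET.fromstring(add_element_xml)
    # ExtList is a semicolon-separated list; drop empty entries.
    extensions = [e for e in element.get("ExtList", "").split(";") if e]
    if extension not in extensions:
        extensions.append(extension)
    element.set("ExtList", ";".join(extensions))
    return ET.tostring(element, encoding="unicode")
```

Running the helper twice on the same element leaves ExtList unchanged the second time, so it is safe to apply repeatedly.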

Features

  • CFD-1251 Web connector - Add support for preview of files



Version 2.7.6

Task

  • CFD-1250 Web connector - Filename and filext have a semicolon at the end



Version 2.7.5

Bug

  • CFD-1240 Not all specified MIME types were downloaded



Version 2.7.4

Bug

  • CFD-538 Web connector: Deployment issue - missing authentication plug-in file
  • CFD-539 Web connector: AuthRealm is required to be configured
  • CFD-784 Some of the web links aren't crawled

Task

  • CFD-540 Web connector: Change SDK version from 1.1 to SDK 1.2
