How to disable deduplication features

Locator normally identifies duplicates of already processed documents so that it does not spend time and resources running fetch and conversion tasks on items that are already in its database. However, in older versions of Locator (prior to 2.11 SR7), the performance of this feature can suffer in larger installations. In extreme cases, the SQL queries used to identify the duplicates time out and prevent the document from being processed at all.

In that situation, you can either upgrade to 2.11 SR7 or, if that is not feasible or you need a quick workaround, disable deduplication for one (or possibly all) of the connectors on your system.

This setting can also be used if you want to do a “real” refetch of a repository without having to care about the hash.

To do so:

1. Find the Via.<connector name>.Fetch.Service.exe.config file for the connector for which you want to disable deduplication.

2. Open the config file for editing. It’s an XML file and can be edited with any text editor of your choice.

3. Add the following entry to the <appSettings> section of the config file:

<add key="DeduplicationEnabled" value="false"/>

Example:

Original file:

After disabling deduplication:

4. Restart the connector’s Fetch Service.
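
As a rough sketch of the edit in step 3, assuming the file follows the usual .NET application configuration layout, the relevant part might look like this before and after the change. The ExistingSetting key is a hypothetical placeholder and not a real Locator setting. Your file will contain other sections and entries, which should be left untouched.

Before:

```xml
<configuration>
  <appSettings>
    <!-- Hypothetical existing entry; a real file contains other keys -->
    <add key="ExistingSetting" value="example"/>
  </appSettings>
</configuration>
```

After:

```xml
<configuration>
  <appSettings>
    <!-- Hypothetical existing entry, left unchanged -->
    <add key="ExistingSetting" value="example"/>
    <!-- Added entry: disables deduplication for this connector -->
    <add key="DeduplicationEnabled" value="false"/>
  </appSettings>
</configuration>
```

Element names in .NET config files are case-sensitive, so the section must be spelled appSettings. Since Locator deduplicates by default, removing the added line and restarting the Fetch Service should restore the normal behaviour.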

Please note: Disabling deduplication may increase the overall resource usage of the fetch and conversion processes running on your Locator server, because more documents go through the full fetch and conversion process.

ayfie