Introduction
The objective of this document is to provide an architectural overview of ayfie Locator.
For a description of how to deploy the architectural components discussed here across a multi-node installation, continue with Introduction to Multi-Node Installations after reading this document.
The graphic below shows all ayfie Locator components at the highest conceptual level. Gray indicates third-party components. Dark blue indicates important internal or external APIs.
In the following sections we will go through all these components one by one and outline their functions.
Connectors
ayfie Locator has a large number of connectors that support a wide variety of data sources. In Locator versions older than 2.11, the File Server, Exchange and SharePoint connectors were preinstalled. As of version 2.11, all connectors have to be installed individually.
There are two types of connectors: DB connectors and non-DB connectors (the latter are also referred to as Rapid connectors). All DB connectors share the same implementation and differ only in the SQL statements they use. The SQL statements are defined in a set of XML files as described in GenericSQL Connector: Adding a New Connection. This is illustrated by the files tagged db 1 to db N to the left in the figure above.
The non-DB connectors, on the other hand, are implemented one by one by ayfie, using the ayfie internal Connector SDK on the ayfie side and normally an API provided by the target system on the other. Each connector ends up as one or more DLLs, as illustrated at the bottom right above. The GenericSQL Connector is in fact just another SDK-developed connector; the difference is that it takes the SQL commands to use from a configuration file rather than from its code.
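The idea of one shared connector implementation that differs only in its configured SQL can be sketched as follows. This is a minimal illustration, not the GenericSQL Connector itself: the XML layout, the element names `discovery` and `fetch`, and the `articles` table are all assumptions made for the example, and SQLite stands in for the customer database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical connection definition; the real GenericSQL XML schema is not shown here.
CONNECTION_XML = """
<connection name="db1">
  <discovery>SELECT id, title FROM articles ORDER BY id</discovery>
  <fetch>SELECT body FROM articles WHERE id = ?</fetch>
</connection>
"""

def load_statements(xml_text):
    """Read the SQL statements for one connection from its XML definition."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text.strip() for child in root}

def run_connector(db, statements):
    """Generic connector code: identical for every database, only the SQL differs."""
    docs = []
    for doc_id, title in db.execute(statements["discovery"]).fetchall():
        (body,) = db.execute(statements["fetch"], (doc_id,)).fetchone()
        docs.append({"id": doc_id, "title": title, "body": body})
    return docs

# In-memory database standing in for a customer data source.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Intro", "Hello"), (2, "Setup", "World")],
)
documents = run_connector(db, load_statements(CONNECTION_XML))
```

Supporting another database in this sketch means writing a new XML definition only; `run_connector` stays unchanged.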
The data extraction performed by the connectors is a two-phase operation. The first phase is called discovery; only metadata, such as file name and file size, is collected and indexed during this phase. The second phase is called fetch; in this phase the rest of the data, the actual textual content, is retrieved, processed and indexed.
Below we see a graphical presentation of the two phases:
The discovery phase consists of steps 1-4:
- The connector retrieves document information such as the id/path/file name and the document type (file extension) as well as ACLs (Access Control Lists).
- It is possible to add an extension DLL with customized operations to be performed on the metadata for each document.
- The connector stores information about any document of a "qualifying file type" in the database. It also adds the document to the document fetch queue.
- The index builder detects new or updated rows of data (by checking time stamps) and stores the name and type of the document (but not the content, which has not yet been retrieved).
The fetch phase consists of steps 5-10 described below:
- The connector SDK retrieves the document path/id/etc. from the document fetch queue and passes this on to the relevant connector
- The connector downloads the document
- The same external custom DLL as in step 2 (if any) is now run on the fetched document content
- The content is extracted by one of the converters (for instance OCR conversion takes place here)
- The connector stores the converted content in the database under the doc id created during discovery (or earlier)
- The index builder picks up and indexes any new or updated documents from the database
How often these two phases are carried out is configurable on a connection-by-connection basis via the Locator GUI. The default is continuously; that is, as soon as one round of the two phases has completed, a new round starts.
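The discovery and fetch phases can be sketched as one round over in-memory stand-ins. This is a simplified illustration under stated assumptions: the `SOURCE` dictionary, the set of qualifying types and the function names are hypothetical, and the extension DLL and converter steps are left out.

```python
import posixpath
from collections import deque

# Hypothetical in-memory stand-ins for the source system and its qualifying file types.
SOURCE = {"/share/a.txt": "alpha text", "/share/b.txt": "beta text", "/share/c.bin": "\x00"}
QUALIFYING_TYPES = {".txt"}

def discover(source, database, fetch_queue):
    """Phase 1: store metadata only and queue each qualifying document for fetch."""
    for path in sorted(source):
        extension = posixpath.splitext(path)[1]
        if extension in QUALIFYING_TYPES:
            database[path] = {"name": posixpath.basename(path), "type": extension, "content": None}
            fetch_queue.append(path)

def fetch(source, database, fetch_queue):
    """Phase 2: download each queued document and store its content under the same id."""
    while fetch_queue:
        path = fetch_queue.popleft()
        database[path]["content"] = source[path]

def build_index(database):
    """Index whatever the database currently holds for each document."""
    return {doc_id: dict(row) for doc_id, row in database.items()}

database, queue = {}, deque()
discover(SOURCE, database, queue)
index_after_discovery = build_index(database)   # names and types, no content yet
fetch(SOURCE, database, queue)
index_after_fetch = build_index(database)       # content now present
```

After discovery alone, a document is already findable by name and type; its content becomes searchable only after the fetch phase has run.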
Converter Services
All files, regardless of file type or file format, are passed on to the DB by a connector as shown below. First the file is copied from its location at the source to a temp folder managed by the Converter Services. In addition to copying the file, the connector also provides the Converter Services with the full file name. The Converter Services use the file name extension to determine which converter to use. Currently three converters are available, of which two, OmniPage and Tesseract, are Optical Character Recognition (OCR) converters used for converting scanned documents. These two converters are never used together, as Locator is configured to use either one or the other. All other files are handled by the DocFilter Converter.
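Selecting a converter from the file name extension might look like the sketch below. The list of extensions treated as scanned documents is an assumption (the text does not state it), and `pick_converter` is a hypothetical helper, not part of the Converter Services API.

```python
import os

# Assumption: which extensions count as scanned documents is not stated in the text.
SCANNED_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg"}

def pick_converter(file_name, ocr_engine="Tesseract"):
    """Return the converter name for a file based on its extension.

    ocr_engine is the single OCR converter this installation is configured with.
    """
    if ocr_engine not in ("OmniPage", "Tesseract"):
        raise ValueError("unknown OCR engine: " + ocr_engine)
    extension = os.path.splitext(file_name)[1].lower()
    return ocr_engine if extension in SCANNED_EXTENSIONS else "DocFilter"
```

For example, `pick_converter("report.docx")` selects the DocFilter Converter, while `pick_converter("scan.TIF")` selects whichever OCR engine the installation is configured with.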
Database
By default, ayfie Locator comes with a PostgreSQL database. ayfie Locator can also be set up to use Microsoft SQL Server. Some customers choose to replace PostgreSQL with Microsoft SQL Server for performance or stability reasons, or simply because they are unfamiliar with PostgreSQL.
The database is used to store the incoming data in its original form, including metadata and security data (ACLs). It is also used for storing configurations, output from running analytics, user settings and more.
Index Builder
The job of the Index Builder is to keep the search index in sync with the documents stored in the database. Any added, deleted or altered document in the database is detected by the Index Builder, which then re-indexes it. The Index Builder detects changes by monitoring two specific database tables (doc.document and doc.document_tombstone) for added, altered or deleted documents. If it finds a row with a time stamp more recent than its last visit, it processes that row.
Linguistic Services
The Linguistic Service (a.k.a. Lingo) is a module that is not part of the out-of-the-box ayfie Locator installation but has to be installed separately. Its purpose is to extract entities from the incoming data and use them to populate fields created for and dedicated to those particular values. The Linguistic Service is utilized by the Index Builder.
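Time-stamp-based change detection of this kind can be sketched with two tables and a "last visit" marker. The column names (`content`, `modified`, `deleted`), the integer time stamps and the unqualified table names are assumptions for illustration; SQLite stands in for the Locator database.

```python
import sqlite3

def sync_index(db, index, last_visit):
    """Apply all rows changed since last_visit to the index; return the newest time stamp seen."""
    newest = last_visit
    changed = db.execute(
        "SELECT id, content, modified FROM document WHERE modified > ?", (last_visit,)
    ).fetchall()
    for doc_id, content, modified in changed:
        index[doc_id] = content            # added or altered document
        newest = max(newest, modified)
    removed = db.execute(
        "SELECT id, deleted FROM document_tombstone WHERE deleted > ?", (last_visit,)
    ).fetchall()
    for doc_id, deleted in removed:
        index.pop(doc_id, None)            # deleted document
        newest = max(newest, deleted)
    return newest

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, content TEXT, modified INTEGER)")
db.execute("CREATE TABLE document_tombstone (id INTEGER PRIMARY KEY, deleted INTEGER)")
db.executemany("INSERT INTO document VALUES (?, ?, ?)", [(1, "first", 10), (2, "second", 11)])

index = {}
last_visit = sync_index(db, index, 0)      # first pass picks up both documents

# One document is altered and one is deleted (recorded as a tombstone).
db.execute("UPDATE document SET content = 'second v2', modified = 20 WHERE id = 2")
db.execute("DELETE FROM document WHERE id = 1")
db.execute("INSERT INTO document_tombstone VALUES (1, 21)")
last_visit = sync_index(db, index, last_visit)
```

The tombstone table is what lets the sketch notice deletions: a deleted row can no longer carry a time stamp of its own.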
Index Service
By default, ayfie Locator comes with a SOLR search engine that is preconfigured with 3 shards. A shard is an index fragment. Any indexed document is placed in one shard only. The motivation for using 3 shards is that this has been found to give the best overall performance trade-off between indexing and search for a single-node Locator installation.
These are the numbered steps above:
- Incoming documents are indexed by the Index Builder and passed on to SOLR.
- SOLR consults ZooKeeper to determine in which shard to place each document.
- At some later time, an incoming search query is passed from the IIS-hosted Rest Service to SOLR.
- SOLR again consults ZooKeeper, this time to determine which shards to search. For a single-node installation, this step adds no value, as all three shards need to be searched and the 3 results merged into one. However, for a multi-node installation and/or an installation with failover, this step is crucial for the operation.
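The two properties described above, that each document lives in exactly one shard and that a query fans out to all shards before the results are merged, can be sketched as follows. The CRC32-based routing is an illustrative choice only and is not SOLR's own routing algorithm; ZooKeeper's role is left out, and all names are hypothetical.

```python
import zlib

NUM_SHARDS = 3

def shard_for(doc_id):
    """Map a document id to exactly one shard (illustrative hash, not SOLR's own routing)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

def index_document(shards, doc_id, text):
    """Place the document in the single shard its id maps to."""
    shards[shard_for(doc_id)][doc_id] = text

def search(shards, term):
    """Query every shard and merge the per-shard hits into one sorted result."""
    hits = []
    for shard in shards:
        hits.extend(doc_id for doc_id, text in shard.items() if term in text.lower())
    return sorted(hits)

shards = [{} for _ in range(NUM_SHARDS)]
for doc_id, text in [
    ("doc-1", "Quarterly report"),
    ("doc-2", "Travel policy"),
    ("doc-3", "Annual report"),
]:
    index_document(shards, doc_id, text)

results = search(shards, "report")
```

Because routing is deterministic, re-indexing an updated document overwrites the copy in the same shard rather than creating a duplicate elsewhere.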
Query & Result Processing
Below we see how an incoming query and the corresponding search result propagate through Locator:
- The user sends in a query, be it via the ayfie Locator front end or some other application (we have in this case used SharePoint as an example). The query is passed in via the Search API.
- The user is authenticated and identified. This is done by an ayfie or customer-developed plugin, normally against Microsoft Active Directory and often also against one or more target source systems.
- The original query is expanded with the user's ACLs that were obtained in the previous step.
- The ACLs of the items in the search result are compared to the user's ACLs obtained in step 2, and all items to which the user does not have access are removed from the result
- The search result is modified according to the rules stored in the search result rule engine
- The ayfie, custom or third-party front end presents the result and provides item preview (via an ayfie plug-in)
- If the end user has downloaded the ayfie Document Handler, any result clicked will be opened in its native application
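The ACL handling in steps 3 and 4 can be sketched as below. This is a deliberately simplified model, assuming that access is granted when the user and the item share at least one ACL entry; deny entries and nested groups are ignored, and the function and field names are hypothetical.

```python
def expand_query(query, user_acls):
    """Attach the user's ACLs to the query (step 3)."""
    return {"text": query, "acls": frozenset(user_acls)}

def trim_results(results, user_acls):
    """Keep only items whose ACLs share at least one entry with the user's ACLs (step 4)."""
    allowed = set(user_acls)
    return [item for item in results if allowed & set(item["acls"])]

user_acls = ["alice", "group:finance"]
expanded = expand_query("budget", user_acls)

raw_results = [
    {"id": "budget.xlsx", "acls": ["group:finance"]},
    {"id": "hr-notes.docx", "acls": ["group:hr"]},
    {"id": "memo.txt", "acls": ["alice", "bob"]},
]
visible = trim_results(raw_results, expanded["acls"])
```

Here the user sees `budget.xlsx` and `memo.txt`, while `hr-notes.docx` is removed before the result reaches the front end.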
The ayfie Document Handler
The ayfie Document Handler is a native ayfie utility for Windows that is installed separately from ayfie Locator and is used to open links in the ayfie Locator search result using the relevant Microsoft Office or other supported application. Without this tool, the end user has to first download the file and then open it using the relevant application.
The SharePoint App Plug-In
The ayfie Locator Application for SharePoint is an add-on package to SharePoint that allows users of SharePoint to search with ayfie Locator from within the SharePoint GUI.
Data Enrichment Services (Rules Engine)
It is possible to use the Rules Engine to alter the out of the box behavior of ayfie Locator. There are two places where the rules of the Rules Engine come into play:
- The Index Builder - The Index Builder can be set up with rules for how to handle data in transition from the database to the search index. The rules can be for document fields to be removed, replaced, added, altered, or merged.
- The Rest Services - The Rest Services can be set up with rules that alter the outgoing results.
Here are the steps:
- The ayfie Locator administrator uses the Dashboard web site to create or update rules
- Index side rules are uploaded to the Index Builder
- Query and result side rules are uploaded to the Rest Service
- Data is fed and the rules are consulted by the Index Builder during indexing
- The user passes in a query
- The search result is processed based on Query Result Rules before being returned to the user
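Index-side rules of the kind listed above (fields removed, replaced, added or merged) can be sketched as a list of small rule records applied in order. The rule format and the field names are assumptions for illustration and do not reflect the actual Rules Engine syntax.

```python
def apply_rules(document, rules):
    """Return a copy of the document with each rule applied in order."""
    doc = dict(document)
    for rule in rules:
        action = rule["action"]
        if action == "remove":
            doc.pop(rule["field"], None)
        elif action in ("add", "replace"):
            doc[rule["field"]] = rule["value"]
        elif action == "merge":
            # Join the values of the listed fields into one target field.
            parts = [str(doc.pop(name)) for name in rule["fields"] if name in doc]
            doc[rule["target"]] = " ".join(parts)
        else:
            raise ValueError("unknown rule action: " + action)
    return doc

# Hypothetical rules, as an administrator might define them.
rules = [
    {"action": "remove", "field": "internal_note"},
    {"action": "add", "field": "source", "value": "fileserver"},
    {"action": "merge", "fields": ["first_name", "last_name"], "target": "author"},
]
indexed = apply_rules(
    {"title": "Budget", "internal_note": "draft", "first_name": "Ada", "last_name": "Lovelace"},
    rules,
)
```

The same pattern would apply on the result side, with the rules run over each outgoing result item instead of each document on its way to the index.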
License Service
The ayfie Locator License Service is consulted at connector startup (# 5 in graphic below) and at user login (# 6).
The numbers above refer to these steps:
- The customer enters their ayfie-provided customer ID and key via the Management Console, which hands this information over to the License Service.
- The License Service uses the ID and the key to consult over the internet with the ayfie License Server.
- The ayfie License Server uses the ID and the key to look up the corresponding license file and return it to ayfie Locator.
- ayfie Locator stores the license file in the database
- Whenever a connector is to do any type of operation, it checks with the License Service if it is OK to run.
- Before any search query is executed, the License Service verifies that the number of active users has not been exceeded.
If Locator is offline and unable to connect to the ayfie License Server, one can instead receive a license file specific to the customer's server directly from ayfie and use the file option, rather than the ID and key, when entering the license information.
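The two run-time checks in steps 5 and 6 can be sketched as a small class. Everything here is an assumption made for illustration: the text does not say what the license file contains, so the per-connector list, the user limit and the method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LicenseService:
    """Hypothetical stand-in for the License Service; field names are assumptions."""
    licensed_connectors: set
    max_active_users: int
    active_users: set = field(default_factory=set)

    def connector_may_run(self, connector_name):
        """Step 5: a connector asks whether it is OK to run."""
        return connector_name in self.licensed_connectors

    def user_may_search(self, user):
        """Step 6: admit already active users; admit new ones only while below the limit."""
        if user in self.active_users:
            return True
        if len(self.active_users) >= self.max_active_users:
            return False
        self.active_users.add(user)
        return True

service = LicenseService(licensed_connectors={"FileServer", "SharePoint"}, max_active_users=2)
```

With this setup, a third distinct user is refused a search until the limit allows it, while users who are already counted as active keep working.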