...
The objective of this document is to provide an architectural overview of all the ayfie Locator components.
For a description of how to deploy the same architectural components across a multi-node installation, continue with Introduction to Multi-Node Installations after finishing this document.
...
In the following sections we will go through all these components one by one and outline their functions.
Connectors
Connector Types
The ayfie Locator has a large number of connectors that support a wide variety of data sources. For Locator versions older than 2.11, only 3 of these (File Server, Exchange and SharePoint) were preinstalled as part of the Locator installation. As of version 2.11, all connectors have to be installed individually.
...
There are two types of connectors: DB connectors and non-DB connectors (the latter are also referred to as Rapid connectors). All DB connectors share the same implementation and differ only in which SQL statements they use. The SQL statements are defined in a set of XML files, as described in GenericSQL Connector: Adding a New Connection. This is what the files tagged db 1 to db N on the left of the figure above illustrate.
The non-DB connectors, on the other hand, are implemented one by one by ayfie. They use the ayfie-internal Connector SDK on the ayfie side and, normally, an API provided by the target system on the other. Each connector ends up as one or more DLLs, as illustrated at the bottom right of the figure above. The GenericSQL Connector is actually just another SDK-developed connector. The difference is that it has been implemented to take the SQL commands to use from a configuration file rather than from within its code.
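The key point is that every DB connector runs the same code and only the SQL differs. The minimal Python sketch below illustrates a configuration-driven connector of that kind. It is not ayfie code. The XML layout (`<connection>`, `<discovery>`, `<fetch>`), the class name `GenericSqlConnector` and the use of SQLite are all hypothetical. The real file format is the one described in GenericSQL Connector: Adding a New Connection.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical configuration format, for illustration only.
CONFIG_XML = """
<connection name="db1">
  <discovery>SELECT id, name, modified FROM files</discovery>
  <fetch>SELECT body FROM files WHERE id = ?</fetch>
</connection>
"""


class GenericSqlConnector:
    """One shared implementation; each connection differs only in its SQL."""

    def __init__(self, config_xml: str, conn: sqlite3.Connection) -> None:
        root = ET.fromstring(config_xml.strip())
        self.name = root.get("name")
        # The SQL comes from configuration rather than being hard-coded.
        self.discovery_sql = root.findtext("discovery").strip()
        self.fetch_sql = root.findtext("fetch").strip()
        self.conn = conn

    def discover(self) -> list:
        """Discovery: return metadata rows only (no content)."""
        return self.conn.execute(self.discovery_sql).fetchall()

    def fetch(self, doc_id):
        """Fetch: return the content of one document, or None if missing."""
        row = self.conn.execute(self.fetch_sql, (doc_id,)).fetchone()
        return row[0] if row else None


# Usage against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER, name TEXT, modified TEXT, body TEXT)")
conn.execute("INSERT INTO files VALUES (1, 'a.txt', '2020-01-01', 'hello')")
connector = GenericSqlConnector(CONFIG_XML, conn)
print(connector.discover())  # [(1, 'a.txt', '2020-01-01')]
print(connector.fetch(1))    # hello
```

Adding a new DB connection in this model means writing a new XML file, not new code. That mirrors the split between the db 1 to db N files and the shared connector implementation.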
Data Extraction
The data extraction performed by the connectors is a two-phase operation. The first phase is called discovery. During this phase only metadata, such as file name and file size, is collected and indexed. The second phase is called fetch. In this phase the rest of the data (the actual textual content) is retrieved, processed and indexed.
...
- The connector retrieves document information such as the id/path/file name, the document type (file extension) and the ACLs (Access Control Lists).
- Optionally, an extension DLL can be added to perform customized operations on the metadata of each document.
- The connector stores information about any document of a "qualifying file type" in the database. It also adds the document to the document fetch queue.
- The Index Builder detects new or updated rows (by checking time stamps) and indexes the name and type of the document. It does not index the content, which has not yet been retrieved.
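The discovery steps above can be sketched in a few lines of Python. This is a hypothetical illustration, not the Connector SDK. The `discover` function, the `QUALIFYING_TYPES` set, the dictionary standing in for the database and the `extension` callable standing in for the extension DLL are all assumptions. The Index Builder step is left out.

```python
import os
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical "qualifying file types"; the real list is not given here.
QUALIFYING_TYPES = {".docx", ".pdf", ".txt"}


@dataclass
class DocMeta:
    doc_id: str
    name: str
    doc_type: str
    acl: list = field(default_factory=list)


def discover(source_items, database: dict, fetch_queue: deque,
             extension: Optional[Callable[[DocMeta], DocMeta]] = None) -> None:
    """Discovery phase: collect metadata only, then queue documents for fetch."""
    for item in source_items:
        # Id/path/file name, document type (file extension) and ACLs.
        name = item["name"]
        meta = DocMeta(item["id"], name,
                       os.path.splitext(name)[1].lower(), item.get("acl", []))
        # Optional custom hook, standing in for the extension DLL.
        if extension is not None:
            meta = extension(meta)
        # Keep qualifying types only: store metadata (no content yet)
        # and add the document to the fetch queue.
        if meta.doc_type in QUALIFYING_TYPES:
            database[meta.doc_id] = {"meta": meta, "content": None}
            fetch_queue.append(meta.doc_id)


db: dict = {}
queue: deque = deque()
discover([{"id": "1", "name": "Report.PDF", "acl": ["alice"]},
          {"id": "2", "name": "image.bin"}], db, queue)
print(list(queue))  # ['1']
```

Only metadata is written here. The `content` field stays empty until the fetch phase fills it in.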
The fetch phase consists of steps 5-10 described below:
- The connector SDK retrieves the document path/id/etc. from the document fetch queue and passes it on to the relevant connector
- The connector downloads the document
- The same custom extension DLL as in the discovery phase (if any) is now run on the fetched document content
- The content is extracted by one of the converters (for instance OCR conversion takes place here)
- The connector stores the converted content in the database under the doc id created during discovery (or earlier)
- The index builder picks up and indexes any new or updated documents from the database
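A matching sketch of the fetch steps follows, again as a hypothetical Python illustration. The `fetch_round` function is an assumption, and so are the `download`, `convert` and `extension` callables that stand in for the connector, the converters and the extension DLL. The final indexing step is left to the Index Builder and is not modeled.

```python
from collections import deque
from typing import Callable, Dict, Optional


def fetch_round(fetch_queue: deque, database: Dict[str, dict],
                download: Callable[[str], bytes],
                convert: Callable[[str, bytes], str],
                extension: Optional[Callable[[bytes], bytes]] = None) -> int:
    """Fetch phase: drain the queue, then download, convert and store content."""
    processed = 0
    while fetch_queue:
        doc_id = fetch_queue.popleft()      # next document id from the queue
        raw = download(doc_id)              # the connector downloads the document
        if extension is not None:
            raw = extension(raw)            # same optional hook as in discovery
        text = convert(database[doc_id]["name"], raw)  # converter step
        database[doc_id]["content"] = text  # stored under the discovery doc id
        processed += 1
    return processed


db = {"1": {"name": "a.txt", "content": None}}
q = deque(["1"])
n = fetch_round(q, db,
                download=lambda doc_id: b"hello",
                convert=lambda name, data: data.decode("utf-8"))
print(n, db["1"]["content"])  # 1 hello
```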
How often these two phases are carried out is configurable on a connection-by-connection basis via the Locator GUI. The default is continuous operation: as soon as one round of the two phases has completed, a new round is started.
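A per-connection schedule of this kind might look like the Python sketch below. The function name and parameters are hypothetical. `interval_seconds=None` models the default continuous mode, and `max_rounds` exists only so that the example terminates.

```python
import time
from typing import Callable, Optional


def run_connection(run_round: Callable[[], None],
                   interval_seconds: Optional[float] = None,
                   max_rounds: Optional[int] = None) -> int:
    """Repeat discovery+fetch rounds for a single connection.

    interval_seconds=None models the default continuous mode: the next round
    starts as soon as the previous one has completed.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        started = time.monotonic()
        run_round()
        rounds += 1
        more_to_do = max_rounds is None or rounds < max_rounds
        if interval_seconds is not None and more_to_do:
            # Wait out the remainder of the interval before the next round.
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, interval_seconds - elapsed))
    return rounds


calls = []
run_connection(lambda: calls.append("round"), max_rounds=3)
print(len(calls))  # 3
```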
Converter Services
All files, regardless of file type or format, are passed on to the database by a Connector, as shown below. First, the file is copied from its location at the source to a temp folder managed by the Converter Services. In addition to copying the file, the Connector also provides the Converter Services with the full file name. The Converter Services use the file name extension to determine which converter to use. Currently three converters are available. Two of them, OmniPage and Tesseract, are Optical Character Recognition (OCR) converters used for converting scanned documents. These two are never used together, since Locator is configured to use either one or the other. All other files are handled by the DocFilter Converter.
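The extension-based dispatch can be sketched as follows. The converter names come from the text above. The `choose_converter` function and the `OcrEngine` enum are hypothetical, and so is the set of extensions treated as scanned documents, since this document does not say which extensions are routed to OCR. The copy to the temp folder is omitted.

```python
import os
from enum import Enum


class OcrEngine(Enum):
    OMNIPAGE = "OmniPage"
    TESSERACT = "Tesseract"


# Hypothetical set of extensions treated as scanned documents.
SCANNED_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg"}


def choose_converter(file_name: str, ocr_engine: OcrEngine) -> str:
    """Pick a converter from the file name extension alone."""
    ext = os.path.splitext(file_name)[1].lower()
    if ext in SCANNED_EXTENSIONS:
        # Only the one OCR engine Locator is configured with is ever used.
        return ocr_engine.value
    return "DocFilter"


print(choose_converter("scan.TIFF", OcrEngine.TESSERACT))   # Tesseract
print(choose_converter("report.docx", OcrEngine.OMNIPAGE))  # DocFilter
```

Passing the configured OCR engine in as a single value reflects the rule that OmniPage and Tesseract are never active at the same time.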
...
The job of the Index Builder is to keep the search index in sync with the documents stored in the database. Hence, any document that is added, deleted or altered in the database is detected by the Index Builder, which then re-indexes it. The Index Builder detects these changes by monitoring two database tables, doc.document and doc.document_tombstone. If it finds a row with a time stamp more recent than its last visit, it processes that row.
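Below is a minimal sketch of this time-stamp-based change detection. It uses SQLite, with a dictionary standing in for the search index. The column names (`name`, `modified`, `deleted`) and the `sync_index` function are hypothetical. Because SQLite has no `doc` schema, the tables are simply called `document` and `document_tombstone` here.

```python
import sqlite3


def sync_index(conn: sqlite3.Connection, index: dict, last_visit: str) -> str:
    """One Index Builder pass; returns the new 'last visit' time stamp."""
    newest = last_visit
    # Added or altered documents: (re-)index them.
    for doc_id, name, ts in conn.execute(
            "SELECT id, name, modified FROM document WHERE modified > ?",
            (last_visit,)):
        index[doc_id] = name
        newest = max(newest, ts)
    # Deleted documents leave a row in the tombstone table: drop them.
    for doc_id, ts in conn.execute(
            "SELECT id, deleted FROM document_tombstone WHERE deleted > ?",
            (last_visit,)):
        index.pop(doc_id, None)
        newest = max(newest, ts)
    return newest


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, name TEXT, modified TEXT);
CREATE TABLE document_tombstone (id INTEGER, deleted TEXT);
INSERT INTO document VALUES (1, 'a.txt', '2024-01-02T10:00:00');
INSERT INTO document VALUES (2, 'b.txt', '2024-01-01T09:00:00');
INSERT INTO document_tombstone VALUES (3, '2024-01-03T08:00:00');
""")
index = {3: "old.txt"}
watermark = sync_index(conn, index, "2024-01-01T12:00:00")
print(index, watermark)  # {1: 'a.txt'} 2024-01-03T08:00:00
```

ISO-8601 strings compare correctly as text, which keeps the sketch simple. A strict `>` against the newest time stamp seen is the plainest reading of "more recent than its last visit". A production implementation would also need to handle rows that are committed later with an equal or older time stamp.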
...
The SharePoint App Plug-In
The ayfie Locator Application for SharePoint is an add-on package to SharePoint that allows users of SharePoint to search with ayfie Locator from within the SharePoint GUI.
Data Enrichment Services (Rules Engine)
...