
Introduction

There can be several reasons for running Locator on more than one server:

  • Too much data to process, index and store on a single server
  • Too many incoming search queries for a single server to handle
  • Too slow query processing caused by an overly large index
  • No failover with just a single server
  • Initial feeding and indexing takes too long
  • Operation latency due to a single job queue

This document aims to give an understanding of what a multi-node installation entails. For detailed step-by-step procedures, consult the documentation under Multi-Server Environments.

This document assumes that one is familiar with all the components of a single-node ayfie Locator installation. If not, first read ayfie Locator Architectural Overview.

Single Node Installation

All components of a single node installation are described in detail in ayfie Locator Architectural Overview.

Below we see a single node installation where all ayfie Locator components reside on the same host.

A single node installation does not support failover, and the only way to increase capacity is by scaling up (that is, adding more resources to the host). As a rule of thumb, given a normal document type distribution (70 % email / 30 % office documents), a single node can handle up to 20 million documents. The table below shows the amount of RAM, CPU and disk required depending on the number of documents (the rightmost column, with the resource requirements for an extra fetch server, is covered in the next section):


|                    | Locator   | Locator & Supervisor | Extra Fetch Server |
| Docs in Millions   | < 20      | < 20                 | < 20               |
| CPU cores          | 4 - 16    | 8 - 24               | 4 - 16             |
| RAM (GB)           | 16 - 64   | 32 - 96              | 16 - 64            |
| System Drive (GB)  | 20        | 20                   | 20                 |
| Program Files (GB) | 60        | 60                   | 60                 |
| Program Data (GB)  | 200 - 500 | 250 - 600            | 60                 |

The numbers given as ranges correspond to each other. That is, if the number of documents falls in the lower, middle or upper end of its range, then the number of CPU cores and the amount of RAM should be chosen from the corresponding part of their respective ranges.
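
As a rough illustration of how these ranges correspond, the sketch below interpolates linearly within the single-node columns of the table above. The function name and the linear scaling are assumptions made for illustration only, not an official ayfie sizing formula.

```python
# Hypothetical sizing helper based on the single-node table above.
# The linear interpolation between the low and high end of each range is an
# assumption made for illustration; it is not an official ayfie sizing rule.

def single_node_sizing(doc_millions, with_supervisor=False):
    """Estimate CPU cores, RAM (GB) and program data disk (GB) for one node."""
    if doc_millions > 20:
        raise ValueError("a single node is only recommended for up to ~20 million documents")
    fraction = doc_millions / 20.0  # position of the document count within its range
    if with_supervisor:
        lo, hi = {"cpu": 8, "ram": 32, "data": 250}, {"cpu": 24, "ram": 96, "data": 600}
    else:
        lo, hi = {"cpu": 4, "ram": 16, "data": 200}, {"cpu": 16, "ram": 64, "data": 500}
    interpolate = lambda key: round(lo[key] + fraction * (hi[key] - lo[key]))
    return {
        "cpu_cores": interpolate("cpu"),
        "ram_gb": interpolate("ram"),
        "program_data_gb": interpolate("data"),
    }

print(single_node_sizing(10))  # mid-range: roughly 10 cores, 40 GB RAM, 350 GB program data
```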

Secondary Fetch Server

The step-by-step procedure for the scenario described in this section is found at Installing/Configuring an Additional Fetch Server.

Document fetching is a very CPU-intensive activity, and binary-to-text conversion, especially OCR conversion, is the biggest consumer. For more information about document conversion in particular, see Document Conversions & Estimating Initial Indexing Time.

The single node described in the previous section would normally be able to handle up to 20 million documents. However, the initial feeding, processing and indexing of that amount of data may take a very long time to complete. Depending on the size of the documents and the proportion of office documents that require conversion, 20 million documents may very well take some 3-4 months. To speed up this initial upload of data, an additional fetch node can be added, as shown in the next graphic. This adds much more CPU to the operation.

The use of a secondary fetch server can either be limited to the initial round of data uploading or made permanent. In the latter case, one can optionally turn off fetching on the primary server to ensure that document fetching does not negatively impact query response time.

The resource requirements for such a secondary fetch server are shown in the rightmost column of the table in the previous section.
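
As a rough, back-of-the-envelope illustration of why an extra fetch server shortens the initial feed, the sketch below divides the total document count by an assumed per-server conversion throughput. The throughput figure is a placeholder; measure your own rate as described in Document Conversions & Estimating Initial Indexing Time.

```python
# Back-of-the-envelope estimate of the initial feed duration. The per-server
# throughput below is an assumed placeholder, not a measured Locator figure.

def initial_feed_days(total_docs, docs_per_server_per_day, fetch_servers=1):
    """Days to complete the initial feed, assuming fetching scales roughly linearly."""
    return total_docs / (docs_per_server_per_day * fetch_servers)

TOTAL_DOCS = 20_000_000
THROUGHPUT = 200_000  # assumed documents converted per fetch server per day

print(f"1 fetch server : {initial_feed_days(TOTAL_DOCS, THROUGHPUT, 1):.0f} days")
print(f"2 fetch servers: {initial_feed_days(TOTAL_DOCS, THROUGHPUT, 2):.0f} days")
```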

Secondary Search Server

The step-by-step procedure for the scenario described in this section is found at Web Server and Solr Cloud Configuration.

Search is a RAM-intensive operation, and all search nodes must therefore be set up with enough RAM.

Here we have added a secondary search server and thereby doubled the number of documents we are able to handle, from 20 million to 40 million.

Everything in the table above remains the same except for the very first row:


|                  | Locator | Locator & Supervisor |
| Docs in Millions | < 40    | < 40                 |

All the other rows in the previous table are per host. The document capacity repeated here is for the installation as a whole.

The previous graphic only showed the high-level components. In the next graphic we take a peek inside the Index Services to see how documents and queries are distributed between them:

The first thing to decide when configuring ZooKeeper is whether or not to support failover. If failover is not needed, all the blue replica shards can be removed (that is, not configured).

All incoming documents are indexed by the Index Builder and handed over to the Solr instance on the primary node. That Solr instance consults ZooKeeper about which shard and replica to use, forwards the documents between the Solr nodes accordingly, and places them into the correct shards/replicas. The query side works in much the same way, with one difference: the Rest Service, unlike the Index Builder, always interacts directly with the Solr instance on its own node.
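
To make the shard/replica decision concrete, here is a minimal sketch using Solr's standard Collections API. The host name, port and collection name are assumptions; the Locator installation procedure referenced above (Web Server and Solr Cloud Configuration) is the authoritative way to set this up.

```python
# Minimal sketch against Solr's standard Collections API. Host, port and
# collection name are assumptions; Locator's own setup procedure normally
# performs this step.
import requests

SOLR = "http://primary-node:8983/solr"  # assumed address of the primary Solr instance

def create_collection(name, num_shards, with_failover):
    """Create a SolrCloud collection; replicationFactor 2 adds the 'blue' replica shards."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                         # one shard per search node/column
        "replicationFactor": 2 if with_failover else 1,  # 1 = no replicas, i.e. no failover
        "wt": "json",
    }
    return requests.get(f"{SOLR}/admin/collections", params=params).json()

def cluster_status(name):
    """Show how ZooKeeper has laid out the shards and replicas across the Solr nodes."""
    params = {"action": "CLUSTERSTATUS", "collection": name, "wt": "json"}
    return requests.get(f"{SOLR}/admin/collections", params=params).json()
```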

Large Multi-Node Setups

Below we see the standard node layout for a Locator deployment with more than 40 million documents to be indexed.

The difference relative to the two previous examples is that the database has now also been placed on a server of its own. Here are the resource requirements for the various servers in this setup (cells marked "Next Table" refer to the search node table in the next section):


|                    | Database Node | Locator Primary Node | Locator Secondary Search Node | Locator & Supervisor Primary Node | Locator & Supervisor Secondary Search Node | Extra Fetch Server |
| Docs in Millions   | 20 - 100      | 20 - 100             | 20 - 100                      | 20 - 100                          | 20 - 100                                   | 20 - 100           |
| CPU cores          | 32            | 12                   | Next Table                    | 24                                | Next Table                                 | 32                 |
| RAM (GB)           | 100           | 100 - 128            | Next Table                    | 128 - 148                         | Next Table                                 | 64                 |
| System Drive (GB)  | 20            | 20                   | Next Table                    | 30                                | Next Table                                 | 20                 |
| Program Files (GB) | 60            | 60                   | Next Table                    | 60                                | Next Table                                 | 60                 |
| Program Data (GB)  | 1500 - 3000   | 800 - 1500           | Next Table                    | 1000 - 1700                       | Next Table                                 | 100                |

Index Capacity, Query Performance & Failover

As the number of documents and/or the incoming query traffic increases, a single search node will not be enough. Below we see how we can add columns to increase data capacity and rows to increase query performance.

Each search index in a (horizontal) row holds a different set of data than the other indexes in the same row. For instance, a row with 2 columns will double the amount of data that can be indexed. All search indexes within the same column hold a copy of the same set of data. A setup with, say, 3 rows will thus triple the number of queries Locator is able to respond to.

If one adds one more row than is needed to handle peak query traffic, that extra redundant row ensures full failover with no drop in quality should any one search server go down.
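
The arithmetic behind rows and columns can be summarised in a few lines. The per-column document capacity and the per-row query throughput below are assumed figures, used only to show how the matrix scales.

```python
# Back-of-the-envelope view of the search matrix. Both constants are assumptions
# used only to illustrate how rows and columns scale; substitute measured values.

DOCS_PER_COLUMN_MILLIONS = 20  # each column holds a distinct slice of the index
QPS_PER_ROW = 50               # assumed query throughput of one complete row

def matrix_capacity(rows, columns, reserve_failover_row=True):
    """Capacity of a rows x columns search matrix, optionally keeping one row as spare."""
    usable_rows = rows - 1 if reserve_failover_row else rows
    return {
        "indexable_docs_millions": columns * DOCS_PER_COLUMN_MILLIONS,
        "peak_qps": max(usable_rows, 0) * QPS_PER_ROW,
        "survives_one_search_node_failure": reserve_failover_row and rows >= 2,
    }

print(matrix_capacity(rows=2, columns=2))  # the 2 x 2 example shown below
```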

Below we see a 2 x 2 search matrix example. Each search row is marked with a yellow rectangle and each search column with a dark red rectangle.

The table below gives the machine specification for each of the 3 search nodes that have been added:


|                    | Locator    | Locator & Supervisor |
| Docs in Millions   | 20 - 100   | 20 - 100             |
| CPU cores          | 12         | 24                   |
| RAM (GB)           | 100        | 128 - 148            |
| System Drive (GB)  | 20         | 30                   |
| Program Files (GB) | 60         | 60                   |
| Program Data (GB)  | 800 - 1500 | 1000 - 1700          |

Single Points of Failure

With the current version of ayfie Locator, the following components cannot be duplicated and remain single points of failure (a simple monitoring sketch follows this list):

  • Index Builder - The Index Builder is stateless, hence the worst-case scenario is a temporary interruption in the indexing of new and updated documents.
  • Linguistic Service - The Linguistic Service (a.k.a. the Lingo Service) is not part of the out-of-the-box installation. If installed later, it is called by the Index Builder to perform entity extraction on data extracted from the database. A failure of this component will slow the Index Builder down and make it take longer to produce the index, as it will wait for a timeout on each document, and no entities will be extracted.
  • License Service - A License Service failure will be fatal. Without the license server, connectors will stop operating and no user will be allowed to submit queries.
  • Primary Search Node - If one operates with a search matrix to provide failover, that will work as long as the failing search node is not the primary server. The reason is that the license server resides on the primary node; if that node goes down, no Rest Service will be able to authenticate users.
  • ZooKeeper - A ZooKeeper failure will render search non-operational, as ZooKeeper is responsible for directing searches to the correct shards within the collection for the search node in question.
  • IIS - If the Windows IIS web server were to go down, it would no longer be possible to post queries. However, as the service is stateless, it would be back to business as usual as soon as it comes up again; no data would be lost beyond the queries that failed while the service was down. Using a load balancer and two or more IIS nodes, as shown in the next section, would not eliminate this single point of failure, as all of them would depend on the license server, which would remain a single point of failure.
  • Database - If the database were to go down, all searches would stop working, as the required user authentication depends on the database. However, it is possible to set up an external standalone PostgreSQL or MySQL database with its own failover solution independent of ayfie Locator.
  • Connector services - The connectors have two phases: discovery and fetch. Unlike fetch, which can be done from any number of servers, discovery can only be done from a single server. For collections served by a connector performing discovery from a particular server, the failure of that server (or that connector) means that no further updates can be made to the index. The server assigned to perform discovery can be changed, but only one server can be assigned to perform discovery.
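
Since several of these components cannot be made redundant, it is worth monitoring them. The sketch below is only an illustration; all host names, ports and endpoints are assumptions and must be replaced with the addresses of your own installation.

```python
# Minimal availability probe for some of the single points of failure above.
# Host names, ports and endpoints are assumptions for illustration only.
import socket
import requests

def http_ok(url, timeout=5):
    """True if the endpoint answers with a non-5xx HTTP status."""
    try:
        return requests.get(url, timeout=timeout).status_code < 500
    except requests.RequestException:
        return False

def tcp_ok(host, port, timeout=5):
    """Cheap reachability test for services without an obvious HTTP health endpoint."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

checks = {
    "IIS / Rest Service": http_ok("http://primary-node/"),  # assumed web front end
    "Solr (primary)":     http_ok("http://primary-node:8983/solr/admin/info/system?wt=json"),
    "ZooKeeper":          tcp_ok("primary-node", 2181),      # default ZooKeeper port
    "Database":           tcp_ok("database-node", 5432),     # default PostgreSQL port
}

for name, up in checks.items():
    print(f"{name:20s} {'UP' if up else 'DOWN'}")
```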

