The minimum configuration for a ViaWorks server:
Requirement | Fresh Installation | Upgrade |
# CPU Cores | 4 | 4 |
RAM (GB) | 16 | 16 |
Available Disk space (GB) - System Drive | 5 | 5 |
Available Disk space (GB) - Program Files | 2 | 2 |
Available Disk space (GB) - Program Data | 60 | 5 |
NOTE: These are the minimum required server specifications to install ViaWorks and are only sufficient for very small installations. The majority of installations will require higher specifications.
The disk space requirements (System Drive, Program Files, and Program Data) are absolute minimums: the installation will not proceed if they are not met. For CPU and memory, the installer displays a warning message if the actual values are below these minimums, but allows you to continue. However, a configuration with too little memory or too few CPU cores can result in poor performance and database timeouts.
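The check described above can be sketched as a small pre-installation script. This is only an illustration of the rule (disk shortfalls block the installation, CPU and RAM shortfalls only warn), not the actual ViaWorks installer logic; the function name `check_server` and the dictionary keys are hypothetical.

```python
# Minimums taken from the table above; the key names are illustrative only.
MINIMUMS = {
    "fresh": {"cpu_cores": 4, "ram_gb": 16, "system_drive_gb": 5,
              "program_files_gb": 2, "program_data_gb": 60},
    "upgrade": {"cpu_cores": 4, "ram_gb": 16, "system_drive_gb": 5,
                "program_files_gb": 2, "program_data_gb": 5},
}

# Disk shortfalls block the installation; CPU and RAM shortfalls only warn.
BLOCKING = ("system_drive_gb", "program_files_gb", "program_data_gb")
WARNING_ONLY = ("cpu_cores", "ram_gb")


def check_server(actual, install_type="fresh"):
    """Return (errors, warnings) for a server described by `actual`."""
    required = MINIMUMS[install_type]

    def shortfalls(keys):
        # One message per value that is below its required minimum.
        return [f"{key}: have {actual[key]}, need at least {required[key]}"
                for key in keys if actual[key] < required[key]]

    return shortfalls(BLOCKING), shortfalls(WARNING_ONLY)


# Hypothetical server: enough disk for an upgrade, but short on CPU cores.
server = {"cpu_cores": 2, "ram_gb": 16, "system_drive_gb": 10,
          "program_files_gb": 5, "program_data_gb": 40}
errors, warnings = check_server(server, "fresh")
print("Blocking:", errors)
print("Warnings:", warnings)
```

An empty error list means the installation could proceed; any warnings indicate a server that is likely to perform poorly.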
VirtualWorks also recommends installing the ViaWorks Program Data on a dedicated disk drive, separate from the drive on which the OS is installed.
The recommended configuration depends on the customer's environment. To determine the appropriate configuration, consider the two periods of ViaWorks indexing: initial indexing and incremental indexing.
Initial Indexing
Initial indexing is the period right after all of the data connections have been configured and connection schedules have been established, during which ViaWorks first discovers all of the documents in each of the repositories, converts all the documents' content, and stores the data in the ViaWorks index. During this time it is recommended that users do not perform searches as ViaWorks will be consuming almost all of the CPU for indexing operations, leaving little processing time for searches.
Volume, Type, and Size of Documents
The size and type of individual documents, as well as the total number of documents, contribute to both the CPU and the memory requirements. Large, text-rich, multi-page documents require more system resources than smaller documents with little text. Files that require Optical Character Recognition (OCR) conversion, such as JPG and GIF images and TIF and PDF files with embedded images, consume far more CPU than files that do not. If these files are large and contain many pages, CPU consumption increases further, and the server will need more CPU cores to achieve acceptable indexing performance. Additional ViaWorks servers may be required to assist with the fetch and conversion processes during the initial indexing period in order to complete indexing within the desired time frame.
Incremental Indexing
Incremental indexing is the period after initial indexing has completed, during which only new and changed documents are indexed and users are performing searches.
Index Size, Document Turnover, and Concurrent User Searches
A large index, a high volume of new and changed documents each day, and a large number of users performing concurrent searches will increase the recommended minimums for the number of CPU cores and RAM. For example, for an index containing several million documents, the ViaWorks server may require at least 32 GB of RAM to ensure acceptable performance and to avoid timeout errors. If the number of concurrent users is very high, this will further increase the memory requirement. It is during the incremental indexing period that additional ViaWorks servers, which may have been necessary during initial indexing, may prove to be no longer needed.
Summary
There are too many variables to provide an exact recommendation for a server configuration. However, the following are general guidelines:
- The larger the index, the more memory is needed.
- The more concurrent users performing search, the more memory is needed.
- The greater the total number of documents (for initial indexing), or the greater the number of new and changed documents each day (for incremental indexing), the more processing power (number of CPU cores and processor speed) is needed.
- The greater the number of documents requiring OCR conversion, and the greater the number of pages in these documents, the more processing power (number of CPU cores and processor speed) is needed.
The amount of available disk space required for Program Data depends on both the amount of data indexed and the projected rate of growth for new documents. Additionally, it is important that adequate disk space is available for product upgrades. ViaWorks includes PostgreSQL, an open source object-relational database system. On occasion, a ViaWorks service release or a new product release will include an updated version of PostgreSQL. When this occurs, the upgrade process requires enough space to temporarily copy the database. This means that the available disk space needs to be approximately 2.5 times the size of the database. A good practice is to keep available disk space above this level at all times.
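As a worked example of the upgrade headroom rule above, the following sketch computes the free space needed for a given database size. The factor of roughly 2.5 comes from the text; the function names and the sample sizes are hypothetical, and the result is an approximation rather than an exact installer requirement.

```python
# Approximate multiple of the database size that must be free during an
# upgrade that includes a new PostgreSQL version (figure from the text above).
UPGRADE_HEADROOM_FACTOR = 2.5


def required_free_space_gb(database_size_gb, factor=UPGRADE_HEADROOM_FACTOR):
    """Approximate free disk space (GB) needed to upgrade the database."""
    if database_size_gb < 0:
        raise ValueError("database size cannot be negative")
    return database_size_gb * factor


def has_upgrade_headroom(database_size_gb, free_space_gb):
    """True if the free space covers the approximate upgrade requirement."""
    return free_space_gb >= required_free_space_gb(database_size_gb)


# Hypothetical sizes: a 40 GB database with 90 GB free on the data drive.
needed = required_free_space_gb(40)
print(f"Need about {needed:.0f} GB free; enough: {has_upgrade_headroom(40, 90)}")
```

Running a check like this before each upgrade, with the current database size, shows whether the Program Data drive needs to be extended first.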
Disks
The disk drives are important for overall ViaWorks Server performance. During indexing, ViaWorks continuously reads from and writes to the database, the index, and various log files, so disk read/write loads will periodically be very heavy and fast disk I/O is important.
Because of the large amounts of data and the frequency of disk read/write operations, selecting a storage system with high I/O speed will greatly increase the performance of the ViaWorks server, especially when it comes to searches.
Hard Disk Drive (HDD) or Solid State Disk (SSD)
A Solid State Disk (SSD) offers outstanding I/O performance. Although SSD storage is more expensive than standard disk systems, it greatly improves ViaWorks Server indexing performance and all other I/O operations. This is especially important for user search response times.
CPU
ViaWorks indexing is CPU intensive. Indexing that requires OCR processing is even more CPU intensive. The ViaWorks Server is tuned to use as much of the available CPU resources as possible. The faster the processors, and the greater the number of CPU cores, the faster your ViaWorks Server will perform. Not all CPUs provide the same level of performance: they differ in speed (measured in GHz), the number of cores, and the amount of L2 and L3 cache, among other things. Therefore, it is important to configure a server with both an appropriate number of CPU cores and an appropriate type of CPU.
Memory
Virtual Environments
If you choose, you can install the ViaWorks Server in a virtual environment. By using a virtual machine, you can allocate more virtual resources as they are needed, such as during the initial indexing period. The virtual resources can then be reduced to a lower level during "steady state" operations when performing scheduled incremental indexing.