Rules, Refiners & Primary Details Tutorial


Introduction

The objective of this tutorial is to teach readers how to create:

  • Rules Engine rules

  • Refiners (also referred to as search result filters)

  • Primary Details and More Details in the search result

With regard to rules, note that the emphasis in this tutorial is on how to use rules and not so much on how to implement them. Creating rules has a relatively steep learning curve, which will be covered in detail in the yet to be published Advanced Rule Tutorial.

Ayfie Saga Architectural Overview describes the overall Ayfie architecture. Below we have copied the graphic that shows the part of that architecture that relates to the Rules Engine:

We notice the following:

  • Rules are added, removed and modified by a system admin via the Locator Dashboard UI (blue step 1)

  • There are two types of rules, index side rules and search side rules (the two occurrences of red step 2). The former alter the documents before indexing and the latter alter the search result before it is passed on to the end user.

Prerequisites

The only prerequisite is a Locator installation that is up and running. In this section we will create and index the first test document that we will be using in this tutorial. Later documents will be created and indexed the same way.

Create the First Test File

Create directory C:\test_data and within it create a text file named test document.txt with this content:

This is a test document

Index the First Test File

Install the File Server Connector as instructed in section Connector Management of the Ayfie Locator Installation Guide.

Once installed, create a new connection that uses the test_data directory we created above, with our one test file, as the data source:

Search for the First Test File

Search for the file to verify that it has been successfully indexed and is searchable:

Accessing the Rules Engine UI

The Rules Engine menu bar option is normally not visible. To make it visible, add the URL key-value pair se=1 to the end of the URL as shown here (the key-value pair has to be preceded by a ? or a &):
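The ? versus & choice can be illustrated with a small Python sketch. The helper name and the example URLs are hypothetical, and the sketch assumes a plain query string (no fragment-based routing in the URL):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_param(url: str, key: str, value: str) -> str:
    """Append key=value to a URL, using ? for the first parameter and & otherwise."""
    parts = urlsplit(url)
    # Keep any parameters that are already present, in their original order.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical URLs, for illustration only.
print(add_query_param("http://localhost/dashboard", "se", "1"))
print(add_query_param("http://localhost/search?q=test", "se", "1"))
```

The same helper would work for the debug=true parameter used later in this tutorial.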

Below we have superimposed our architecture graphic from above upon the Rules Engine UI to drive home the point that the Rules Engine consists of two separate rule sets, one for the index side that is carried out before the documents are indexed, and another one for the search side that is carried out before the result is passed on to the end user:

Depending on which of the two tabs one selects (Index or Search), one will see a list of all existing rules of the selected type:

If one selects one of those rules (blue arrow), the XML rendering of the selected rule appears just below it.

Rule “Validation”

In later sections we will be entering the Index and Search pipeline to create new rules. In this section however, we will only be looking at rules that already exist. To do that we use the Validation option that is to be covered in this section.

Obtaining a Document ID

It is not possible to use the Rules Engine without having the ID of an already indexed document. There are several ways of obtaining a document ID, but the easiest one is to search for the document after having configured the search page to also show document IDs, as we have done here:

Notice how we have added the URL parameter key-value pair debug=true and then selected the card view option for how to show the results. The detail view alternative will also work, but not the default list view as that view does not have the space required for displaying the ID.

Looking up Document Meta Data

To the left below we use a source reference ID and to the right a document ID, to look up information about indexed data. A document ID refers to the ID of for instance an individual file whereas the source ID would refer to for instance the directory that the file resides in as is the case below. However, it could also be a zip file or an email that the file is contained by or attached to. The easiest way to look up a source reference ID is to first look up a document by its ID using the method described in the previous section. We can then obtain the source reference ID on the first line under the ATTRIBUTES heading (see either of the two screenshots below). In the rest of this tutorial however, we will only be using the document ID.

Looking at the “Before Rules” Document Content

To see all the key-value pairs of a document before any rule has been executed on the index side (red option) or the query side (blue option), enter the document ID into the Validate field and then click the Before Rules option (the two green circled UI components):

Looking at the Changes Caused by the Rules

By now clicking the Document option (rightmost green circle), we can see all the changes that have been carried out by the rules:

Creating a New Rule

In the previous section we were looking at the effect of existing rules. We were “validating” them to use the term by which the Locator UI refers to this activity. In this section however, we will be creating a new rule.

The first thing to do is to select the pipeline to which the new rule is to be added, the index pipeline or the search pipeline. In the first part of this tutorial we will be adding rules to the index pipeline. That is done by selecting the Index Rules List option.

The Indexing Pipeline

The click sequence below shows how to get to the index pipeline:

When we enter the index pipeline, by default it shows the Post-Engine version of the document for which we have provided the ID. That is, the document as it is after the rules have been executed. As we can see from the column preceding the field names at the bottom of the graphics above, each field is either prefixed by a green plus sign (which means that the field was added by some rule), by a yellow triangle (the field already existed and its value was updated by some rule), or by nothing (the field was not altered by any of the rules).
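The three field markers amount to a comparison between the document before and after the rules have run. A minimal Python sketch of that comparison, assuming both versions are available as plain dictionaries (the function name, the marker labels and the title field are hypothetical):

```python
def classify_fields(before: dict, after: dict) -> dict:
    """Label each post-engine field as added, updated or unchanged."""
    markers = {}
    for name, value in after.items():
        if name not in before:
            markers[name] = "added"      # green plus sign
        elif before[name] != value:
            markers[name] = "updated"    # yellow triangle
        else:
            markers[name] = "unchanged"  # no icon
    return markers


before = {"document_text": "This is a test document", "title": "test document.txt"}
after = {
    "document_text": "Hello World!",
    "title": "test document.txt",
    "via_ti_department": "development",
}
print(classify_fields(before, after))
```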

Below we see what each icon in the pipeline corresponds to within the Locator architecture. The three numbered documents represent the selected document at the various stages on its way through the index pipeline. Version 1 is the document just after it has been retrieved from the database. Number 2 is after it has been run through the Rules Engine (the Rule Engine is run from within the Index Builder), and number 3 shows what the document looks like in Solr after indexing.

Creating the “Hello World!” Rule

Rule Engine rules are written in XML. If you are not familiar with XML, watch this video first.

The “Hello World!” rule will look like this:

<rules>
  <rule name="Hello World!">
    <actions>
      <replace field="document_text">
        <set>
          <text>Hello World!</text>
        </set>
      </replace>
    </actions>
  </rule>
</rules>

The rule replaces the content of field document_text (line 4) with the text string Hello World! (line 6).

To insert and test the rule, do the following.

  1. Click the green Rule Engine Editor icon

  2. Insert the XML from above into the editor

  3. Click the green test button below and verify that the resulting document field is as intended

After now having tested and verified that the rule works, save the rule like this:

  1. Name the rule Hello World! by replacing the “CUSTUM_RULE_<DATE>” string in the input field below the rule XML

  2. Click the blue Save New Rule button down to the right

For the rule we just made to take effect, the document has to pass through the index pipeline. That is, it has to be re-indexed. In this simple single document case, that is most easily done by re-fetching the document, as shown in the graphic below, using the same document ID that we used earlier to test the rule.

If, however, the rule were to be applied to all documents in a repository, it would be better to use the repository tool described in Refetching and/or Reindexing Documents.

Once the document has been re-indexed and we search for the document, we will see that the Preview shows the same content as before. That is because the preview picks up its content from disk, and we did not change the original content. However, if we click the Raw text option, we do see that the indexed content has changed:

Also, if we were to use the validation option again, we would see the result of our rule there too:

Deleting a Rule

As we are not very well served by a rule that overrides all documents with the same content, we will delete the Hello World! rule. That is done as shown here:

A Conditional Hello

There was a big problem with the Hello World! rule we just made. It would change every single fetched document, with the result that the whole index would be full of documents with the very same content. Let’s create a new version of the rule that we will call Conditional Hello World!. This version has conditions that make sure that we only change the content of our test documents and not of any other document:

<rules>
  <rule name="Conditional Hello World!">
    <conditions>
      <field name="config.connector_type@connector" value="fileserver">
        <field name="document_text" value="This is a test document" />
        <field name="document_text" pattern="^.*some testing.*$" />
      </field>
    </conditions>
    <actions>
      <replace field="document_text">
        <set>
          <text>Hello World!</text>
        </set>
      </replace>
    </actions>
  </rule>
</rules>

Each field element within the conditions element above contains a single condition. These are:

  • The document must have been retrieved using the file server connector

  • The full document content must be exactly the string This is a test document, nothing less, nothing more

  • The regular expression states that the document must contain the string some testing somewhere within it

The question now is, are these AND or OR conditions? Notice how conditions 2 and 3 are next to each other at the same XML nesting level. That makes the two OR'ed. Then notice how both of them are within the element of the first condition. This makes conditions 2 and 3 AND'ed with condition 1. Hence, the complete XML above corresponds to this pseudo code:

condition 1 AND (condition 2 OR condition 3)
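The same nesting logic can be written as a minimal Python sketch. The function and parameter names are hypothetical, and applying the pattern with DOTALL (so that it can match across line breaks) is an assumption about how the Rules Engine evaluates patterns:

```python
import re


def conditional_hello_matches(connector_type: str, document_text: str) -> bool:
    """Condition 1 AND (condition 2 OR condition 3), mirroring the XML nesting."""
    # Condition 1: the document was retrieved by the file server connector.
    is_fileserver = connector_type == "fileserver"
    # Condition 2: the full content is exactly this string.
    is_exact_match = document_text == "This is a test document"
    # Condition 3: the content contains "some testing" somewhere within it.
    contains_phrase = (
        re.search(r"^.*some testing.*$", document_text, re.DOTALL) is not None
    )
    return is_fileserver and (is_exact_match or contains_phrase)


print(conditional_hello_matches("fileserver", "This is a test document"))
print(conditional_hello_matches("fileserver", "An unrelated document"))
```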

Create or copy a few documents into C:\test_data to be picked up by the connector, and verify using the search frontend that only the targeted documents end up having their content changed. Use the procedure described in Refetching and/or Reindexing Documents if you at any time need to refetch or reindex the documents after having modified or corrected the rule.

Creating a Searchable Field

So far we have been using the field document_text both as the rule input data field and as the rule output data field. In this section we will continue to use the field document_text as the input field, but this time we will create a new index field to hold the rule’s output data.

Dynamic Schema Fields

Locator comes with a large number of factory prepared search fields that are tailored to known and well established Locator use cases across the Ayfie customer base. In addition, Locator has what is referred to as dynamic search fields. These are template type fields that can be used as the basis for creating new customized search fields on the fly to address unique customer requirements. Below we have listed the three that will be used in this tutorial. For a more in-depth explanation of dynamic fields, consult Dynamic Fields Explained.

  • via_ti_: An Ayfie prepared field (via) of type text (t) that will be indexed (i). Text in this field can be searched word by word.

  • via_tsi_: The same type of field as the one above, except that data in this field will also be stored (s) in the index and displayed with the search result.

  • via_sid_: An Ayfie prepared field (via) of type string (s) that will be indexed (i) and have docValues (d) which is good for refiners. The string stored in this field can only be searched in its entirety from end to end and not word by word.

We will be creating the fields via_ti_department, via_tsi_department and via_sid_department in later sections, but before that we will in the next section create some new test data to use with the fields.

Creating Some Test Data

Below we have some test data that we are to save as 4 files in directory C:\test_data. The files contain employee records and will be automatically retrieved by the file server connector that we configured earlier in this tutorial to retrieve data from that directory.

employee-0001.txt

employee-0002.txt

employee-0003.txt

employee-0004.txt

Creating a Textual Index Field

The first field we are to create will make the employee department searchable.

The field will be based on the dynamic field via_ti_ and named via_ti_department. As explained in Dynamic Fields Explained, the field will inherit the following features:

  • the data type is text (type=”text_general”)

  • the field will be indexed and thus searchable (indexed=”true”)

  • the field will not be part of the returned search result (stored=”false”)

  • the field will hold a single value, not an array (multiValued=”false”)

Created fields can now be found in the list of existing fields:

Populating the Field with Data

We will now use the rule below to extract the department name from the document_text field of each incoming document and place that name in the new field we created above:

Once the rule has been copied into the rule editor, test it and if valid, then name it and save it:

Notice we named the rule Department Text Extractor to distinguish it from the next rule, which we will call Department String Extractor.

Use the procedure described in Refetching and/or Reindexing Documents to reindex the data for the rule to take effect for all the documents. Here is the simplest way of doing it:

Searching the Field

The graphic below shows a field specific search directed towards the new field we just created. Notice how we are able to successfully search for single words within the field content (just “marketing” instead of “sales & marketing”). That will not be possible for the string field type that we will be using next.
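The difference between the two field types can be sketched in Python. This is a deliberately simplified model and not how Solr analyzes text: it assumes a text field is split into lowercase words while a string field is kept as one untouched value. Both function names are hypothetical:

```python
import re


def text_field_matches(field_value: str, query: str) -> bool:
    """Text type field: the value is tokenized, so single words can match."""
    tokens = re.findall(r"\w+", field_value.lower())
    return query.lower() in tokens


def string_field_matches(field_value: str, query: str) -> bool:
    """String type field: only the complete value, end to end, can match."""
    return field_value == query


print(text_field_matches("sales & marketing", "marketing"))
print(string_field_matches("sales & marketing", "marketing"))
print(string_field_matches("sales & marketing", "sales & marketing"))
```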

Creating a Refiner

In the screenshot above we see the two headlines Data Modified and Sources in the left pane. These refer to what we call refiners, filters or navigators. They are used to drill down into a subset of the search results. In this section we will extend our search solution to also include a department refiner as shown in the next screenshot:

Creating a refiner consists of the following steps:

  • Create a string (not text) type index field

  • Create a rule that populates the field

  • Create a refiner for the index field

Creating a String Index Field

To make the Create Index Field menu option visible we must again add the URL parameter se=1:

Add New Rule and Reindex

Below we have the same rule as we used for the text type except for these differences:

  • The rule is named Department String Extractor instead of Department Text Extractor

  • The extracted data is inserted into the field via_sid_department instead of via_ti_department

Reindex the documents to have the new rule take effect:

Creating a String Refiner

If one now does a wildcard search, one should see the same result as shown in the first screenshot in this section: a Department refiner showing that two of the test documents pertain to the Development Department and the other two to the Sales & Marketing Department:

Modifying a Refiner

Notice how the refiner we just created only allows one to filter on one department at a time. In a search result with more variation than there is in our test data, being able to filter on more than one department at a time would probably be very welcome. How can we change the refiner so that it works like that? We could have configured it that way to start with by not using the default value of 1 for (Multi-Select) Selection Limit, as we did two screenshots up, and instead setting it to 1000 (any high value will do, but the convention is to set it to 1000). However, it is too late for that now. We will thus have to change it in the internal PostgreSQL or the external MSSQL database, as we show for PostgreSQL in the next screenshot.

Instructions on how to log in to the database are found in section Installation Verification of the Ayfie Locator Installation Guide.

Below one can see the same refiner after the selection_limit field has been changed from 1 to 1000. Notice how the radio buttons have become checkboxes.

The selection_limit field is not the only field that can be updated in this way. Many of the other cells in the row can be updated as well.

Creating a Hierarchical Refiner

We will now make a hierarchical refiner like the one shown below, based on the office location information in the retrieved data:

To achieve this we need to do the following:

  • Create a navigation type index field

  • Create a rule that populates the index field with a value that has the '\' character as a delimiter between the levels

  • Create a FacetHierarchy type refiner for the index field with tags and exclude tags and with multi-select set to 1000 (by convention as any high number will do)

  • Manually update the refiner’s row in the refiner table in the database
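The kind of value the rule in the second step has to produce can be illustrated with a short Python sketch. The location names are hypothetical (they are not taken from the test data), and the helper names are made up for this illustration:

```python
# The level delimiter used by the hierarchical refiner value.
DELIMITER = "\\"


def to_hierarchy_value(levels: list) -> str:
    """Join the levels, from the top level down, into one delimited string."""
    return DELIMITER.join(levels)


def from_hierarchy_value(value: str) -> list:
    """Split a delimited hierarchy value back into its levels."""
    return value.split(DELIMITER)


# Hypothetical office location, top level first.
value = to_hierarchy_value(["Europe", "Norway", "Oslo"])
print(value)
print(from_hierarchy_value(value))
```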

Create a Navigation Type Index Field

Create the navigation type index field:

Create a Rule that Produces a FacetHierarchy Value

Create a rule that populates the index field with a value that has the '\' character as a delimiter between the levels:

Create a FacetHierarchy Type Refiner

Create the location refiner more or less the same way as we created the department refiner:

Update the Refiner in the Database

For the FacetField based department refiner we did not need to do anything in the database unless something had to be corrected. The FacetHierarchy based location refiner, however, requires a database update. Log in and access the refiner table the same way as we did in the previous section, and then make sure that the row has the values shown here. The one thing in particular to do is to change the facet_type column from FacetField to FacetHierarchy:

And here we see the end result:

Create a Numeric Range Refiner

The employee data contains information about the employee’s skill level based on the employee’s training and past experience. The level is given as a numeric score from 0 (clueless) to 9 (an undisputed expert). In this section we will learn how to make a range refiner for this information.

These are the ranges we will be using:

  • 0-1 Novice

  • 2-3 Advanced Beginner

  • 4-5 Competent

  • 6-7 Proficient

  • 8-9 Expert
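The ranges above can be expressed as a simple lookup. The following Python sketch only illustrates the mapping from score to label; the function and constant names are hypothetical:

```python
# (lowest score, highest score, label), both bounds inclusive.
SKILL_RANGES = [
    (0, 1, "Novice"),
    (2, 3, "Advanced Beginner"),
    (4, 5, "Competent"),
    (6, 7, "Proficient"),
    (8, 9, "Expert"),
]


def skill_label(score: int) -> str:
    """Return the skill level label for a numeric score from 0 to 9."""
    for low, high, label in SKILL_RANGES:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range: {score}")


print(skill_label(5))
```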

There are two ways of creating a range refiner:

  • Using a rule engine rule - the rule populates a new field via_str_skill_level with the values Novice, Advanced Beginner, Competent, Proficient and Expert based on the numeric skill level given in the document. One then creates a FacetField type refiner based on that new field in the very same way as we did for the department refiner earlier in this tutorial. Hence, there is nothing new to learn with this approach.

  • Using a Solr query - this is done using a FacetQuery type refiner and is what we will be learning in this section.

Here are the steps for creating our FacetQuery type range refiner:

  • Create a numeric index field

  • Create a rule that populates the index field

  • Create a FacetQuery refiner for the index field

  • Insert one row per skill level with the Solr range query

Create a Numeric Index Field

The first thing to do is to create a numeric index field. Here we create the field via_ipsid_skill_level that will hold an integer value.

The index field we created above has the prefix via_ipsid_. As one can read in Dynamic Fields Explained, the s in via_ipsid_ means that the value will be returned with the search result, as it is not only indexed (the i) but also stored in the indexed document. This is not necessary for the refiner to work, so for the sake of the refiner we could instead have used the dynamic template via_ipid_ (without the s, otherwise the same). However, later in this tutorial we will show how to add fields to the search result. Using via_ipsid_ instead of via_ipid_ here is the first step towards achieving that.

Create a Rule that Populates the Index Field

We will now use the rule below to extract the employee’s skill level from the document_text field of each incoming document and place that score in the new field we created above:

Create a FacetQuery Refiner

Log into the database and look up the config.refiner table. Change the facet_type column for the refiner we just created from FacetField to FacetQuery, and then save the change.

Take note of the id of the skill level refiner, as we will need it for the SQL statements that we are about to run next. In the table above, we see that the refiner_id is 21 (yours could be another value).

Insert Solr Range Queries

Here are the SQL INSERT statements to be run for uploading the 5 skill level ranges that we defined at the beginning of this section. These were 0-1, 2-3, 4-5, 6-7 and 8-9. Replace the 5 occurrences of XXXX below with the refiner_id of the skill level refiner. One could also change the creation and modified dates, but that is not critical in any way.
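For orientation, the five ranges can be written as inclusive Solr range expressions of the form field:[low TO high]. The Python sketch below only generates those expressions; the exact query text that Locator expects in each row is an assumption, and the function name is hypothetical:

```python
FIELD = "via_ipsid_skill_level"

# (lowest score, highest score, label), both bounds inclusive.
SKILL_RANGES = [
    (0, 1, "Novice"),
    (2, 3, "Advanced Beginner"),
    (4, 5, "Competent"),
    (6, 7, "Proficient"),
    (8, 9, "Expert"),
]


def solr_range_query(field: str, low: int, high: int) -> str:
    """Build an inclusive Solr range query, for example field:[0 TO 1]."""
    return f"{field}:[{low} TO {high}]"


range_queries = {
    label: solr_range_query(FIELD, low, high) for low, high, label in SKILL_RANGES
}
for label, query in range_queries.items():
    print(f"{label}: {query}")
```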

Below we see how to execute the SQL statements from above:

And here we see the end product:

Changing the Order of the Refiners

Refiners are listed in the left pane in the order they were created. We have the out of the box refiners first, then the Department refiner, the Office Location refiner and finally the Skill Level refiner, which was the last one to be created. But what if we want the skill level refiner to come before the two other ones that we created? For that we change the sort order:

Below we see the result of giving the skill level refiner sort order 15 and then bumping up the sort order of the department and office location refiners by 1 each. The skill level refiner has moved from the end to the middle of the pack:

Removing Refiners

The way to remove a refiner is not to delete it, but to hide it by disabling it. That is done by changing column enabled in table config.refiner from true to false for the refiner that is no longer to be displayed. In the screenshot below we are disabling the skill level refiner:

And this is what the list of refiners in the search frontend looks like after that change. Notice how there is no longer any skill level refiner:

Multi Valued Index Field

So far all the key-value pairs that we have been dealing with have been single valued. In this section we will make a refiner that requires a value that is an array. Or, to use Solr terminology, it needs a field that is multivalued.

So when does a refiner need a multivalue field? All the refiners we have made so far have only had one possible value per employee and thus per document. An employee could in our case only be working for one department at one location and have one general skill level. In this section we will make a refiner called Job Titles that for each document will have a value for each position from the very top of the company and all the way down to and including the title of the employee. Such a refiner requires the following steps:

  • Create a multivalue string type index field

  • Create a rule that populates the index field with an array listing all the job titles

  • Create a FacetField type refiner that uses the array in the index field as its input data

Create a Multivalue String Type Index Field

By consulting with Dynamic Fields Explained we see that we need a dynamic field with the prefix via_simd_ or via_ssimd_ (both are multivalued). Since we just need the data to be indexed so that it can be used by the refiner and we do not need it to be part of the search result itself, we can use via_simd_ (that has no second s for storage) even though both will work.

Go ahead and create the index field via_simd_job_titles the same way we have done it earlier in this tutorial.

Create a Rule that Populates the Index Field with an Array

The rule below has these 3 actions:

  • Extract the list of job titles from the document text and place it in the field via_simd_job_titles

  • Reduce the two character delimiter (comma and space) to a single character delimiter. Here we use the bar character ('|'), but any unique character will do, even the comma we had earlier.

  • Split the string into an array using the bar character as the delimiter
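The last two actions can be sketched in Python as follows. The job titles in the example are hypothetical and the function name is made up for this illustration; the extraction step itself is left out because it depends on the layout of the employee records:

```python
def split_job_titles(raw_titles: str) -> list:
    """Turn a comma-and-space separated string into an array of job titles."""
    # Action 2: reduce the two character delimiter to a single bar character.
    single_delimited = raw_titles.replace(", ", "|")
    # Action 3: split the string into an array using the bar as delimiter.
    return single_delimited.split("|")


# Hypothetical list of titles, from the top of the company down to the employee.
print(split_job_titles("CEO, CTO, Developer"))
```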

Create a Refiner that Uses the Array as Input Data

Create a refiner, the same way as earlier in this tutorial, that has the array in field via_simd_job_titles as its input data.

And here is what the end result will look like:

Changing the Search Result

At the beginning of this tutorial we described how the Rule Engine has two sides, the index side and the search side. Everything that we have done up until this point has been on the index side. That is, every rule that we have made has done something to the content of documents after they have been retrieved from the database and before they have been indexed. Now we are going to implement rules on the search side. That is, these rules will do something with the documents after they have been retrieved from the index, but before they are passed on to the user as search results.

“More Details” and “Primary Details”

Below we see one of our test documents in a search result. The data within the small rectangle to the left shows what is called the Primary Details and the data within the large rectangle to the right is referred to as (More) Details.

In this section we will learn how to add additional fields to Details and/or Primary Details.

Alternative Way to Access the Rule Engine

At the beginning of this tutorial we learned how to get to the Rule Engine by enabling the Rules menu option with the se=1 URL parameter. Now we are going to learn another route to the same destination. For that we first need to obtain a document ID using the debug=true URL parameter on the search page, and then use that document ID to look up the document in the Locator Dashboard. Once the document is up, one clicks the green Test button to get to the rule engine as shown here:

As we can see, the click sequence takes us directly to the search side. The easiest way of getting to the index side from here is to replace Search with Index in the URL as shown here below:

Refinding the Index Data on the Search Side

Once inside the search side document pipeline, we enter the rule engine editor to have a look at the document before any search side rules are executed. In the list of key-value pairs there will be some that start with SDKHit.Metadata@. Among them one will find any dynamic fields with an s for storage (check Dynamic Fields Explained for a reminder on index field naming). A little earlier in this tutorial we purposely used the prefix via_ipsid_ instead of via_ipid_. If we had used the latter, we would still have been able to create the skill level refiner, but the skill level would not have been among the search side document key-value pairs. As we can see below, due to the s in via_ipsid_skill_level the field made it into the document that is retrieved from the index:

From SDKHit.Metadata to SDKHit.MoreDetails and SDKHit.PrimaryDetails

But as we saw above, even though the skill level appears in the document, it appears neither in the Primary Details nor under the Details tab in the Preview popup window. That is something we are going to do something about now.

The rule below will copy the key-value pair SDKHit.Metadata@via_ipsid_skill_level to SDKHit.MoreDetails@SkillLevel and to SDKHit.PrimaryDetails@SkillLevel, so that the data will be shown in the two locations in the search result.
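The effect of such a copy can be illustrated with a small Python sketch that treats the search hit as a plain dictionary of key-value pairs. That representation, the function name and the example value are assumptions made for illustration only:

```python
SOURCE_KEY = "SDKHit.Metadata@via_ipsid_skill_level"
TARGET_KEYS = (
    "SDKHit.MoreDetails@SkillLevel",
    "SDKHit.PrimaryDetails@SkillLevel",
)


def add_skill_level_details(hit: dict) -> dict:
    """Copy the stored skill level into More Details and Primary Details."""
    result = dict(hit)  # Leave the incoming hit untouched.
    if SOURCE_KEY in result:
        for target_key in TARGET_KEYS:
            result[target_key] = result[SOURCE_KEY]
    return result


# Hypothetical hit with a skill level of 7.
print(add_skill_level_details({SOURCE_KEY: "7"}))
```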

After first having tested the rule, save it with the name Custom More Details and Primary Details:

As seen below, the skill level will now appear in the primary details (top red circle) and within (more) details (bottom red circle).