...

From time to time, customers want to add custom fields to Locator that can be used in reports generated by Supervisor/Insight. Locator and Supervisor/Insight come with many pre-defined fields out of the box, but customers often have specific data they need to report on that is not covered by the extraction rules provided. This guide walks through a real-life example from a customer request that should give you the means to configure this yourself.

...

You should have some experience with the rules engine, including how to access it and test your rules. You should also understand how regular expressions work and how to write them. We find this online regular expressions test tool very valuable for testing them. You should also know how to insert rows into the database and how to use the command line tool Via.Repository.exe, which is used to REINDEX the data from the database into SOLR. Finally, you need to know how to edit the SOLR schema_overrides.xml file to add the new field(s) to SOLR, and how to apply these changes.

Info: What is a Lingo field?

We use SOLR to index data, and SOLR requires that all the data are stored in defined fields. These fields can contain either normal text, date and time data, geographical positions or a whole range of other data types/content. For instance, the actual text of an indexed document is stored in a field called document_text.

When Supervisor/Insight is installed, we run the content of the document text through a wide range of linguistic text filters and rules to extract information that pertains to personally identifiable information or text that matches other identifiable information. This ranges from names, addresses, social security numbers, bank account numbers, city names etc.

A Lingo field is then basically a storage location in SOLR for a specific set of data we want to identify and report on.

...

In this real-life example, our customer wanted four custom Lingo fields added to their Locator/Supervisor setup.

  • Italian IBAN numbers
  • Italian driver's license numbers
  • Italian ID card numbers
  • Italian tax identification numbers

Below is a list of the fields we needed to create, along with some sample text and the regular expressions we needed to use to extract the data.

...

Info: Important notice

All paths within this document point to a default installation of Locator, using the following paths:

  • Program Files → C:\Program Files\VirtualWorks\ViaWorks\
  • ProgramData → C:\ProgramData\VirtualWorks\ViaWorks\

If you have chosen different paths for your installation of Locator, you need to adjust the paths used in the commands below.

First off, we need to make the SOLR Index Service aware of the fields we require. Supervisor will report on any SOLR field that is prefixed with lingo_, which is why all our custom fields follow this naming pattern. To add these fields to our SOLR configuration, we need to edit the file solr_overrides.xml located in %ProgramData%\VirtualWorks\ViaWorks\Solr\configsets\ViaWorksCloud\conf (or %ProgramData%\Konica Minolta\dokoni FIND\Solr\configsets\ViaWorksCloud\conf for dokoni FIND).

Open this file with your favourite text editor, and add the following content inside the <diff> </diff> XML element.

Code Block
  <add sel="/schema/fields">
    <field name="lingo_kmit_iban" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
    <field name="lingo_kmit_driverlic" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
    <field name="lingo_kmit_idcard" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
    <field name="lingo_kmit_taxcode" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
  </add>
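
Because Supervisor only reports on fields prefixed with lingo_, it can be worth checking the field names before applying the overrides. The following is a minimal sketch in Python, not part of the Locator tooling: the helper names lingo_field_names and missing_prefix are hypothetical, and the fragment is embedded as a string rather than read from the overrides file on disk.

```python
import xml.etree.ElementTree as ET

# The <add> fragment from the listing above, embedded as a string for this sketch.
FRAGMENT = """
<add sel="/schema/fields">
  <field name="lingo_kmit_iban" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
  <field name="lingo_kmit_driverlic" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
  <field name="lingo_kmit_idcard" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
  <field name="lingo_kmit_taxcode" type="string" indexed="true" stored="false" multiValued="true" docValues="true" />
</add>
"""


def lingo_field_names(fragment):
    """Return the name attribute of every <field> element in the fragment."""
    root = ET.fromstring(fragment)  # raises ParseError if the XML is malformed
    return [field.get("name") for field in root.iter("field")]


def missing_prefix(fragment, prefix="lingo_"):
    """Return field names that do not start with the required prefix."""
    return [name for name in lingo_field_names(fragment) if not name.startswith(prefix)]


if __name__ == "__main__":
    print(lingo_field_names(FRAGMENT))
    print(missing_prefix(FRAGMENT))  # an empty list means every field follows the pattern
```

A parse error here would also catch a malformed fragment, such as an unclosed tag, before it reaches SOLR.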

...

Now that we have our overrides file in place, we have to apply these overrides to the schema.xml file. This is done by using the command line tool Via.SolrUpdate.exe. Open up a CMD session with administrative privileges and issue the following command:

...

If everything went smoothly, a new schema.xml file should now be ready with the required SOLR fields. To enable the new configuration, we have to upload the changes to SOLR using ZooKeeper. Again, using the already open CMD session, issue the following command:

...

SOLR is now running with our new configuration and is aware of the new Lingo fields. The next step is to enable the fields in the database.

...

We are now ready to make Locator aware that these fields should be indexed. This is done by adding the index fields to the index.index_field table in the database. Start the Postgres Admin tool located in %Program Files%\VirtualWorks\ViaWorks\Postgres\bin\pg3admin.exe.

...

If you've added the content of the file which we have included at the bottom of this page under Addendum, we can easily see whether the rule works - and I will use this in our example below. To test the rule, press the Test -> button. This shows you the Post-Engine Document, in other words how the document will be stored in the SOLR Index. If we scroll down the page until we find our lingo_ fields, we should see the following:

...

This shows that the rule works as intended: the text is extracted and added to our lingo_ fields, and the rule is now ready to be saved. To do this, scroll up to the top of the current page and press the </> Temporary Rule button. This brings you back to the Rules Engine editor. We now have to enter a name for our rule, and in our example we have chosen to name it index_kmit_custom_insight_fields. The reason for this naming scheme is both to indicate that this is an index rule and to provide a high-level textual explanation of what the rule does. Once you have given the rule a name, you can press the Save New Rule button.

...

At this point, you should be able to generate a report on these fields using Supervisor/Insight. Log into Supervisor/Insight and create a new report. In the report wizard under Please select required fields, you should now be able to see the new fields per our example below.

...

You might be wondering what goes on behind the scenes in the above rule, so I'm going to explain one part of it, namely the lingo_kmit_taxcode rule. First, let's look at the rule.

...

  1. First we copy the content of the document found in the field document_text to a temporary object which we call temp, so that we do not change the document content.
  2. We then use the Rules Engine action called explodematches, which searches our temporary object temp for the text that matches our regular expression.
  3. Our regular expression is as follows: ([Cc][Oo][Dd][Ii][Cc][Ee] [Ff][Ii][Ss][Cc][Aa][Ll][Ee]|[Tt][Aa][Xx] [Cc][Oo][Dd][Ee]) [A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]
  4. The matches in our temporary object temp are then written to a list - if there is more than one match, this will result in a multiple value list.
  5. Because the label (codice fiscale or tax code) is part of our regular expression, each match also contains the text before the actual tax code - this is not something we want, so we need to remove this.
  6. We then copy the content from our list in the object temp to a new object called lingo_kmit_taxcode, where we use another Rules Engine action called replace.
  7. The replace action is instructed to look for text matching our regular expression ([Cc][Oo][Dd][Ii][Cc][Ee] [Ff][Ii][Ss][Cc][Aa][Ll][Ee]|[Tt][Aa][Xx] [Cc][Oo][Dd][Ee]) - if this text is found, we simply remove it.
  8. The rule should now have made the list of all matches, removed the unwanted text, and left us with a list of the tax codes.
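
The steps above can be sketched outside the Rules Engine to test the regular expressions on sample text. The following is a minimal Python approximation, not the Rules Engine itself: explodematches is imitated with re.finditer and replace with re.sub, the function name extract_tax_codes is hypothetical, and the tax codes in the sample text are made up for illustration.

```python
import re

# Label part of the pattern, copied from step 7.
LABEL = r"([Cc][Oo][Dd][Ii][Cc][Ee] [Ff][Ii][Ss][Cc][Aa][Ll][Ee]|[Tt][Aa][Xx] [Cc][Oo][Dd][Ee])"
# Full pattern from step 3: the label, a space, then the 16-character tax code.
FULL = LABEL + r" [A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]"


def extract_tax_codes(document_text):
    """Approximate the rule: collect label+code matches, then drop the label."""
    temp = document_text  # step 1: work on a copy, not the document field itself
    # Steps 2-4: every match becomes one entry in a list (label still attached).
    matches = [m.group(0) for m in re.finditer(FULL, temp)]
    # Steps 6-7: remove the label from each entry. The strip() is an addition in
    # this sketch to drop the space left behind between label and code.
    return [re.sub(LABEL, "", match).strip() for match in matches]


if __name__ == "__main__":
    # Made-up sample text; the codes only follow the expected shape.
    sample = "Codice Fiscale RSSMRA85T10A562S and also TAX CODE BNCLGU70A01H501X."
    print(extract_tax_codes(sample))  # ['RSSMRA85T10A562S', 'BNCLGU70A01H501X']
```

A code that appears without one of the two labels in front of it is not matched, which mirrors the behaviour of the pattern in step 3.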

...