Modifying free-text search for additional fields

You can modify the configuration of free-text search in many ways. For example, you can add or remove fields for search criteria, modify how fields are stored in the Guidewire Solr Extension, and configure how fields are matched to search criteria. For complete information on how to modify the Guidewire Solr Extension, consult the online documentation for Apache Solr 6.6.

This section shows by example the configuration files that you typically modify to change how the Guidewire Solr Extension loads, indexes, and searches data. The example is a simple configuration change to add a field to free-text search.

Important: Adding multi-valued fields can affect free-text search performance. In particular, adding fields with too many values can significantly degrade full-text search performance.

Configuration files for full-text loading, indexing, and searching

Typically, you modify multiple files to configure how the Guidewire Solr Extension loads, indexes, and searches information. Specific files configure each component of the extension.

The following table lists the configuration files for PolicyCenter:

Configuration file: policy-search-config.xml
Description: Defines the mapping between PolicyCenter fields and the Guidewire Solr Extension, and configures how the Guidewire Solr Extension indexes and searches its index documents.
Location: PolicyCenter/modules/configuration/config/search/policy-search-config.xml

Configuration file: PCSolrSearchPlugin.gs
Description: The implementation class for the free-text plugin ISolrSearchPlugin. This plugin sends search criteria to the Guidewire Solr Extension and receives the search results.
Location: PolicyCenter/modules/pc/gsrc/gw/solr/PCSolrSearchPlugin.gs

Configuration file: PCSolrMessageTransportPlugin
Description: The implementation of ISolrMessageTransportPlugin. This plugin sends update messages to the Guidewire Solr Extension when entities in the Guidewire Solr Extension index undergo changes in the PolicyCenter database.

Configuration file: EventFired.grs (IndexingSystem rule)
Description: Rules file containing a top-level rule that defines which entities and which events on those entities are of interest to the Guidewire Solr Extension. This is the point of creation for the messages that the associated ISolrMessageTransportPlugin instance sends.

The following table lists the configuration files for Guidewire Solr Extension.

Configuration file: schema.xml
Description: Defines the document schema structure for the Guidewire Solr Extension. Provides general configuration for how the Guidewire Solr Extension uses a document, the index definition for a document, and the analyzers available for index assignment or usage.
More information:
  http://wiki.apache.org/solr/SchemaXml
  https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-6.6.pdf
  http://lucene.apache.org/solr/guide/6_6
Location: /opt/gwsolr/pc/solr/policy_active/conf/schema.xml

Configuration file: data-config.xml
Description: Defines where to locate and how to interpret data from the batch load command. Set to read-only mode at start-up.
More information:
  http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
Location: /opt/gwsolr/pc/solr/policy_active/conf/data-config.xml

The following table lists the configuration files for the free-text batch load command.

Configuration file: batchload.bat | batchload.sh
Description: Represents the batch load command.
Location: /opt/gwsolr/pc/solr/policy_active/conf/batchload.sh

Configuration file: batchload-config-databaseBrand.xml
Description: Contains database connection information and brand-specific native SQL for selecting data from the PolicyCenter relational database.
Location: /opt/gwsolr/pc/solr/policy_active/conf/batchload-config-oracle.xml

Configuration file: postprocess.bat | postprocess.sh
Description: Collates and compiles index documents for the Guidewire Solr Extension using data selected from the relational database.
Location: /opt/gwsolr/pc/solr/policy_active/conf/postprocess.sh

Externalized server configuration

Guidewire supports the use of substitution variables in the Solr configuration file solrserver-config.xml only. For more information see the following:

Sequence of steps for adding a field to free-text search

Perform these high-level steps to configure free-text search with an additional field. The examples in the following topics add a policy postal code as a free-text field.

Important: The PolicyCenter base configuration does not support free-text search of entity types other than policies and submissions. The addition of such free-text search fields is a complex endeavor. When engaging in all such endeavors, proceed with competence and care. Guidewire will support customers in their development of additional search types.

Defining a new free-text field in the Guidewire Solr Extension

Use the following file to define a new free-text field in the Guidewire Solr Extension.

/opt/gwsolr/pc/solr/policy_active/conf/schema.xml

The Guidewire Solr Extension defines the format of this file.

The schema configuration file contains <field> elements, one for each full-text field of the index documents in the Guidewire Solr Extension.

The following example defines a full-text field that stores postal codes.

<field name="postalCode" type="gw_unanalyzed" indexed="true" 
        stored="true" required="false"
        multiValued="false"/>

The definition directs the Guidewire Solr Extension to index the values of postal code fields, so search criteria can include postal codes. The definition directs the Guidewire Solr Extension to store the values of postal code fields, so items returned in search results include them.

The type attribute in the preceding definition identifies the analyzer with which Guidewire Solr Extension processes postal code fields. The analyzer type indicates the category of matches that are possible on applicable search fields. In this case, a value for the type attribute of gw_unanalyzed indicates that postal codes are raw text fields and that Guidewire Solr Extension does not analyze them. Given this value, only exact matches are possible for searches on the values of postal code fields.
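
The difference between an unanalyzed field and an analyzed one can be illustrated with a toy model. The following Python sketch is not Solr code and does not reproduce the gw_unanalyzed analyzer. The function names and the lowercase-and-split "analysis" are assumptions made only to contrast exact matching on raw text with matching on analyzed terms.

```python
def analyzed_terms(value):
    # Toy stand-in for an analyzed field (assumption): lowercase, split on whitespace.
    return set(value.lower().split())


def matches_unanalyzed(indexed_value, query):
    # An unanalyzed field keeps the raw text, so only an identical value matches.
    return indexed_value == query


def matches_analyzed(indexed_value, query):
    # An analyzed field matches when every query term appears among the indexed terms.
    return analyzed_terms(query) <= analyzed_terms(indexed_value)


exact_hit = matches_unanalyzed("94404", "94404")          # identical raw text
partial_miss = matches_unanalyzed("94404-1234", "94404")  # a prefix is not an exact match
```

In this model, a search for part of a postal code finds nothing in the unanalyzed field, which mirrors the exact-match behavior described above.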

Defining a new free-text field in PolicyCenter

Use the following file to define a new free-text field in PolicyCenter.

PolicyCenter/modules/configuration/config/search/policy-search-config.xml

Access the file in the Project window in Studio by navigating to configuration > config > search > policy-search-config.xml. PolicyCenter defines the format of this file.

Free-text search configuration files have these main elements:

  • <Indexer> – Contains <IndexField> elements to define field names and their locations within object graphs of the root object and in the index documents sent to the Guidewire Solr Extension.
  • <Query> – Contains the following types of elements:
    • <QueryTerm> elements define how specific fields are matched and how a match contributes to the overall score. A query term is one of two types: term and subquery. A term type searches a single index for a single field. A subquery type searches multiple indexes for multiple fields simultaneously and scores the most appropriate match.
      Important: Do not confuse the term type query term with the subquery type query term. A subquery type can search multiple indexes for multiple fields. A term type can only search one index for a single field. An XSD only validates the structure of a query. The XSD does not validate the inputs for a term type. Thus, if you attempt to enter multiple terms in a term type, the XSD will not catch this error. Instead, the platform will throw a run-time exception.
    • <FilterTerm> elements restrict the Guidewire Solr Extension to returning a result only if a field in that result matches the given search criteria. The elements do not affect the score of any results. Instead, they help users to limit the results that a search returns, such as by a date range. The elements can also facilitate security by controlling the results that users receive.
  • <QueryResult> – Contains <ResultProperty> elements to configure whether and how specific fields are returned in query results from the Guidewire Solr Extension.

To add a free-text field to policy-search-config.xml, first add an <IndexField> element to the <Indexer> element. The following example defines a free-text field for postal codes.

<IndexField field="postalCode">
  <DataProperty path="root.PolicyAddress.PostalCode"/>
</IndexField>

The definition specifies that values of postalCode fields in the index documents sent to the full-text engine come from addresses on policy periods, the root object.

Next, add <FilterTerm> elements for the new free-text field to the <Query> element. The following example specifies how the Guidewire Solr Extension matches PostalCodeCriteria values in search criteria with postalCode values in the Guidewire Solr Extension.

<FilterTerm>
  <DataProperty path="root.PostalCodeCriteria"/>
  <QueryField field="postalCode"/>
</FilterTerm>

The definition directs the Guidewire Solr Extension to accept postal codes in search criteria and specifies where to find them in the XML structure of the search criteria.

Finally, add a <ResultProperty> element to the <QueryResult> element. The following example defines how the Guidewire Solr Extension returns postalCode values in query results.

<ResultProperty name="PostalCode">
  <ResultField name="postalCode"/>
</ResultProperty>

The definition assigns the postalCode value in the result to the PostalCode property on the result object.
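
Because the new field must appear in three places, a quick consistency check can help catch an omission. The following Python sketch parses a fragment that combines the three definitions above and reports where a field name is used. The wrapper element SearchConfigSketch and the function field_usage are hypothetical. The real root element of policy-search-config.xml is not shown in this topic, so treat this as an illustration rather than a validator for the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; only the three child sections come from this topic.
FRAGMENT = """
<SearchConfigSketch>
  <Indexer>
    <IndexField field="postalCode">
      <DataProperty path="root.PolicyAddress.PostalCode"/>
    </IndexField>
  </Indexer>
  <Query>
    <FilterTerm>
      <DataProperty path="root.PostalCodeCriteria"/>
      <QueryField field="postalCode"/>
    </FilterTerm>
  </Query>
  <QueryResult>
    <ResultProperty name="PostalCode">
      <ResultField name="postalCode"/>
    </ResultProperty>
  </QueryResult>
</SearchConfigSketch>
"""


def field_usage(xml_text, field):
    """Report whether a Solr field name is indexed, queried, and returned."""
    root = ET.fromstring(xml_text)
    return {
        "indexed": any(e.get("field") == field for e in root.iter("IndexField")),
        "queried": any(e.get("field") == field for e in root.iter("QueryField")),
        "returned": any(e.get("name") == field for e in root.iter("ResultField")),
    }


usage = field_usage(FRAGMENT, "postalCode")
```

A field that is indexed but reported as not queried or not returned points to a missing <FilterTerm>, <QueryTerm>, or <ResultProperty> definition.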

Defining a new free-text field in the batch load command

To load data from an existing PolicyCenter database, the new free-text field must be listed in the batchload-config-databaseBrand.xml files and the data-config.xml file. You decide whether to include the field in, or exclude it from, the digest that prevents duplicate index entries.

Generating the SQL query

Add the new free-text field to the SELECT portion of the query that corresponds to the data. The SQL for postal codes looks like the following sample SELECT statement.

SELECT DISTINCT
  ...
  addr.postalCode
  ...
  FROM pc_policyperiod AS pp
    ...
    INNER JOIN pc_policyaddress paddr
      ON paddr.BranchID = pp.ID
    ...
    INNER JOIN pc_address addr
      ON addr.ID = paddr.Address
    ...
  WHERE 
    ...
    AND (paddr.EffectiveDate  IS NULL OR paddr.EffectiveDate  &lt;= pp2.EditEffectiveDate )
    AND (paddr.ExpirationDate IS NULL OR paddr.ExpirationDate &gt;  pp2.EditEffectiveDate )
    ...

You duplicate this change in batchload-config-oracle.xml, batchload-config-sqlserver.xml, and batchload-config-h2.xml. Each database brand uses a separate configuration file because the syntax of the SQL SELECT statement differs slightly for each database.

Note: Because the SQL is included in an XML file, you must escape the less than (<) symbol as “&lt;” and the greater than (>) symbol as “&gt;”. Do not include the quotation marks in the escape sequence.
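
If you prepare the SQL outside the XML file, a standard XML escaping routine can apply these substitutions for you. The following Python sketch uses xml.sax.saxutils.escape from the Python standard library, which also escapes the ampersand (&) as &amp;. The variable names are illustrative, and the condition is taken from the sample WHERE clause above.

```python
from xml.sax.saxutils import escape, unescape

# A condition from the sample WHERE clause, written as plain SQL.
sql_condition = (
    "paddr.EffectiveDate IS NULL OR paddr.EffectiveDate <= pp2.EditEffectiveDate"
)

# escape() replaces &, <, and > with their XML entities.
escaped = escape(sql_condition)

# unescape() reverses the substitution, which is what an XML parser does on load.
round_trip = unescape(escaped)
```

The escaped text is what you paste into the batch load configuration file. The round trip shows that the database still receives the original operators.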

The query result is processed into an XML document that is loaded into Solr. Batch loading of XML into Solr is described in http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1.

The processed field names are all in upper case, and the structure of the XML document is:

<CONTAINER_ELEM>
  <POLICY>
    <!-- data for one policy -->
  </POLICY>
</CONTAINER_ELEM>

To include the postalCode in the index, put an entry in data-config.xml that looks like the following XML code:

<field column="postalCode" xpath="/CONTAINER_ELEM/POLICY/POSTALCODE"/>

Within the row for one policy, the fields will be in the same order as they were returned by the SELECT statement.
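
The mapping from upper-case element names to index field names can be pictured with a small Python sketch. It is not how the Solr data import handler is implemented. The sample values, the URN element, and the names FIELD_MAP and to_documents are assumptions for illustration. Only CONTAINER_ELEM, POLICY, POSTALCODE, and the postalCode column come from this topic.

```python
import xml.etree.ElementTree as ET

# Made-up sample of the processed batch XML; element names are upper case.
BATCH_XML = """
<CONTAINER_ELEM>
  <POLICY>
    <URN>urn-1</URN>
    <POSTALCODE>94404</POSTALCODE>
  </POLICY>
  <POLICY>
    <URN>urn-2</URN>
    <POSTALCODE>10001</POSTALCODE>
  </POLICY>
</CONTAINER_ELEM>
"""

# Index column name -> element name under each POLICY element,
# mirroring the column and xpath attributes of a <field> entry.
FIELD_MAP = {"urn": "URN", "postalCode": "POSTALCODE"}


def to_documents(xml_text):
    """Build one dictionary per POLICY element using FIELD_MAP."""
    root = ET.fromstring(xml_text)  # root is CONTAINER_ELEM
    documents = []
    for policy in root.findall("POLICY"):
        documents.append(
            {column: policy.findtext(element) for column, element in FIELD_MAP.items()}
        )
    return documents


documents = to_documents(BATCH_XML)
```

An element that has no entry in the map is simply not copied into the document, which corresponds to SQL result columns that do not become part of the index document.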

You can have fields in the SQL result that do not become part of the XML of the index documents. The Guidewire Solr Extension ignores such fields when it loads the SQL result. These fields are not part of the index document schema. The batch load command uses these kinds of fields to sort and manipulate the data returned from the database as it produces the final XML index documents to load.

Computing the hash value to eliminate duplicates

To prevent a chain of policy changes or renewals from generating identical, duplicate index entries, the Guidewire Solr Extension computes a SHA-1 hash value of most of the index data. This hash value is included in the URN (Unique Record Name) of the index entry. Entries on the same policy with the same SHA-1 hash value are collapsed into a single entry.

First, decide whether the field is part of the hash value. If so, you must modify the file PCSolrSearchPlugin.gs to include that data in the digest when generating a new index entry. The free-text batch load command includes fields by default.

Including a new field in the hash value requires one line of Gosu code:

sb.append("postalCode", period.PolicyAddress.PostalCode)

If this field is not part of the hash value, you must make the batch load command ignore the field for digest purposes. An example of such a field is sliceDate. The configuration for the digester is near the bottom of the batchload-config-databaseBrand.xml files. The sliceDate column is among the columns to ignore.

<transformer
  name="digestTransformer"
  class="com.guidewire.solr.batchload.xform.PCDigestTransformer"
  ignoreElems="urn, periodID, policyPublicID, sliceDate, periodStart, periodEnd, policyStart,
          policyEnd, periodIdWithSliceDate, jobType"
  algorithm="SHA"
/>

Specify the same list of ignored columns in PCSolrSearchPlugin.gs as well, so that the plugin and the batch load command ignore the same fields when computing the digest.

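static function initXformer() : com.guidewire.solr.batchload.xform.DigestTransformer {
  try {
    // Build the same transformer configuration that the batch load command uses.
    var xml =
      "<transformer name=\"digestTransformer\""
        + " class=\"com.guidewire.solr.batchload.xform.PCDigestTransformer\""
        + " algorithm=\"SHA\""
        + " ignoreElems=\"urn, periodID, policyPublicID, sliceDate, periodStart, periodEnd,"
        + " policyStart, policyEnd, periodIdWithSliceDate, jobType\"/>"
    var xf = new com.guidewire.solr.batchload.xform.PCDigestTransformer(false)
    var doc = com.guidewire.solr.batchload.Utils.parseXml(xml)
    xf.configure(doc.getDocumentElement())
    return xf
  } catch (e : java.lang.Exception) {
    // Error handling is not shown in the original example; rethrowing is an assumption.
    throw new java.lang.RuntimeException("Could not configure the digest transformer", e)
  }
}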
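
The effect of the ignore list on the digest can be sketched in a few lines of Python. This is not the PCDigestTransformer implementation. The way field names and values are serialized before hashing is an assumption, and the sample field values are made up. Only the SHA-1 algorithm and the ignored column names come from this topic.

```python
import hashlib

# Column names taken from the ignoreElems attribute shown above.
IGNORE_ELEMS = {
    "urn", "periodID", "policyPublicID", "sliceDate", "periodStart", "periodEnd",
    "policyStart", "policyEnd", "periodIdWithSliceDate", "jobType",
}


def digest(document, ignore=IGNORE_ELEMS):
    """Return a SHA-1 hex digest over every field that is not ignored."""
    sha = hashlib.sha1()
    for name in sorted(document):  # fixed order so equal data hashes equally
        if name in ignore:
            continue
        # Assumed serialization: one "name=value" line per field.
        sha.update(f"{name}={document[name]}\n".encode("utf-8"))
    return sha.hexdigest()


original = {"policyNumber": "P-1", "postalCode": "94404", "sliceDate": "2023-01-01"}
renewal = {"policyNumber": "P-1", "postalCode": "94404", "sliceDate": "2024-01-01"}
moved = {"policyNumber": "P-1", "postalCode": "10001", "sliceDate": "2024-01-01"}

same_digest = digest(original) == digest(renewal)
changed_digest = digest(original) != digest(moved)
```

In this sketch the renewal differs from the original only in an ignored column, so the two digests are equal and the entries would collapse into one. The entry with a different postal code produces a different digest because postalCode is part of the hashed data.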

The final stage in eliminating duplicates is the postprocess script. The postprocess script sorts the data by a combination of URN, slice date, and term number. The script then removes rows with duplicate URN values. Slice date and term number are used in the initial sort to retain the latest version of the duplicate data, which is the same way that PolicyCenter indexes the data.
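
The sort-then-deduplicate logic can be sketched as follows. This Python example is an illustration and not the postprocess script itself. The row layout, the termNumber key name, and the choice to compare slice date before term number are assumptions, and the sample rows are made up.

```python
def dedupe_latest(rows):
    """Keep one row per URN, preferring the latest slice date and term number."""
    # Latest slice date and term number first (ISO dates sort correctly as text).
    ordered = sorted(rows, key=lambda r: (r["sliceDate"], r["termNumber"]), reverse=True)
    # Python's sort is stable, so rows with equal URNs keep the latest-first order.
    ordered.sort(key=lambda r: r["urn"])

    seen = set()
    kept = []
    for row in ordered:
        if row["urn"] not in seen:  # the first row per URN is the latest one
            seen.add(row["urn"])
            kept.append(row)
    return kept


rows = [
    {"urn": "A", "sliceDate": "2023-01-01", "termNumber": 1},
    {"urn": "A", "sliceDate": "2024-01-01", "termNumber": 2},
    {"urn": "B", "sliceDate": "2023-06-01", "termNumber": 1},
]
latest = dedupe_latest(rows)
```

Sorting before removing duplicates is what makes the retained row predictable. Without the initial sort, whichever duplicate happened to come first would survive.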