Modifying free-text search for additional fields
You can modify the configuration of free-text search in many ways. For example, you can add or remove fields for search criteria, modify how fields are stored in the Guidewire Solr Extension, and configure how fields are matched to search criteria. For complete information on how to modify the Guidewire Solr Extension, consult the online documentation for Apache Solr 6.6.
This section shows by example the configuration files that you typically modify to change how the Guidewire Solr Extension loads, indexes, and searches data. The example is a simple configuration change to add a field to free-text search.
Configuration files for full-text loading, indexing, and searching
Typically, you modify multiple files to configure how the Guidewire Solr Extension loads, indexes, and searches information. Specific files configure each component of the extension.
The following table lists the configuration files for PolicyCenter:
| Configuration file | Description |
|---|---|
| policy-search-config.xml | Defines the mapping between PolicyCenter fields and Guidewire Solr Extension, and configures how the Guidewire Solr Extension indexes and searches its index documents. Location: PolicyCenter/modules/configuration/config/search/policy-search-config.xml |
| PCSolrSearchPlugin.gs | The implementation class for the free-text plugin ISolrSearchPlugin. This plugin sends search criteria to the Guidewire Solr Extension and receives the search results. Location: PolicyCenter/modules/pc/gsrc/gw/solr/PCSolrSearchPlugin.gs |
| PCSolrMessageTransportPlugin | The implementation of ISolrMessageTransportPlugin. This plugin sends update messages to Guidewire Solr Extension when entities in the Guidewire Solr Extension index undergo changes in the PolicyCenter database. |
| EventFired.grs (IndexingSystem rule) | Rules file containing a top level rule that defines which entities and which events on those entities are of interest to Guidewire Solr Extension. This is the point of creation for the messages that the associated ISolrMessageTransportPlugin instance sends. |
The following table lists the configuration files for Guidewire Solr Extension.
| Configuration file | Description | More information |
|---|---|---|
| schema.xml | Defines the document schema structure for the Guidewire Solr Extension. Provides general configuration for how Guidewire Solr Extension uses a document, the index definition for a document, and the analyzers available for index assignment or usage. Location: /opt/gwsolr/pc/solr/policy_active/conf/schema.xml | http://wiki.apache.org/solr/SchemaXml https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-6.6.pdf |
| data-config.xml | Defines where to locate and how to interpret data from the batch load command. Set to read-only mode at start-up. Location: /opt/gwsolr/pc/solr/policy_active/conf/data-config.xml | http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1 |
The following table lists the configuration files for the free-text batch load command.
| Configuration file | Description |
|---|---|
| batchload.bat or batchload.sh | Represents the batch load command. Example: /opt/gwsolr/pc/solr/policy_active/conf/batchload.sh |
| batchload-config-databaseBrand.xml | Contains database connection information and brand-specific native SQL for selecting data from the PolicyCenter relational database. Example: /opt/gwsolr/pc/solr/policy_active/conf/batchload-config-oracle.xml |
| postprocess.bat or postprocess.sh | Collates and compiles index documents for the Guidewire Solr Extension using data selected from the relational database. Example: /opt/gwsolr/pc/solr/policy_active/conf/postprocess.sh |
Sequence of steps for adding a field to free-text search
Adding a field to free-text search involves changes in the Guidewire Solr Extension, in PolicyCenter, and in the batch load command. The examples in the following topics add a policy postal code as a free-text field.
Defining a new free-text field in the Guidewire Solr Extension
Use the following file to define a new free-text field in the Guidewire Solr Extension.
/opt/gwsolr/pc/solr/policy_active/conf/schema.xml
The Guidewire Solr Extension defines the format of this file.
The schema configuration file contains <field> elements, one for each full-text field of the index documents in the Guidewire Solr Extension.
The following example defines a full-text field that stores postal codes.
<field name="postalCode" type="gw_unanalyzed" indexed="true"
stored="true" required="false"
multiValued="false"/>
The definition directs the Guidewire Solr Extension to index the values of postal code fields, so search criteria can include postal codes. The definition directs the Guidewire Solr Extension to store the values of postal code fields, so items returned in search results include them.
The type attribute in the preceding definition identifies the analyzer with which Guidewire Solr Extension processes postal code fields. The analyzer type indicates the category of matches that are possible on applicable search fields. In this case, a value for the type attribute of gw_unanalyzed indicates that postal codes are raw text fields and that Guidewire Solr Extension does not analyze them. Given this value, only exact matches are possible for searches on the values of postal code fields.
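The difference between an unanalyzed field and an analyzed one can be sketched in a few lines of Python. This is a conceptual illustration only, not how the Guidewire Solr Extension is implemented: the `analyzed_match` function stands in for a hypothetical analyzer that lowercases and splits on whitespace, whereas the real analyzers are whatever schema.xml defines.

```python
def unanalyzed_match(indexed_value: str, criterion: str) -> bool:
    """Raw-text comparison: the whole indexed value must equal the criterion."""
    return indexed_value == criterion


def analyzed_match(indexed_value: str, criterion: str) -> bool:
    """Hypothetical analyzer: lowercase, split on whitespace, match any token."""
    tokens = indexed_value.lower().split()
    return criterion.lower() in tokens


# An unanalyzed postal code matches only when the criterion is identical.
exact = unanalyzed_match("94404", "94404")
partial = unanalyzed_match("94404-1234", "94404")

# An analyzed field, by contrast, can match on a single token.
token = analyzed_match("San Mateo", "mateo")
```

Under these assumptions, a search for 94404 does not find a document whose postal code is stored as 94404-1234, which is the practical consequence of choosing gw_unanalyzed.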
Defining a new free-text field in PolicyCenter
Use the following file to define a new free-text field in PolicyCenter.
PolicyCenter/modules/configuration/config/search/policy-search-config.xml
Access the file in the Project window in Studio by navigating to policy-search-config.xml. PolicyCenter defines the format of this file.
Free-text search configuration files have these main elements:
- <Indexer> – Contains <IndexField> elements to define field names and their locations within object graphs of the root object and in the index documents sent to the Guidewire Solr Extension.
- <Query> – Contains the following types of elements:
  - <QueryTerm> elements define how specific fields are matched and how a match contributes to the overall score. A query term is one of two types: term and subquery. A term type searches a single index for a single field. A subquery type searches multiple indexes for multiple fields simultaneously and scores the most appropriate match.
    Important: Do not confuse the term type query term with the subquery type query term. A subquery type can search multiple indexes for multiple fields. A term type can only search one index for a single field. An XSD validates only the structure of a query. The XSD does not validate the inputs for a term type. Thus, if you attempt to enter multiple terms in a term type, the XSD does not catch this error. Instead, the platform throws a run-time exception.
  - <FilterTerm> elements restrict Guidewire Solr Extension from returning a result if a field in that result matches given search criteria. The elements do not affect the score of any results. Instead, they help users to limit the results that a search returns, such as by a date range. The elements can also facilitate security by controlling the results that users receive.
- <QueryResult> – Contains <ResultProperty> elements to configure whether and how specific fields are returned in query results from the Guidewire Solr Extension.
To add a free-text field to policy-search-config.xml, first add an <IndexField> element to the <Indexer> element. The following example defines a free-text field for postal codes.
<IndexField field="postalCode">
<DataProperty path="root.PolicyAddress.PostalCode"/>
</IndexField>
The definition specifies that values of postalCode fields in the index documents sent to the full-text engine come from addresses on policy periods, the root object.
Next, add <FilterTerm> elements for the new free-text field to the <Query> element. The following example specifies how the Guidewire Solr Extension matches PostalCodeCriteria values in search criteria with postalCode values in the Guidewire Solr Extension.
<FilterTerm>
<DataProperty path="root.PostalCodeCriteria"/>
<QueryField field="postalCode"/>
</FilterTerm>
The definition directs the Guidewire Solr Extension to accept postal codes in search criteria and specifies where to find them in the XML structure of the search criteria.
Finally, add a <ResultProperty> element to the <QueryResult> element. The following example defines how the Guidewire Solr Extension returns postalCode values in query results.
<ResultProperty name="PostalCode">
<ResultField name="postalCode"/>
</ResultProperty>
The definition assigns the postalCode value in the result to the PostalCode property on the result object.
Defining a new free-text field in the batch load command
To load data from an existing PolicyCenter database, the new free-text field must be listed in the batchload-config-databaseBrand.xml files and in the data-config.xml file. You also decide whether to include the field in, or exclude it from, the digest that prevents duplicate index entries.
Generating the SQL query
Add the new free-text field to the SELECT portion of the query that corresponds to the data. The SQL for postal codes looks like the following sample SELECT statement.
SELECT DISTINCT
...
addr.postalCode
...
FROM pc_policyperiod AS pp
...
INNER JOIN pc_policyaddress paddr
ON paddr.BranchID = pp.ID
...
INNER JOIN pc_address addr
ON addr.ID = paddr.Address
...
WHERE
...
AND (paddr.EffectiveDate IS NULL OR paddr.EffectiveDate <= pp2.EditEffectiveDate )
AND (paddr.ExpirationDate IS NULL OR paddr.ExpirationDate > pp2.EditEffectiveDate )
...
You duplicate this change in batchload-config-oracle.xml, batchload-config-sqlserver.xml, and batchload-config-h2.xml. Each database brand uses a separate configuration file because the syntax of the SQL SELECT statement for each database is slightly different.
Note: Because the SQL is embedded in XML, escape the less than (<) symbol as “&lt;” and the greater than (>) symbol as “&gt;”. Do not include the quotation marks in the escape sequence.
The query is processed into an XML document that will be loaded into Solr. Batch loading of XML into Solr is described in http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1.
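When SQL is embedded in an XML configuration file, comparison operators must use XML escape sequences. As a small illustration, the following Python sketch uses the standard library function `xml.sax.saxutils.escape` to produce the escaped form of a clause from the sample query. The helper name `escape_sql_for_xml` is hypothetical, and nothing in the batch load tooling requires Python; the sketch only shows what the escaped text looks like.

```python
from xml.sax.saxutils import escape


def escape_sql_for_xml(sql: str) -> str:
    """Escape &, < and > so the SQL text can sit inside an XML element."""
    return escape(sql)


clause = "paddr.EffectiveDate <= pp2.EditEffectiveDate"
escaped = escape_sql_for_xml(clause)
print(escaped)  # paddr.EffectiveDate &lt;= pp2.EditEffectiveDate
```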
The processed field names are all in upper case, and the structure of the XML document is:
<CONTAINER_ELEM>
<POLICY>
<!-- data for one policy -->
</POLICY>
</CONTAINER_ELEM>
To include the postalCode in the index, put an entry in data-config.xml that looks like the following XML code:
<field column="postalCode" xpath="/CONTAINER_ELEM/POLICY/POSTALCODE"/>
Within the row for one policy, the fields will be in the same order as they were returned by the SELECT statement.
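To make the xpath mapping concrete, the following Python sketch reads a small document with the structure shown above and pulls out the POSTALCODE value for each policy. It is an approximation for illustration, not the Solr DataImportHandler itself. The sample data, the POLICYNUMBER element, and the function name `postal_codes` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical batch load output; element names are upper case as described.
SAMPLE = """
<CONTAINER_ELEM>
  <POLICY>
    <POLICYNUMBER>P-001</POLICYNUMBER>
    <POSTALCODE>94404</POSTALCODE>
  </POLICY>
  <POLICY>
    <POLICYNUMBER>P-002</POLICYNUMBER>
    <POSTALCODE>10001</POSTALCODE>
  </POLICY>
</CONTAINER_ELEM>
"""


def postal_codes(xml_text: str) -> list:
    """Return the POSTALCODE text of every POLICY element, in document order."""
    root = ET.fromstring(xml_text.strip())  # root is CONTAINER_ELEM
    # Equivalent to the absolute path /CONTAINER_ELEM/POLICY/POSTALCODE.
    return [policy.findtext("POSTALCODE") for policy in root.findall("POLICY")]


codes = postal_codes(SAMPLE)
print(codes)  # ['94404', '10001']
```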
The SQL result can include fields that do not become part of the XML of the index documents. These fields are not part of the index document schema, and the Guidewire Solr Extension ignores them when it loads the result. The batch load command uses such fields to sort and manipulate the data returned from the database while it produces the final XML index documents to load.
Computing the hash value to eliminate duplicates
To prevent a chain of policy changes or renewals from generating identical, duplicate index entries, the Guidewire Solr Extension computes a SHA-1 hash value over most of the index data. This hash value is included in the URN (Unique Record Name) of the index entry. Entries on the same policy that have the same hash value are collapsed into a single entry.
First, decide whether the new field is part of the hash value. If so, you must modify the file PCSolrSearchPlugin.gs to include that data in the digest when generating a new index entry. The free-text batch load command includes fields by default.
Including a new field in the hash value requires one line of Gosu code:
sb.append("postalCode", period.PolicyAddress.PostalCode)
If this field is not part of the hash value, you must make the batch load command ignore the field for digest purposes. An example of such a field is sliceDate. The configuration for the digester is near the bottom of the batchload-config-databaseBrand.xml files. The sliceDate column is among the columns to ignore.
<transformer
name="digestTransformer"
class="com.guidewire.solr.batchload.xform.PCDigestTransformer"
ignoreElems="urn, periodID, policyPublicID, sliceDate, periodStart, periodEnd, policyStart,
policyEnd, periodIdWithSliceDate, jobType"
algorithm="SHA"
/>
List the same ignored columns in the ignoreElems attribute in PCSolrSearchPlugin.gs as well, as in the following excerpt with the default columns.
static function initXformer() : com.guidewire.solr.batchload.xform.DigestTransformer {
  try {
    // ignoreElems lists the same columns as the digestTransformer in the batch load configuration.
    var xml =
        "<transformer name=\"digestTransformer\""
        + " class=\"com.guidewire.solr.batchload.xform.PCDigestTransformer\""
        + " algorithm=\"SHA\""
        + " ignoreElems=\"urn, periodID, policyPublicID, sliceDate, periodStart, periodEnd,"
        + " policyStart, policyEnd, periodIdWithSliceDate, jobType\"/>"
    var xf = new com.guidewire.solr.batchload.xform.PCDigestTransformer(false)
    var doc = com.guidewire.solr.batchload.Utils.parseXml(xml)
    xf.configure(doc.getDocumentElement())
    return xf
  } catch (e : java.lang.Exception) {
    // Error handling is elided in this excerpt; rethrowing here is an assumption.
    throw new java.lang.RuntimeException(e)
  }
}
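The effect of the ignored columns on the digest can be sketched in Python. This is a conceptual model only: the source does not describe how PCDigestTransformer orders or serializes fields, so the sorted field order and the separator bytes below are assumptions, and the resulting values will not equal the digests that the real transformer produces.

```python
import hashlib

# Columns excluded from the digest, copied from the ignoreElems attribute.
IGNORED = {
    "urn", "periodID", "policyPublicID", "sliceDate", "periodStart",
    "periodEnd", "policyStart", "policyEnd", "periodIdWithSliceDate", "jobType",
}


def digest(entry: dict) -> str:
    """SHA-1 over every non-ignored field, in sorted name order (assumed)."""
    h = hashlib.sha1()
    for name in sorted(entry):
        if name in IGNORED:
            continue
        h.update(name.encode("utf-8") + b"\x00")
        h.update(str(entry[name]).encode("utf-8") + b"\x00")
    return h.hexdigest()


# Two slices of the same policy that differ only in an ignored column.
first = {"policyNumber": "P-001", "postalCode": "94404", "sliceDate": "2020-01-01"}
later = {"policyNumber": "P-001", "postalCode": "94404", "sliceDate": "2020-06-01"}
moved = {"policyNumber": "P-001", "postalCode": "10001", "sliceDate": "2020-06-01"}

same_digest = digest(first) == digest(later)
changed_digest = digest(first) != digest(moved)
```

In this model, the two slices with the same postal code hash identically and would collapse into one entry, while a change to postalCode yields a different digest because the field is not in the ignored set. The policyNumber field and the sample values are hypothetical.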
The final stage in eliminating duplicates is the postprocess script. The postprocess script sorts the data by a combination of URN, slice date, and term number. The script then removes rows with duplicate URN values. Slice date and term number are used in the initial sort to retain the latest version of the duplicate data, which is the same way that PolicyCenter indexes the data.
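The sort-then-deduplicate step can be pictured with the following Python sketch. It is a simplified model rather than the postprocess script itself. The row layout, the key names urn, sliceDate, and termNumber, and the use of ISO date strings are assumptions for illustration.

```python
def dedupe_latest(rows: list) -> list:
    """Sort by URN, slice date, and term number, then keep the last row per URN."""
    ordered = sorted(rows, key=lambda r: (r["urn"], r["sliceDate"], r["termNumber"]))
    latest = {}
    for row in ordered:
        latest[row["urn"]] = row  # a later row for the same URN replaces the earlier one
    return list(latest.values())


rows = [
    {"urn": "a1", "sliceDate": "2020-06-01", "termNumber": 1, "postalCode": "94404"},
    {"urn": "a1", "sliceDate": "2020-01-01", "termNumber": 1, "postalCode": "94404"},
    {"urn": "b2", "sliceDate": "2020-03-01", "termNumber": 2, "postalCode": "10001"},
]

result = dedupe_latest(rows)
```

With this input, `result` holds one row per URN, and the surviving row for a1 is the one with the later slice date.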
