Configuring the Guidewire Solr Extension for high availability

You configure the Guidewire Solr Extension for high availability by deploying it to multiple SolrCloud servers, managed as a cluster by an Apache ZooKeeper ensemble.

You can download and install ZooKeeper on the hosts where you want to install and run the Guidewire Solr Extension for high availability. Alternatively, you can run ZooKeeper on its own host apart from the Guidewire Solr Extension. Note that the distribution of PolicyCenter does not include the ZooKeeper software.

Guidewire suggests limiting the number of ZooKeeper hosts to five.

Configuration for high availability requires you to modify the following files:
solrserver-config.xml
PolicyCenter file that configures PolicyCenter to connect with SolrCloud servers in a ZooKeeper ensemble
zoo.cfg
ZooKeeper file that configures each SolrCloud server for membership in a ZooKeeper ensemble
myid
ZooKeeper file that configures each SolrCloud server with its ordinal number in the ensemble

Your PolicyCenter instance connects directly with individual member servers in a ZooKeeper ensemble. A server definition for high availability has the following syntax:

<solrserver name="cloud" type="cloud">
  <param name="zkhosts" value="zookeeperHost[:port][,zookeeperHost[:port]][,...]/chRoot"/>
</solrserver>

The default port for a ZooKeeper host is 2181. Specify a port explicitly only if a host uses a port other than 2181.

The chroot parameter identifies an application within a ZooKeeper ensemble, which allows multiple applications to share a single ZooKeeper ensemble for high availability. For the Guidewire Solr Extension included with PolicyCenter, use pc as the value for chroot. Although the parameter accepts any value and is optional for stand-alone instances, Guidewire recommends always specifying chroot with the value pc.
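For example, the following hypothetical server definition connects to a three-host ensemble in which the third host listens on a non-default port. The host names zk1, zk2, and zk3 and the port 2182 are illustrative only:

```xml
<solrserver name="cloud" type="cloud">
  <param name="zkhosts" value="zk1,zk2,zk3:2182/pc"/>
</solrserver>
```

Hosts zk1 and zk2 omit the port and therefore use the default port, 2181.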

Configure solrserver-config.xml for high availability

Procedure

  1. Start Guidewire Studio.
  2. In the Project window, navigate to configuration > config > solr.
  3. Open the solrserver-config.xml file in the editor.
  4. Set the type attribute on the solrserver element to "cloud".
  5. Define the zkhosts parameter to specify the hosts and ports of all members of the ZooKeeper ensemble.
    1. For the parameter value, specify the hosts and ports in a comma-separated list, using the following syntax:
      • value="zookeeperHost[:port][,zookeeperHost[:port]][,...]/chRoot"
    2. Specify a port only if the host uses a port other than the default, 2181.
    3. Use pc as the value of chroot.

Example

The following example shows a typical configuration for a high availability cluster of Guidewire Solr Extension servers.

<solrserver name="cloud" type="cloud">
  <param name="zkhosts" value="zkserver1,zkserver2,zkserver3/pc"/>
</solrserver>

Install ZooKeeper

Procedure

  1. Download ZooKeeper version 3.4.x, where x is 13 or higher, from the Apache ZooKeeper web site. Do not use ZooKeeper version 3.5 or higher.
  2. On each host where you want to run the Guidewire Solr Extension for high availability, create an installation directory to use as the ZooKeeper home directory.
    For example:
    On Unix
    /opt/zoo
    On Windows
    C:\opt\zoo
  3. Install the ZooKeeper software in the directory that you created. In addition, create a directory for the ZooKeeper server to store its data.
    For example:
    On Unix
    /opt/zoo/data
    On Windows
    C:\opt\zoo\data

Configuring zoo.cfg

In zoo.cfg, you configure a single ZooKeeper member server. Each ZooKeeper member within an ensemble has its own copy of zoo.cfg. The zoo.cfg file configures a ZooKeeper server with its client port. PolicyCenter and the Guidewire Solr Extension connect through the client port to the ZooKeeper server and the ensemble. The file also lists the members of the ensemble, including their host names and ensemble ports. The members of the ensemble connect with each other through the ensemble ports.

Important parameters that you can specify in zoo.cfg include the following:
tickTime
Unit of time in milliseconds for heartbeats and other timing parameters.
initLimit
Timeout limit for followers to connect to a leader.
syncLimit
How far out of date a follower can be from a leader.
dataDir
Location to store the ZooKeeper coordination database, the transaction logs, and the myid file that specifies the ordinal number for the server within the ensemble. For example:
On Unix
/opt/zoo/data
On Windows
C:\opt\zoo\data
The directory stores ZooKeeper data and client-uploaded data only; the latter includes the SolrCloud global configuration and the core configurations. The directory stores no other Guidewire Solr Extension data.
dataLogDir
Location to store the transaction logs. If you do not specify dataLogDir, the server stores transaction logs in the directory specified by dataDir. To reduce latency on updates, use dataLogDir to specify a logging directory on a dedicated logging device.
clientPort
The port on which to listen for connection requests from PolicyCenter and Guidewire Solr Extension servers. The value for clientPort must match the port specified for the server in the zkhosts parameter of solrserver-config.xml. The port defaults to 2181 in both zoo.cfg and solrserver-config.xml.
server.N
For each member of a ZooKeeper ensemble, its host name, its ensemble port for peer communication, and its ensemble port for leader election. A member server finds its own entry in the membership list by matching N with the number in its myid file, located in the directory specified by dataDir.
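The dataLogDir recommendation above can be sketched as the following zoo.cfg fragment. The /var/zoolog/txlog path is hypothetical and stands in for a directory on a dedicated logging device:

```ini
dataDir=/opt/zoo/data
# Hypothetical directory on a dedicated logging device; keeping
# transaction logs off the data device reduces latency on updates.
dataLogDir=/var/zoolog/txlog
```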

The following example zoo.cfg file configures a three-member ZooKeeper ensemble for a cluster of SolrCloud servers. Each server runs on its own host, so all members can listen for connections from PolicyCenter on the same client port. The clientPort parameter is commented out in the example because the default port, 2181, is used. Because each member server is on its own host, all members can also use the same ensemble ports, in this example 2888 for peer communication and 3888 for leader election.

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zoo/data
# clientPort=2181
server.1=gwsolr1:2888:3888
server.2=gwsolr2:2888:3888
server.3=gwsolr3:2888:3888

You can find a copy of zoo.cfg in Studio by navigating in the Project window to configuration > config > solr. Generally, the member servers in a ZooKeeper ensemble use the same copy of the file. Use the copy in Studio as your master copy, and deploy it to the ZooKeeper home directory on the host of each member server.

myid file

Each ZooKeeper member server has an assigned ordinal number. The myid file for each member specifies that ordinal number.

The myid file is an ASCII text file that resides in the directory specified by the dataDir parameter in zoo.cfg. For example, store each server's myid file in the following directory:

On Unix
/opt/zoo/data
On Windows
C:\opt\zoo\data

The member server uses the ordinal number in its myid file to locate its assigned ensemble ports in its copy of zoo.cfg.
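As an illustrative sketch, the following shell commands create the myid file for the first member of the ensemble. The scratch directory stands in for your actual dataDir, for example /opt/zoo/data:

```shell
# Illustrative only: DATA_DIR stands in for the dataDir from zoo.cfg,
# for example /opt/zoo/data on Unix.
DATA_DIR=./zoo-demo/data
mkdir -p "$DATA_DIR"

# The file contains only the server's ordinal number.
# On the host listed as server.1 in zoo.cfg:
echo 1 > "$DATA_DIR/myid"
```

Repeat on each host with that host's own number: 2 on the server.2 host, 3 on the server.3 host, and so on.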

See also

For complete information on configuring and administering a SolrCloud cluster in a ZooKeeper ensemble, consult the Apache Solr and Apache ZooKeeper documentation.

Install Guidewire Solr Extension in a cloud configuration

About this task

To install and launch Guidewire Solr Extension in a cloud configuration, you must first bootstrap or upload the Solr configuration document into ZooKeeper.

Some steps apply only if you are enabling authentication or SSL. All steps assume that the working directory is /opt/gwsolr/pc for gwzkcli and solr commands.

The example commands that use localhost apply to a testing environment only. A production environment uses multiple virtual machines to host Solr and the associated ZooKeeper ensembles. For example, in a production environment, the commands that show localhost:2181/pc use a full ZooKeeper ensemble, such as zoo1_host,zoo2_host,zoo3_host/pc.

Many example commands use system names such as my_Solr_Host that you replace with actual names in your environment.

Note: Instead of making API calls with curl, you can use the Solr Admin user interface to create the collection and add replicas.

Procedure

  1. Install Solr in /opt/gwsolr/pc.
  2. Install, configure, and start the ZooKeeper ensemble.
  3. Create the chroot path:
    bin\gwzkcli -zkhost localhost:2181 -cmd makepath /pc
  4. If you are using SSL, enable HTTPS for Solr:
    bin\gwzkcli -zkhost localhost:2181/pc -cmd clusterprop -name urlScheme -val https
  5. Upload solr.xml to ZooKeeper:
    bin\gwzkcli -zkhost localhost:2181/pc -cmd putfile /solr.xml solr\solr.xml
  6. Upload Solr document configuration to ZooKeeper for each collection (document):
    bin\gwzkcli -zkhost localhost:2181/pc -cmd upconfig -n pc_policy_active -d solr\policy_active
  7. If you are using authentication in Solr, create Solr credentials:
    You can provide the user name and password directly on the command line:
    bin\createuser --user solr --password SolrRocks! --solrhome solr
    Alternatively, if you do not specify the --password option, the command prompts for a password in a secure input field.
    If the solr/security.json file does not exist, creating Solr credentials creates this file.
    If the solr/security.json file does exist, creating Solr credentials generates a password hash value. In this case, you must perform the following steps:
    1. Copy the JSON text for the credentials that the createuser command wrote to the output stream.
    2. Open the solr/security.json file in a text editor.
    3. Paste the new credentials information into the credentials element.
    4. Save and close the solr/security.json file.
  8. If you are using authentication in Solr, upload Solr credentials to ZooKeeper:
    bin\gwzkcli -zkhost localhost:2181/pc -cmd putfile /security.json solr\security.json
  9. If you are using authentication in Solr, temporarily disable authentication with the following command.
    This step is a workaround for a known Solr issue, tracked as SOLR-9679.
    bin\solr auth disable -zkHost localhost:2181/pc
  10. If you are using authentication in Solr, enable authentication with Solr:
    bin\solr auth enable -prompt true -blockUnknown true -s solr -zkHost localhost:2181/pc
    The command in this step places the Solr credentials in plain text in the solr/basicAuth/conf folder. Guidewire recommends that you secure this file. Alternatively, you can inject the Solr credentials into Solr by defining SOLR_AUTHENTICATION_OPTS and commenting out any references to this variable in solr.in.cmd (or solr.in.sh).
  11. Create a folder to host the cloud index.
    Do not use the default Solr home location, /opt/gwsolr/pc/solr.
    mkdir cloud\solr
    set SOLR_LOGS_DIR=..\cloud\logs
    If you run multiple Solr instances on a single workstation, use separate folders for each instance. For example:
    mkdir cloud\solr\node1
    set SOLR_LOGS_DIR=..\cloud\logs\node1
  12. Set the Java system property SOLR_OPTS to include the name of the Solr hosting server.
    set SOLR_OPTS="-DserverHost=my_Solr_Host"
  13. Start one cloud Solr instance to support further configuration.
    bin\solr start -cloud -s cloud\solr -z "localhost:2181/pc" -p 8983
  14. Guidewire recommends adding an autoscaling rule that limits each Solr node to one replica per collection (document).
    The following curl command shows an example of this step. If you have enabled HTTPS, use the https protocol in the URL.
    curl -X POST http://localhost:8983/api/cluster/autoscaling -d '{"set-cluster-policy":[{"replica":"<2","shard":"#EACH","node":"#ANY"}]}'
  15. Create the policy collection with one replica, and define whether the collection index is sharded and, if so, how many shards it uses.
    For example:
    bin\solr create -c pc_policy_active -n pc_policy_active -shards 1 -replicationFactor 1
  16. To create additional replicas for high availability, start new Solr instances against an empty Solr home folder and issue a create replica command in the collections API.
    The following lines show how to start a single Solr instance on a separate virtual machine.
    mkdir cloud\solr
    set SOLR_LOGS_DIR=..\cloud\logs
    bin\solr start -cloud -s cloud\solr -z "localhost:2181/pc" -p 8983
    curl "https://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=pc_policy_active&shard=shard1"
    If you want to run multiple Solr instances on a single workstation for testing purposes, you must make separate log folders for each instance. The following example also shows the use of a ZooKeeper ensemble on the local machine.
    mkdir cloud\solr\node2
    set SOLR_LOGS_DIR=..\cloud\logs\node2
    bin\solr start -cloud -s cloud\solr\node2 -z "localhost:2181,localhost:2182,localhost:2183/pc" -p 9183
    curl "https://localhost:9183/solr/admin/collections?action=ADDREPLICA&collection=pc_policy_active&shard=shard1"
    As an alternative to this step, you can start all Solr nodes before creating the collection. If you use this option, specify the full set of replicas on creating the collection.
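Step 7 of the procedure above describes pasting new credentials into the credentials element of solr/security.json. As a hedged sketch, that element follows the format of Solr's Basic Authentication plugin; the hash value below is a placeholder, not a real credential:

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnknown": true,
    "credentials": {
      "solr": "base64EncodedHash base64EncodedSalt"
    }
  }
}
```

Each entry under credentials maps a user name to a Base64-encoded SHA-256 hash and salt pair, which is the JSON text that the createuser command writes to its output stream.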

Communicating with Guidewire Solr Extension in high availability

The following diagram illustrates the communication paths among a PolicyCenter cluster, a ZooKeeper ensemble, and a cluster of SolrCloud servers. Each arrow in the diagram represents a connection over a specific port and protocol:
• The PolicyCenter cluster has three servers.
• The ZooKeeper ensemble has three servers that connect to each other with the TCP protocol on ports 2888 and 3888.
• The SolrCloud server cluster has four servers that connect to each other with the HTTP or HTTPS protocol on port 8983.
• The PolicyCenter cluster connects to the ZooKeeper ensemble with the TCP protocol on port 2181.
• The PolicyCenter cluster connects to the SolrCloud server cluster with the HTTP or HTTPS protocol on port 8983.
• The SolrCloud server cluster connects to the ZooKeeper ensemble with the TCP protocol on port 2181.