Handling duplicates in joins with the foreign key on the right

Duplicate occurrences of records from the primary table in a result can occur only for joins that have the foreign key on the right. If the foreign key for a join is on the left, the value for any primary instance is unique in the dependent table. A foreign key on the left relates either to:

  • A value of null, which matches no instance on the right
  • A single instance on the right

A query that you create by using join or outerJoin with the foreign key on the right of the join often fetches duplicate instances of the primary entity. For example, the following Gosu code finds users that have updated addresses.

var queryParent = Query.make(User)
var tableChild = queryParent.join(Address#UpdateUser)

For the previous example, if multiple Address instances relate to the same User, the result includes duplicates of that User instance. One instance exists in the result of the join query for each child object of type Address that relates to that instance.

Duplicate instances of the primary entity in a query result are desirable in some cases. For example, if you intend to iterate across the result and extract properties from each child object that matches the query, this set of rows is what you want. In cases for which you need to return a single row from the primary table, you must design and test your query to ensure this result.

For joins with the foreign key on the right, try to reduce duplicates on the primary table. The best way to reduce duplicates is to ensure that the secondary table only has one row that matches the entity on the primary table.

If you cannot eliminate duplicates in this way, other approaches can limit duplicates created because of a join:

  • Rewrite to use the subselect method approach. For joins with the key on the right, the query optimizer often performs better with subselect than with join or outerJoin. However, using the join method might perform better in some cases. For example, consider using join if the dependent entity includes predicates that are not very selective and return many results. For important queries, Guidewire recommends trying both approaches under performance testing. If you use join with the foreign key on the right, use the two-parameter method signature that includes the table name followed by the column name in the joined table.
    var queryParent = Query.make(User)
    queryParent.subselect(User#ID, InOperation.CompareIn, Note#Author)
  • Call the withDistinct method on Query to limit the results to distinct results from the join. Pass the value true as an argument to limit the query results to contain only a single row for each row of the parent table. To turn off this behavior later, call the method again and pass false as an argument. For example:
    var queryParent = Query.make(User)
    queryParent.withDistinct(true)
    queryParent.join(Note#Author)
  • Add predicates to the secondary table to limit what your query matches. For example, only match entities on the primary table if a related child object has a certain property with a specific value or range of values.

    The following example adds predicates after the join using properties on the joined table:

    var queryParent = Query.make(User)
    var tableChild = queryParent.join(Note#Author)
    tableChild.compare(Note#CreateTime, GreaterThanOrEquals, 
                       DateUtil.addDays(DateUtil.currentDate(), -5))

See also