Improving limited document search functionality

I’ll try to keep it short and sweet. I had an e-commerce project with a search screen for products. It can search on multiple categories per product (some ANDed, some ORed), price ranges, different features (some ANDed, some ORed), and so on. This kind of advanced functionality is impossible to achieve with org.moqui.search.SearchServices.search#DataDocuments because it does not allow flexible query criteria the way the entity engine does. For example, a simple query such as “Product X belongs to category Y AND category Z” cannot work because you need to AND two nested queries on the same path.
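
To make the limitation concrete, here is a minimal sketch of the Map/List structure such a query needs, written in plain Java. The `categories` path and `categories.productCategoryId` field are borrowed from the example later in this thread; the class name, method names, and the ids `CAT_Y` and `CAT_Z` are made up for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Moqui: builds the raw ES query structure.
final class NestedAndSketch {

    // One nested clause: "some entry under `path` has `field` matching `value`".
    static Map<String, Object> nestedMatch(String path, String field, String value) {
        return Map.of("nested", Map.of(
                "path", path,
                "query", Map.of("match", Map.of(field, value))));
    }

    // AND of two separate nested clauses on the same path. Each clause may be
    // satisfied by a different category entry of the same product.
    static Map<String, Object> inBothCategories(String first, String second) {
        return Map.of("bool", Map.of("must", List.of(
                nestedMatch("categories", "categories.productCategoryId", first),
                nestedMatch("categories", "categories.productCategoryId", second))));
    }

    public static void main(String[] args) {
        System.out.println(inBothCategories("CAT_Y", "CAT_Z"));
    }
}
```

The point is that the two `nested` clauses share a path, so they have to sit side by side in a list rather than under a single map key per path.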

To solve this problem I created a search service that takes the entire queryMap as a parameter. It only takes care of pagination, highlighting and sorting; beyond that, it gives the caller the power to specify the detailed query.

This introduced a new problem: how to easily build a complex query that might have lots of copy-paste patterns for repetitive things (e.g. a date condition on fromDate and thruDate, and so on)?
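
As an illustration of how repetitive this gets, here is a rough Java sketch of the from/thru date pattern written out as raw Maps and Lists: a record is effective when its from date is missing or not after the timestamp, and its thru date is missing or not before it. The class and helper names are hypothetical, and the timestamp is assumed to be in a format the indexed date fields accept.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper showing the from/thru date pattern as raw ES structures.
final class DateConditionSketch {

    // "field has no value": bool/must_not/exists
    static Map<String, Object> missing(String field) {
        return Map.of("bool", Map.of("must_not", List.of(
                Map.of("exists", Map.of("field", field)))));
    }

    // OR of two clauses; minimum_should_match is spelled out for clarity
    static Map<String, Object> either(Map<String, Object> a, Map<String, Object> b) {
        return Map.of("bool", Map.of(
                "should", List.of(a, b),
                "minimum_should_match", 1));
    }

    // Single-bound range clause; op is "lte" or "gte"
    static Map<String, Object> range(String field, String op, String value) {
        return Map.of("range", Map.of(field, Map.of(op, value)));
    }

    // (from <= ts OR from missing) AND (thru >= ts OR thru missing)
    static Map<String, Object> effectiveAt(String fromField, String thruField, String timestamp) {
        return Map.of("bool", Map.of("must", List.of(
                either(range(fromField, "lte", timestamp), missing(fromField)),
                either(range(thruField, "gte", timestamp), missing(thruField)))));
    }

    public static void main(String[] args) {
        System.out.println(effectiveAt("fromDate", "thruDate", "1700000000000"));
    }
}
```

Writing this by hand for every dated relationship in a document is the kind of copy-paste a builder is meant to remove.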

The solution was to write a mini-library to incrementally build the query. The problem with ElasticSearch is that some queries are leaf queries (match, exists, etc.) and some are composite (should, must, bool, etc.). To do this correctly, I had to implement a tree-based QueryBuilder that allows sub-queries to be nested to any depth to build the final query.

With all of this in place, it is now much easier to build sophisticated queries with simple semantics.

The question here is, do you think this is useful to incorporate into the framework or is it better left to each developer to figure out their own solutions?

I would be interested in seeing what you’ve done. A fluent or other convenient sort of class could possibly be much nicer than the plain nested Maps/Lists in Groovy syntax or plain JSON (those are the two formats it would most directly compete with).

FWIW, I’m not a fan of the ElasticSearch Java API… it’s a big mess of ugly builders and such, and while the JSON syntax (or the equivalent in Groovy Map/List syntax) is a bit messy it’s still better than the Java API in my opinion. I think a better Java API could be created, perhaps something that is only a subset of the full ES API with simpler and(), or(), and not() sorts of methods that build the underlying Map/List structures (with match and bool and all).
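
To give a feel for what such a subset could look like, here is a small Java sketch of `and()`, `or()`, and `not()` helpers that only build the underlying Map/List structures. Nothing here exists in Moqui or in the ES client; the class name `EsLogic` and the `brandName` and `discontinued` fields are invented for the example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical combinators that hide the bool/must/should/must_not plumbing.
final class EsLogic {

    @SafeVarargs
    static Map<String, Object> and(Map<String, Object>... clauses) {
        return Map.of("bool", Map.of("must", List.of(clauses)));
    }

    @SafeVarargs
    static Map<String, Object> or(Map<String, Object>... clauses) {
        return Map.of("bool", Map.of(
                "should", List.of(clauses),
                "minimum_should_match", 1));
    }

    static Map<String, Object> not(Map<String, Object> clause) {
        return Map.of("bool", Map.of("must_not", List.of(clause)));
    }

    // A leaf clause; any literal ES clause Map can be passed in its place
    static Map<String, Object> match(String field, Object value) {
        return Map.of("match", Map.of(field, value));
    }

    public static void main(String[] args) {
        Map<String, Object> query = and(
                match("productClassEnumId", "123"),
                or(match("brandName", "Acme"), match("brandName", "Globex")),
                not(match("discontinued", "Y")));
        System.out.println(query);
    }
}
```

Because every helper takes and returns a plain Map, hand-written ES fragments can be mixed in wherever the helpers fall short.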

To date I haven’t done enough work with ES to justify the effort, and aside from aggregations the search#DataDocuments service has been adequate between the queryString and nestedQueryMap (where the value for the null key is for the top level query).

As a test case for API alternatives the SalesReportServices.xml file has the most complex OOTB ElasticSearch API use (note that it uses the ElasticFacade.ElasticClient directly, not the search#DataDocuments service).

Hi @jonesde

So first, I completely agree about the ES Java API; I hated it the moment I looked at it, and that was the reason I turned to writing my own little thing. Second, this is a WIP and it barely does enough for my use case, but it’s definitely something to improve or revise. So I will share each code snippet below with a brief note on what it does:

The alternative service

Pretty much a rewrite of search#DataDocuments, but it allows me to pass the full query as queryMap.

    <service verb="query" noun="DataDocuments">
        <description>
            Implements roughly the same logic as
            org.moqui.search.SearchServices.search#DataDocuments but instead of
            building the query automatically, it allows the user to input the
            elastic query manually (queryMap) for more flexibility and control.

            This should be the lowest level search service for Documents and
            more services should be built on top of it.
        </description>
        <in-parameters>
            <parameter name="indexName" required="true" />
            <parameter name="documentType">
                <description>
                    The ElasticSearch document type. For DataDocument based docs
                    this is the dataDocumentId.
                </description>
            </parameter>
            <parameter name="queryMap" type="Map" required="true" />
            <parameter name="orderByFields" type="List">
                <description>
                    prefix with '+' or nothing for ascending and '-' for descending
                </description>
                <parameter name="orderByField"/>
            </parameter>
            <parameter name="highlightFields" type="List">
                <parameter name="highlightField"/>
            </parameter>
            <parameter name="pageIndex" type="Integer" default="0" />
            <parameter name="pageSize" type="Integer" default="20" />
            <parameter name="pageNoLimit" type="Boolean" default="false" />
            <parameter name="flattenDocument" type="Boolean" default="true" />
            <parameter name="clusterName" default-value="default" />
        </in-parameters>
        <out-parameters>
            <parameter name="documentList" type="List">
                <parameter name="document" type="Map" />
            </parameter>
            <parameter name="documentListCount" type="Integer" />
            <parameter name="documentListPageIndex" type="Integer" />
            <parameter name="documentListPageSize" type="Integer" />
            <parameter name="documentListPageMaxIndex" type="Integer" />
            <parameter name="documentListPageRangeLow" type="Integer" />
            <parameter name="documentListPageRangeHigh" type="Integer" />
        </out-parameters>
        <actions><script><![CDATA[
            import java.math.RoundingMode
            import java.util.stream.Collectors
            import org.moqui.impl.context.ElasticFacadeImpl

            def elasticClient = ec.factory.elastic.getClient(clusterName)
            if (!elasticClient) {
                def warningMsg = "No Elastic Client found for cluster name ${clusterName}, not running search"
                ec.message.addMessage(warningMsg, 'danger')
                return
            }
            if (!elasticClient.indexExists(indexName)) {
                ec.logger.warn("Tried to search with indexName ${indexName} that does not exist, returning empty list")
                return [
                    documentListCount: 0,
                    documentListPageIndex: pageIndex,
                    documentListPageSize: pageSize,
                    documentListPageMaxIndex: 0,
                    documentListPageRangeLow: 0,
                    documentListPageRangeHigh: 0
                ]
            }

            def fromOffset = pageNoLimit ? 0 : pageIndex * pageSize
            def sizeLimit = pageNoLimit ? 10000 : pageSize
            def hasHighlights = highlightFields != null && highlightFields.size() > 0
            def searchMap = [query: queryMap, from: fromOffset, size: sizeLimit, track_total_hits: true]

            if (hasHighlights) {
                // ElasticSearch expects the highlighted field names under a 'fields' key
                searchMap.highlight = [fields: highlightFields.collectEntries { [(it): [:]] }]
            }
            if (orderByFields) {
                // '+field' or 'field' sorts ascending, '-field' sorts descending
                searchMap.sort = orderByFields.collect { orderByField ->
                    def field = orderByField.toString().trim()
                    def isDescending = field.startsWith('-')
                    def pureField = (isDescending || field.startsWith('+')) ? field.substring(1) : field
                    return isDescending ? [(pureField): 'desc'] : pureField
                }
            }

            def index = documentType
                    ? documentType.split(',').collect({ ElasticFacadeImpl.ddIdToEsIndex(it) }).join(',')
                    : indexName
            Map validateRespMap = elasticClient.validateQuery(index, queryMap, true)
            if (validateRespMap != null) {
                ec.message.addMessage("Invalid search: ${queryMap}", 'danger')
                documentListCount = 0
                return
            }
            def hitsMap = elasticClient.search(index, searchMap).hits
            def hitsList = hitsMap.hits

            def documentList = hitsList.stream().map { hit ->
                def document = flattenDocument ? flattenNestedMap(hit._source) : hit._source
                document._index = hit._index
                document._id = hit._id
                document._version = hit._version
                document._type = hit._type == null || '_doc' == hit._type
                        ? ElasticFacadeImpl.esIndexToDdId(hit._index)
                        : hit._type
                if (hasHighlights) {
                    document.highlights = hit.highlight
                }
                return document
            }.collect(Collectors.toList())

            def totalResults = hitsMap.total.value
            // integer division keeps the result an Integer, matching the out-parameter type
            def maxIndex = totalResults > 0 ? (totalResults - 1).intdiv(pageSize) : 0
            def highRange = (pageIndex * pageSize) + pageSize
            def hasMore =  highRange < totalResults
            def checkedHighRange = hasMore ? highRange : totalResults

            return [
                documentList: documentList,
                documentListCount: totalResults,
                documentListPageIndex: pageIndex,
                documentListPageSize: pageSize,
                documentListPageMaxIndex: maxIndex,
                documentListPageRangeLow: pageIndex * pageSize + 1,
                documentListPageRangeHigh: checkedHighRange
            ]
        ]]></script></actions>
    </service>
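
For readers who want the request-shaping rules in isolation, here is a small Java sketch of the two pieces of logic the service applies around the query: turning `orderByFields` entries into ES sort entries, and computing the paging values. The class and method names are hypothetical; the rules are the ones described in the service definition ('+' or no prefix for ascending, '-' for descending, zero-based `pageIndex`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical standalone version of the sort and paging rules.
final class SearchPagingSketch {

    // "+name" or "name" becomes "name" (ascending); "-price" becomes {price: desc}
    static List<Object> toSort(List<String> orderByFields) {
        List<Object> sort = new ArrayList<>();
        for (String raw : orderByFields) {
            String field = raw.trim();
            if (field.isEmpty()) continue; // ignore blank entries
            boolean descending = field.startsWith("-");
            String name = (descending || field.startsWith("+")) ? field.substring(1) : field;
            sort.add(descending ? Map.of(name, "desc") : name);
        }
        return sort;
    }

    // Index of the last page, zero-based; 0 when there are no results
    static int pageMaxIndex(int total, int pageSize) {
        return total > 0 ? (total - 1) / pageSize : 0;
    }

    // One-based position of the first result on the page
    static int rangeLow(int pageIndex, int pageSize) {
        return pageIndex * pageSize + 1;
    }

    // One-based position of the last result on the page, capped at the total
    static int rangeHigh(int total, int pageIndex, int pageSize) {
        return Math.min((pageIndex + 1) * pageSize, total);
    }

    public static void main(String[] args) {
        System.out.println(toSort(List.of("+name", "-price")));
        System.out.println(pageMaxIndex(45, 20) + " " + rangeLow(2, 20) + " " + rangeHigh(45, 2, 20));
    }
}
```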

Query Builder

This class is the one used to build the queries. It can build to either JSON or Map format.

package com.pythys.elastic

import groovy.json.JsonOutput
import groovy.transform.CompileStatic
import javax.annotation.Nullable

@CompileStatic
public class QueryBuilder {

    private CompositeQuery query
    private CompositeQuery currentQuery

    private QueryBuilder() {
        this.query = new CompositeQuery(CompositeQuery.QueryType.BOOL)
        this.currentQuery = this.query
    }

    private QueryBuilder buildCompositeQuery(
            Closure subBuild,
            CompositeQuery.QueryType queryType) {
        buildCompositeQuery(subBuild, queryType, [:])
    }

    private QueryBuilder buildCompositeQuery(
            Closure subBuild,
            CompositeQuery.QueryType queryType,
            Map args) {
        def childQuery = new CompositeQuery(queryType, args)
        def parentQuery = this.currentQuery
        // make the child the current node so calls inside subBuild attach to it
        this.currentQuery = childQuery
        subBuild(this)
        // attach the finished child to its parent and step back up the tree
        parentQuery.add(childQuery)
        this.currentQuery = parentQuery
        return this
    }

    private QueryBuilder buildLeafQuery(LeafQuery leafQuery) {
        this.currentQuery.add(leafQuery)
        return this
    }

    public static QueryBuilder query() {
        return new QueryBuilder()
    }

    public QueryBuilder bool(Closure subBuild) {
        buildCompositeQuery(subBuild, CompositeQuery.QueryType.BOOL)
    }

    public QueryBuilder must(Closure subBuild) {
        buildCompositeQuery(subBuild, CompositeQuery.QueryType.MUST)
    }

    public QueryBuilder mustNot(Closure subBuild) {
        buildCompositeQuery(subBuild, CompositeQuery.QueryType.MUST_NOT)
    }

    public QueryBuilder should(Closure subBuild) {
        buildCompositeQuery(subBuild, CompositeQuery.QueryType.SHOULD)
    }

    public QueryBuilder nested(Map args, Closure subBuild) {
        buildCompositeQuery(subBuild, CompositeQuery.QueryType.NESTED, args)
    }

    public QueryBuilder exists(String field) {
        buildLeafQuery(new LeafQuery(LeafQuery.QueryType.EXISTS, 'field', field))
    }

    public QueryBuilder match(String field, String value) {
        buildLeafQuery(new LeafQuery(LeafQuery.QueryType.MATCH, field, value))
    }

    public QueryBuilder range(String field,
            @Nullable String fromValue,
            @Nullable String toValue) {
        buildLeafQuery(new LeafQuery(LeafQuery.QueryType.RANGE, field, fromValue, toValue))
    }

    public QueryBuilder queryString(String query) {
        buildLeafQuery(new LeafQuery(LeafQuery.QueryType.QUERY_STRING, null, query))
    }

    public QueryBuilder conditionDate(String fromFieldName,
            String toFieldName,
            String timestamp) {
        bool {
            must {
                bool {
                    should {
                        range(fromFieldName, null, timestamp)
                        bool {
                            mustNot {
                                exists(fromFieldName)
                            }
                        }
                    }
                }
                bool {
                    should {
                        range(toFieldName, timestamp, null)
                        bool {
                            mustNot {
                                exists(toFieldName)
                            }
                        }
                    }
                }
            }
        }
    }

    public QueryBuilder conditionOr(String field, List<String> values) {
        bool {
            should {
                values.each { value ->
                    match(field, value)
                }
            }
        }
    }

    public Map toMap() {
        if (this.currentQuery != this.query) {
            throw new RuntimeException("You cannot build inside an inner query")
        } else {
            return this.query.build()
        }
    }

    public String json() {
        return JsonOutput.toJson(toMap())
    }
}

ElasticQuery

This is the parent of both leaf and composite queries, which are separate because they work differently.

package com.pythys.elastic

import groovy.transform.CompileStatic

@CompileStatic
public interface ElasticQuery {
    public Map build()
}

CompositeQuery

package com.pythys.elastic

import groovy.transform.CompileStatic
import java.util.stream.Collectors

@CompileStatic
public class CompositeQuery implements ElasticQuery {
    public enum QueryType {
        BOOL("bool"),
        MUST("must"),
        MUST_NOT("must_not"),
        NESTED("nested"),
        SHOULD("should")

        private String type
        private QueryType(String type) {
            this.type = type
        }
        public String type() {
           return this.type
        }
    }

    private QueryType queryType
    private Map args
    private List<ElasticQuery> subQueries

    public CompositeQuery(QueryType queryType) {
        this(queryType, [:])
    }

    public CompositeQuery(QueryType queryType, Map args) {
        this.queryType = queryType
        this.args = args
        this.subQueries = []
    }

    public void add(ElasticQuery query) {
        subQueries.add(query)
    }

    private List buildListSubquery(List<ElasticQuery> subList) {
        subList.stream().map { it.build() }.collect(Collectors.toList())
    }

    private Map buildMapSubquery(List<ElasticQuery> subList) {
        subList.stream()
                .flatMap { it.build().entrySet().stream() }
                .collect(Collectors.toMap(
                        {Map.Entry e -> e.getKey()},
                        {Map.Entry e -> e.getValue()}))
    }

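    public Map build() {
        // bool holds a map keyed by occurrence type (must, should, must_not);
        // every other composite holds a list of sub-queries
        def subQuery = queryType != QueryType.BOOL
                ? buildListSubquery(subQueries)
                : buildMapSubquery(subQueries)
        // nested wraps its sub-queries in bool/must and merges in the caller's
        // args (e.g. path), which can also override the score_mode default
        def checkedQuery = queryType != QueryType.NESTED
                ? subQuery
                : [score_mode: 'avg', query: [bool: [must: subQuery]]].plus(args)
        return [(this.queryType.type()): checkedQuery]
    }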
}

LeafQuery

package com.pythys.elastic

import groovy.transform.CompileStatic

@CompileStatic
public class LeafQuery implements ElasticQuery {
    public enum QueryType {
        EXISTS("exists"),
        MATCH("match"),
        QUERY_STRING("query_string"),
        RANGE("range")

        private String type
        private QueryType(String type) {
            this.type = type
        }
        public String type() {
           return this.type
        }
    }

    private QueryType queryType
    private String field
    private String value
    private String fromValue
    private String toValue

    public LeafQuery(QueryType queryType, String field, String value) {
        this.queryType = queryType
        this.field = field
        this.value = value
    }

    public LeafQuery(QueryType queryType,
            String field,
            String fromValue,
            String toValue) {
        this.queryType = queryType
        this.field = field
        this.fromValue = fromValue
        this.toValue = toValue
    }

    public Map build() {
        def fieldValue
        switch (queryType) {
            case QueryType.QUERY_STRING:
                fieldValue = [
                    query: value,
                    lenient: true,
                    time_zone: TimeZone.default.getID()
                ]
                break
            case [QueryType.EXISTS, QueryType.MATCH]:
                fieldValue = value
                break
            case QueryType.RANGE:
                fieldValue = buildRangeQuery(fromValue, toValue)
                break
        }
        def checkedValue = field ? [(field): fieldValue] : fieldValue
        return [(queryType.type()): checkedValue]
    }

    private Map buildRangeQuery(String from, String to) {
        // use the parameters rather than reaching back into the fields
        def rangeQuery = [:]
        if (from) { rangeQuery.gte = from }
        if (to) { rangeQuery.lte = to }
        return rangeQuery
    }

}

Example

This is an example of how to use the API. Note how you can mix code with the builder. You can even build things manually and mix that with the QueryBuilder.

def queryMap
QueryBuilder.query().with {
    queryMap = must {
        queryString(myStringHere)
        nested([path: 'categories'], {
            conditionOr('categories.productCategoryId', myListOfCategoryIds)
            conditionDate(
                 'categories.categoryFromDate',
                 'categories.categoryThruDate',
                 nowTimeLong)
        })
        if (someConditionHere) {
             match('productClassEnumId', '123')
        }
    }.toMap()
}

Hi @jonesde

Any feedback on the above? Is it in the right / wrong direction? Do you think it might be useful as a starting point?

Hello,

If this is still of interest, I think I found a nicer implementation by employing MNode since it’s already used in the framework in various places as a general purpose tree node data structure.

Some thoughts on this, FWIW…

Most of the Moqui API avoids using Groovy Closure except as an alternative to other methods. The fluent API approach with method chaining makes it possible to do things in Java, Groovy, Scala, by not relying on Groovy Closure.

The structure here seems to mostly mirror the ES query structure. That’s fine and offers some advantages over using nested Maps/Lists to create the JSON to send to ES. My doubt is that it does not add enough value over the alternative of using Groovy or other syntax for building nested Maps/Lists. In other words, an abstraction that simplifies the common parts of queries while allowing literal ES constructs as needed would add more value, be more useful, and result in simpler code using the API. This goes back to what I mentioned about common concepts like and, or, and not methods that would take care of the sometimes cumbersome bool/must/etc. ES-specific things.

Hi @jonesde

Any suggestions on “where” / “how”? For example, should we create a framework similar to the EntityFind set of classes and use EntityCondition constructs to allow AND / OR and nesting indefinitely? And should that framework completely remove all ES constructs and abstract them away behind that API?

I think I can move in that direction if you see this as useful / in the right direction. So I appreciate any guidance on this.

Hi @taher ,

Aside from the code examples for sales reports with more complex ElasticSearch queries I don’t have a lot of test cases to look at to say what might be most useful. For general principles, yes I think that mostly making it independent of ElasticSearch specific constructs might be helpful.

This violates another general principle, that an interface should be the most literal representation possible of what is going on underneath. But in the case of the ES API, whether via Java or JSON, part of the problem is that the concepts used are a little obtuse and cumbersome. In other words, part of the goal would be to provide an interface that offers common logical expression concepts like nested AND/OR and generates the needed underlying ES API structures.

Thinking about this sent me back to thoughts over time about a better fluent API for entity conditions. The current EntityConditionFactory stuff is pretty cumbersome. The XML Actions elements allow for nested conditions in a more literal way, but the Java/Groovy API has not to date. Here is what I threw together today (in the entity-cond-fluent branch of moqui-framework):

This might be a helpful example of what I was trying to describe with more of a fluent API vs a Groovy style Builder. There are some nice things about Groovy builders that basically allow for a mini-DSL (domain specific language). We may want that as well at some point, but for the most part I find fluent style APIs to be more clear and flexible. They can be used in any Java-based language instead of just in Groovy, and it is easier to get references to interim objects (like what the or() and and() methods in the fluent interface above return). That last one is a two edged sword because one nice thing about a Groovy builder is you can mix in logic more easily so you more often don’t need such interim references.
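
To illustrate the fluent style applied to search conditions, here is a rough Java sketch. It is not the code from the entity-cond-fluent branch and not an existing Moqui API: the class `FluentCond`, its methods, and the `brandName` field are all invented for this example. It shows method chaining, an interim reference to a sub-group, and a pass-through for literal ES clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical fluent condition builder that emits ES bool structures.
final class FluentCond {
    private final String occur;      // "must" for AND groups, "should" for OR groups
    private final FluentCond parent; // null for the root group
    private final List<Object> clauses = new ArrayList<>();

    private FluentCond(String occur, FluentCond parent) {
        this.occur = occur;
        this.parent = parent;
    }

    static FluentCond allOf() { return new FluentCond("must", null); }
    static FluentCond anyOf() { return new FluentCond("should", null); }

    FluentCond match(String field, Object value) {
        clauses.add(Map.of("match", Map.of(field, value)));
        return this;
    }

    // Escape hatch for any literal ES clause
    FluentCond raw(Map<String, Object> esClause) {
        clauses.add(esClause);
        return this;
    }

    // Open a nested group and return it, so callers can keep a reference
    FluentCond and() { return child("must"); }
    FluentCond or() { return child("should"); }

    // Step back to the enclosing group (the root returns itself)
    FluentCond end() { return parent == null ? this : parent; }

    private FluentCond child(String childOccur) {
        FluentCond group = new FluentCond(childOccur, this);
        clauses.add(group);
        return group;
    }

    Map<String, Object> toMap() {
        List<Object> built = new ArrayList<>();
        for (Object clause : clauses) {
            built.add(clause instanceof FluentCond ? ((FluentCond) clause).toMap() : clause);
        }
        return Map.of("bool", Map.of(occur, built));
    }

    public static void main(String[] args) {
        FluentCond root = FluentCond.allOf().match("productClassEnumId", "123");
        FluentCond brands = root.or(); // interim reference to the OR group
        for (String brand : List.of("Acme", "Globex")) brands.match("brandName", brand);
        System.out.println(root.toMap());
    }
}
```

The interim reference is what lets ordinary loops and if statements add clauses without a closure, which is the trade-off against the builder style described above.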

Hi @taher, I’m curious to know if you worked on this any further?

Hello,

We haven’t worked on it yet because it would be a little extra work to convert the Groovy API into Java. However, you are welcome to use the code snippets here to help you out.
