First of all, I created a document model in C# named EsOrganisation with some basic fields:
using System;
using System.Collections.Generic;
using Nest;

[ElasticsearchType(Name = "organisation")]
public class EsOrganisation
{
    public Guid Id { get; set; }
    public DateTimeOffset CreatedDate { get; set; }
    public DateTimeOffset? UpdatedDate { get; set; }
    public int OrganisationTypeId { get; set; }
    public string OrganisationName { get; set; }
    public List<string> OrganisationAliases { get; set; }
    public List<string> OrganisationKeywords { get; set; }
    public List<int> Products { get; set; }
}
I also created a factory that returns the Nest.ElasticClient. To keep things simple, just bear in mind that whenever I call client.SearchAsync() the client has already been instantiated and configured.
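The factory itself isn't shown in this post. A minimal sketch of what it might look like follows; the class name EsClientFactory, the node URL and the index name are assumptions made for the example, only GetClient() appears later in the post.

```csharp
using System;
using Nest;

// Hypothetical factory: class name, node URL and index name are assumptions.
public class EsClientFactory
{
    private readonly ElasticClient _client;

    public EsClientFactory(Uri node, string defaultIndex)
    {
        // DefaultIndex is used when a request doesn't name an index explicitly
        var settings = new ConnectionSettings(node).DefaultIndex(defaultIndex);
        _client = new ElasticClient(settings);
    }

    // ElasticClient is thread-safe, so a single instance can be shared
    public ElasticClient GetClient() => _client;
}
```

Usage would then be something like: var client = new EsClientFactory(new Uri("http://localhost:9200"), "organisations").GetClient();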
Structured vs Unstructured Search
Structured and unstructured search differ in how the filters are applied. Structured search deals with data such as dates, times or numbers, which are matched against a range or an exact value: a document either matches or it doesn’t, it can’t be a partial match. Strings can be structured too, as with post labels: either the post has the label or it doesn’t. Unstructured search is about partial matches, and that’s where the score comes into play to determine the relevancy of each match.
Adding pagination
var skipAmount = 20;
var takeAmount = 10;
var q1 = await client.SearchAsync<EsOrganisation>(s => s
    .From(skipAmount)
    .Size(takeAmount)
);
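In practice the skip amount usually comes from a page number. A small sketch of that conversion, assuming 1-based page numbers; the Paging class and ToFrom method are names invented for this example:

```csharp
using System;

// Hypothetical helper, not part of NEST.
public static class Paging
{
    // Converts a 1-based page number into the value to pass to From()
    public static int ToFrom(int pageNo, int pageSize)
    {
        if (pageNo < 1) throw new ArgumentOutOfRangeException(nameof(pageNo));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (pageNo - 1) * pageSize;
    }
}
```

With it, the query above corresponds to page 3 with a page size of 10: .From(Paging.ToFrom(3, 10)).Size(10). Keep in mind that Elasticsearch rejects requests where From + Size goes beyond index.max_result_window, which is 10,000 by default.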
Filtering by integer fields
// Search for documents that have a certain productId
var q2 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
    .Query(q => q.Term(c => c.Field(p => p.Products).Value(3)))
);

// Search for documents included in an array of productIds (1,2,3,4)
var q3 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
    .Query(q => q.Terms(c => c.Field(p => p.Products).Terms(1, 2, 3, 4)))
);

// or
var myList = new List<int>() {1, 2, 3, 4};
var q4 = await client.SearchAsync<EsOrganisation>(s => s.Size(pageSize)
    .Query(q => q.Terms(c => c.Field(p => p.Products).Terms(myList)))
);
Filtering by dates
// Date range: year 2017
var d1 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.DateRange(r => r
        .Field(f => f.CreatedDate)
        .GreaterThanOrEquals(new DateTime(2017, 01, 01))
        .LessThan(new DateTime(2018, 01, 01))
    ))
);
Filtering strings – Unstructured queries
Unstructured queries allow for partial matches, and how well a document matches is reflected in its score, which determines which documents match better. Match(), Prefix() and MatchPhrasePrefix() are the queries I use here for partial matching.
// Match: the text is analyzed and, by default, a document matches if it contains one or more of the searched words
var t1 = await client.SearchAsync<EsOrganisation>(s => s
    .Query(q => q.Match(m => m.Field(f => f.OrganisationName)
        .Query("one two three")))
);

// Starts with: Prefix() is a term-level query, so the value is not analyzed;
// it only accepts one value and doesn't work if supplied with more than one word
var t3 = await client.SearchAsync<EsOrganisation>(s => s
    .Query(q => q.Prefix(m => m.Field(f => f.OrganisationName)
        .Value("one")
        //.Value("one two") <- doesn't work
    ))
);

// Phrase match, the last word can be a prefix
var t4 = await client.SearchAsync<EsOrganisation>(s => s
    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
        .Query("one two thr")))
);

// Slop: the words can be separated/reordered by up to this number of position moves
var t5 = await client.SearchAsync<EsOrganisation>(s => s
    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
        .Slop(5)
        .Query("three one two")))
);

// MaxExpansions limits how many terms the last prefix can expand to (50 by default);
// unlike Size(), it does not limit the number of documents returned
var t6 = await client.SearchAsync<EsOrganisation>(s => s
    .Query(q => q.MatchPhrasePrefix(m => m.Field(f => f.OrganisationName)
        .MaxExpansions(takeAmount)
        .Query("one two three")))
);
Boolean queries
Boolean queries are compound queries: they combine more than one criterion using AND, OR and NOT operators.
When creating Boolean queries we can add filters to them. A filter is essentially the same as a Must() query, except that it doesn’t contribute to the score, which makes the score calculation quicker and the search consume fewer resources. So try to put structured conditions into a filter, and unstructured ones into a Must() that can calculate a score.
Operators
&&: AND
||: OR
!: NOT
+: filter. Marks the criterion as a filter (it is not taken into account when calculating the score)
Using ANDs and ORs inside a Query with operators:
var s4 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q =>
        +q.Terms(c => c.Field(p => p.Products).Terms(products))
        && (
            q.Match(m => m.Field(f => f.OrganisationName).Query(query))
            || q.Match(m => m.Field(f => f.OrganisationAliases).Query(query)))
        && !q.Match(m => m.Field(f => f.OrganisationKeywords).Query(query))
    )
);
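For comparison, here is roughly the Bool query that the operator expression above stands for, written with the object initializer syntax. This is a sketch, not the exact output of NEST: the ExplicitBool class is a name made up for the example, and the field names are written as strings on the assumption that NEST's default camelCase field name inference is in use.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

// Hypothetical helper showing the explicit form of: +terms && (a || b) && !c
public static class ExplicitBool
{
    public static QueryContainer Build(IEnumerable<int> products, string query)
    {
        return new BoolQuery
        {
            // "+" -> filter clause, not scored
            Filter = new QueryContainer[]
            {
                new TermsQuery { Field = "products", Terms = products.Cast<object>() }
            },
            // "(a || b)" -> nested bool whose should clauses act as ORs
            Must = new QueryContainer[]
            {
                new BoolQuery
                {
                    Should = new QueryContainer[]
                    {
                        new MatchQuery { Field = "organisationName", Query = query },
                        new MatchQuery { Field = "organisationAliases", Query = query }
                    },
                    MinimumShouldMatch = 1
                }
            },
            // "!" -> must_not clause
            MustNot = new QueryContainer[]
            {
                new MatchQuery { Field = "organisationKeywords", Query = query }
            }
        };
    }
}
```

The result can be passed to a search with .Query(q => ExplicitBool.Build(products, query)). The operators are shorter to write, while the explicit form makes it easier to see which clauses are scored.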
Extracting the search filters
This is useful when you want to reuse filters or build a query dynamically:
var productFilter = new QueryContainerDescriptor<EsOrganisation>()
    .Terms(c => c.Field(p => p.Products).Terms(products));
var matchNameFilter = new QueryContainerDescriptor<EsOrganisation>()
    .Match(m => m.Field(f => f.OrganisationName).Query(query));
// Each filter targets its own field, mirroring the operator example above
var matchAliasFilter = new QueryContainerDescriptor<EsOrganisation>()
    .Match(m => m.Field(f => f.OrganisationAliases).Query(query));
var matchKeywordFilter = new QueryContainerDescriptor<EsOrganisation>()
    .Match(m => m.Field(f => f.OrganisationKeywords).Query(query));

var s3 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => +productFilter && (matchNameFilter || matchAliasFilter) && !matchKeywordFilter)
);
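Some examples with the filters extracted; notice the behaviour of each command. The phrasePrefix...Filter variables are extracted in the same way as the filters above, but with MatchPhrasePrefix() instead of Match():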
// This works, productFilter filters the results and doesn't affect the score
var b2 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.Bool(b => b
        .Must(phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
        .Filter(productFilter)
    ))
);

// All three are required as a MUST
var b3 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.Bool(b => b
        // This works as an AND
        .Must(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
    ))
);

// These work as non-exclusive clauses (ORs) that just count towards the score.
// A bool with only Should clauses already requires at least one of them to match;
// once a Must or Filter is present the default minimum drops to 0, and without
// MinimumShouldMatch everything passing them would be included, just sorted by score.
// If more than one matches -> more score (actually didn't seem to add more score in my tests but that's the theory)
var b4 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.Bool(b => b
        .Should(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
        .MinimumShouldMatch(1) // match at least one, then sort by relevancy
    ))
);

// If we add a filter it filters without affecting the score
var b4B = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.Bool(b => b
        .Should(phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter)
        .MinimumShouldMatch(1)
        .Filter(productFilter)
    ))
);
Building queries dynamically
Now let’s imagine that you can’t determine how many filters or conditions you have until run-time, because the query depends on several conditions. Extracting the filters as done before is useful, but you also need to attach the ANDs, ORs, etc. dynamically, so the operators (&&, ||, !) won’t help here: you don’t even know how many filters you may be attaching.
To achieve that, let’s make use of the Bool query plus arrays of filters. Remember that a Bool query lets you set MUSTs, SHOULDs and even other Bool queries inside it. So we can effectively create a Bool -> Must AND Must (Bool -> (Should OR Should OR Should)).
Step 1: A simple array of filters
Let’s start with an example to attach an array of filters that you can increase or decrease dynamically.
Note: I’m using the filters extracted in the section “Extracting the search filters”.
// Add filters to Array of filters
var listOfFilters = new QueryContainer[] {phrasePrefixNameFilter, phrasePrefixAliasFilter, phrasePrefixKeywordFilter};

// Create Bool Query as object, set array of filters to the Should property
var boolQuery1 = new BoolQuery
{
    Name = "boolQuery",
    Should = listOfFilters,
    MinimumShouldMatch = 1,
    Filter = new QueryContainer[] { productFilter }
};

// This works!
var b8 = await client.SearchAsync<EsOrganisation>(sr => sr.Query(q => boolQuery1));
Step 2: A complex array with groups of ANDs and ORs
Let’s take that to another level of complexity, a foreach that will add filters to our query:
Remember: I created a model called EsOrganisation.
var client = _clientFactory.GetClient();
var listOfGroups = new List<Func<QueryContainerDescriptor<EsOrganisation>, QueryContainer>>();

foreach (var rmGroup in groups)
{
    var filtersList = new List<QueryContainer>();
    foreach (var rmFilter in rmGroup.Filters)
    {
        // This is a method I call to generate a filter dynamically,
        // just think of the filters above as examples of what it generates.
        var filter = SharedFilters.GetFilter(rmFilter.AggregateName, rmFilter.FieldName, rmFilter.OperationName, rmFilter.Value);
        filtersList.Add(filter);
    }
    var group = QueryBuilder.BuildGroupAsFuncOf<EsOrganisation>(filtersList.ToArray());
    listOfGroups.Add(group);
}

var productFilter = SharedFilters.GetProductsFilter<EsOrganisation>(products);
var results = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.Bool(b => b
        .Must(productFilter)
        .Should(listOfGroups)
        .MinimumShouldMatch(1)
    ))
    // Can be used to load only this field, can be of use to improve performance in the future
    .Source(s => s.Includes(f => f.Fields(o => o.Id)))
);
var orgs = results.Documents;

//---------------------------------------
// This is the function that joins the filters into a group of ANDs
// (which is a Bool Query with an array of MUSTs)
public static Func<QueryContainerDescriptor<T>, QueryContainer> BuildGroupAsFuncOf<T>(QueryContainer[] filters) where T : class
{
    return q => q.Bool(bl => bl.Must(filters));
}
Some explanation:
Read the code from the inside out. In the inner foreach I create a list of QueryContainers to store the filters; these filters act as ANDs inside each group. Just outside that inner foreach, the list of filters is added to the list of groups as a group of ANDs.
Once outside the outer foreach I generate the main Bool query, which includes each group inside a Should (ORs); each group is itself a Bool query. There is an extra filter that I add to the Must property because, in my case, absolutely all my dynamic queries have at least that one filter. Then I set MinimumShouldMatch to 1 and the query is built.
Why the List of Func<>? When I created the list of filters as a list of QueryContainers and attached that to a Bool query, it worked. But when I tried to join together an array of Bool queries (and that’s what each group is, a Bool query with the filters inside the Must) it didn’t seem to like it. I tried different approaches but none worked, as the Should() method didn’t accept my list of Bool queries as a parameter; instead, it accepts a list of Func<QueryContainerDescriptor<T>, QueryContainer>.
As a side note, I also had this issue internally when creating filters and joining them together, so I ended up with this helper method, which basically converts a BoolQueryDescriptor into a Func:
public Func<QueryContainerDescriptor<T>, QueryContainer> GetAsFuncOf<T>(BoolQueryDescriptor<T> descriptor) where T : class
{
    return q => q.Bool(bl => descriptor);
}
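The loop in Step 2 relies on a groups collection and a SharedFilters.GetFilter method that aren't shown. Purely as an illustration of the shape they could have, here is a sketch: the class names RmFilter, RmGroup and SharedFiltersSketch, the operation names ("equals", "contains", "startsWith") and the mapping to queries are all invented for this example; only the property names come from the loop above. The aggregateName parameter is accepted but ignored, since the post doesn't say what it is used for.

```csharp
using System;
using System.Collections.Generic;
using Nest;

// Hypothetical request models: only the property names come from the loop above.
public class RmFilter
{
    public string AggregateName { get; set; }
    public string FieldName { get; set; }
    public string OperationName { get; set; }
    public string Value { get; set; }
}

public class RmGroup
{
    public List<RmFilter> Filters { get; set; } = new List<RmFilter>();
}

public static class SharedFiltersSketch
{
    // Maps an (invented) operation name to a query on the given field
    public static QueryContainer GetFilter(string aggregateName, string fieldName, string operationName, string value)
    {
        switch (operationName)
        {
            case "equals":
                return new TermQuery { Field = fieldName, Value = value };
            case "contains":
                return new MatchQuery { Field = fieldName, Query = value };
            case "startsWith":
                return new MatchPhrasePrefixQuery { Field = fieldName, Query = value };
            default:
                throw new ArgumentException($"Unknown operation '{operationName}'", nameof(operationName));
        }
    }
}
```

With something like this, groups would simply be a List<RmGroup> built from the incoming request, each group holding the filters that must all match.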
Queries that won’t work
Just some query attempts that won’t work, useful to know what you can’t do:
// Attempting an OR/AND between queries -> Fails
var s4 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => productFilter) // WARNING: This one is overridden by the second, DON'T DO THIS
    .Query(q => phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
);

// Attempting an OR between Fields -> Fails
var phrasePrefixInAllFields = new QueryContainerDescriptor<EsOrganisation>()
    .MatchPhrasePrefix(m => m
        .Field(f => f.OrganisationName)
        .Field(f => f.OrganisationAliases)
        // Again, this last Field method overrides the two previous ones, so THIS CAN'T BE DONE
        .Field(f => f.OrganisationKeywords)
        .Slop(2)
        .Query(query)
    );
var s6 = await client.SearchAsync<EsOrganisation>(sr => sr.Query(q => phrasePrefixInAllFields));

// WARNING: This won't work, second Must overrides the first!!
var b1 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.Bool(b => b
        .Must(productFilter)
        .Must(phrasePrefixNameFilter || phrasePrefixAliasFilter || phrasePrefixKeywordFilter)
    ))
);

// This doesn't work as last Should overrides previous ones
var b6 = await client.SearchAsync<EsOrganisation>(sr => sr
    .Query(q => q.Bool(b => b
        .Should(phrasePrefixNameFilter)
        .Should(phrasePrefixAliasFilter)
        .Should(phrasePrefixKeywordFilter)
        .MinimumShouldMatch(1)
        .Filter(productFilter)
    ))
);
Conclusion: each call to Field(), Must(), Should() or Query() replaces the previous one, so you can only have one of each unless you create subqueries.
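For the multi-field case specifically, one alternative worth knowing (it is not what I used in this post) is a MultiMatch query of type phrase_prefix, which runs the phrase prefix match against each listed field. A minimal sketch with the object initializer syntax; the MultiFieldPhrasePrefix class is a name made up for the example, and the string field names assume NEST's default camelCase inference:

```csharp
using Nest;

// Hypothetical helper: one query searching the three fields at once
public static class MultiFieldPhrasePrefix
{
    public static QueryContainer Build(string query)
    {
        return new MultiMatchQuery
        {
            // All fields are given in a single assignment, so nothing gets overridden
            Fields = new[] { "organisationName", "organisationAliases", "organisationKeywords" },
            Type = TextQueryType.PhrasePrefix,
            Slop = 2,
            Query = query
        };
    }
}
```

It can be used like any other extracted filter, e.g. .Query(q => MultiFieldPhrasePrefix.Build(query)). Note that scoring differs from three separate Should clauses, so it is worth comparing the results before swapping one for the other.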
Boosting a field
When performing unstructured queries, we can decide which fields are more relevant than others: just use Boost() to multiply the score contribution of a match on that field.
var phrasePrefixKeywordFilter = new QueryContainerDescriptor<EsOrganisation>()
    .MatchPhrasePrefix(m => m
        .Boost(3) // make this field three times more important when calculating score
        .Field(f => f.OrganisationKeywords)
        .Slop(2)
        .Query(query)
    );
Sorting
Sorting by numeric or date fields is quite straightforward:
var qry = new SearchDescriptor<EsOrganisation>().From((pageNo - 1) * pageSize).Size(pageSize)
    .Query(q => q.Bool(b => b.Must(productsFilter, typeFilter)
        .Should(phrasePrefixNameFilter).MinimumShouldMatch(1)));

var s4 = await client.SearchAsync<EsOrganisation>(qry.Sort(s => s.Descending(x => x.TypeId)));
var s5 = await client.SearchAsync<EsOrganisation>(qry.Sort(s => s.Ascending(x => x.DatePublished)));
var s6 = await client.SearchAsync<EsOrganisation>(qry.Sort(s => s.Ascending(x => x.WebsiteId)));
Sorting on text fields, however, is not allowed by default. You must add the suffix "keyword" so that the sort uses the keyword-type sub-field (a non-analyzed copy of the field that the default mapping stores for this kind of purpose):
var qry = new SearchDescriptor<EsOrganisation>().From((pageNo - 1) * pageSize).Size(pageSize)
    .Query(q => q.Bool(b => b.Must(productsFilter, typeFilter)
        .Should(phrasePrefixNameFilter).MinimumShouldMatch(1)));

var s1 = await client.SearchAsync<EsOrganisation>(qry.Sort(s => s.Ascending(x => x.Name.Suffix("keyword"))));
Can I have more detail about the part “Step 2: A complex array with groups of ANDs and ORs”? Some code is missing there, i.e.:
– foreach (var rmGroup in groups): here groups is missing
– var productFilter = SharedFilters.GetProductsFilter(products); is missing
– var filter = SharedFilters.GetFilter(rmFilter.AggregateName, rmFilter.FieldName, rmFilter.OperationName, rmFilter.Value); is missing
Can you please share the complete code snippet?
That last tidbit on using keyword was so f***ing useful. I wish the designers of Elastic and DSL had thought to make that a bit clearer.
Thanks for your article.
@Abdul: I honestly write these things mostly for my future self and didn’t think anyone would actually read them, but I’ll have a look whenever I get a chance. I’ve also got better approaches now than when I wrote this post.