API¶

Parsing and constructing queries¶

This is the core of the library: a parser and the syntax tree definition.

luqum.parser¶

The Lucene Query DSL parser based on PLY

luqum.parser.parser = <ply.yacc.LRParser object>¶: This is the parser generated by PLY

luqum.tree¶

Elements that will constitute the parse tree of a query.

You may use these items to build a tree representing a query, or get a tree as the result of parsing a query string.

class luqum.tree.Item¶

Base class for all items that compose the parse tree.

An item is a part of a request.

children¶: As base of a tree structure, an item may have children

class luqum.tree.SearchField(name, expr)¶

Indicates which field the search expression operates on.

E.g. `desc` in `desc:(this OR that)`.

Parameters:

name (str) – name of the field
expr – the searched expression

children¶: the only child is the expression

class luqum.tree.BaseGroup(expr)¶

Base class for group of expressions or field values

Parameters:	expr – the expression inside parentheses

children¶: the only child is the expression

class luqum.tree.Group(expr)¶: Group sub expressions

class luqum.tree.FieldGroup(expr)¶: Group values for a query on a field

class luqum.tree.Range(low, high, include_low=True, include_high=True)¶

A Range

Parameters:

low – lower bound
high – higher bound
include_low (bool) – whether the lower bound is included
include_high (bool) – whether the higher bound is included

children¶: children are lower and higher bound expressions

class luqum.tree.Term(value)¶

Base for terms

Parameters:	value (str) – the value

is_wildcard()¶

Return bool:	True if value is the wildcard `*`

iter_wildcards()¶: list wildcards contained in value and their positions

split_wildcards()¶: split term on wildcards

has_wildcard()¶

Return bool:	True if value contains a wildcard
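
The four wildcard helpers above can be illustrated with a small standalone sketch. It does not import luqum and is not its implementation. It assumes a wildcard is a `*` or `?` that is not preceded by a backslash, and the exact shape of the values yielded by `iter_wildcards` is a guess made for illustration.

```python
import re

# Assumption: a wildcard is a `*` or `?` not preceded by a backslash.
WILDCARD = re.compile(r"(?<!\\)[*?]")


def is_wildcard(value):
    """True if the whole value is the single wildcard `*`."""
    return value == "*"


def has_wildcard(value):
    """True if value contains at least one unescaped wildcard."""
    return WILDCARD.search(value) is not None


def iter_wildcards(value):
    """Yield ((start, end), wildcard) pairs, in order of appearance."""
    for match in WILDCARD.finditer(value):
        yield match.span(), match.group()


def split_wildcards(value):
    """Split value on wildcards, keeping each wildcard as its own part."""
    return re.split(r"((?<!\\)[*?])", value)


print(list(iter_wildcards("fo*ba?")))  # [((2, 3), '*'), ((5, 6), '?')]
print(split_wildcards("fo*bar"))       # ['fo', '*', 'bar']
```

Note that this simple pattern treats any `*` or `?` after a backslash as escaped, so an escaped backslash followed by a wildcard would be missed.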

class luqum.tree.Word(value)¶

A single word term

Parameters:	value (str) – the value

class luqum.tree.Phrase(value)¶

A phrase term, that is a sequence of words enclosed in quotes

Parameters:	value (str) – the value, including the quotes. Eg. `'"my phrase"'`

class luqum.tree.BaseApprox¶

Base for approximations, that is fuzziness and proximity

children¶: As base of a tree structure, an item may have children

class luqum.tree.Fuzzy(term, degree=None)¶

Fuzzy search on word

Parameters:

term (Word) – the approximated term
degree – the degree, which will be converted to `decimal.Decimal`

class luqum.tree.Proximity(term, degree=None)¶

Proximity search on phrase

Parameters:

term (Phrase) – the approximated phrase
degree – the degree, which will be converted to `int()`

class luqum.tree.Boost(expr, force)¶

A term for boosting a value or a group thereof

Parameters:

expr – the boosted expression
force – boosting force, will be converted to `decimal.Decimal`

children¶: The only child is the boosted expression

class luqum.tree.BaseOperation(*operands)¶

Parent class for binary operations. Binary operations are used to join expressions, like OR and AND.

Parameters:	operands – expressions to apply operation on

children¶: children are left and right expressions

class luqum.tree.UnknownOperation(*operands)¶: Unknown Boolean operator.

Warning

This is used to represent implicit operations (i.e. `term:foo term:bar`), as we cannot know for sure which operator should be used.

Lucene seems to use whatever operator was used before reaching that one, defaulting to AND, but we cannot know anything about this at parsing time.

See also

luqum.utils.UnknownOperationResolver, to resolve those nodes to OR or AND

class luqum.tree.OrOperation(*operands)¶: OR expression

class luqum.tree.AndOperation(*operands)¶: AND expression

luqum.tree.create_operation(cls, a, b)¶: Create operation between a and b, merging if a or b is already an operation of same class
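
The merging behaviour described above can be sketched without luqum. The `Operation`, `AndOp` and `OrOp` classes below are hypothetical stand-ins for the tree classes, and `create_operation_sketch` is an illustration of the idea, not the library's code.

```python
class Operation:
    """Hypothetical stand-in for an operation node holding its operands."""

    def __init__(self, *operands):
        self.operands = operands


class AndOp(Operation):
    pass


class OrOp(Operation):
    pass


def create_operation_sketch(cls, a, b):
    """Build cls(a, b), flattening a or b when it is already a cls node."""
    operands = []
    for item in (a, b):
        if type(item) is cls:
            # Same operation class: merge its operands into the new node.
            operands.extend(item.operands)
        else:
            operands.append(item)
    return cls(*operands)


merged = create_operation_sketch(AndOp, AndOp("a", "b"), "c")
print(merged.operands)  # ('a', 'b', 'c')
```

An operand of a different class is kept as a single nested operand, so `create_operation_sketch(OrOp, AndOp("a", "b"), "c")` has two operands.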

class luqum.tree.Unary(a)¶

Parent class for unary operations

Parameters:	a – the expression the operator applies on

children¶: As base of a tree structure, an item may have children

class luqum.tree.Plus(a)¶: plus, unary operation

class luqum.tree.Not(a)¶

class luqum.tree.Prohibit(a)¶: The negation

Transforming to Elastic Search queries¶

luqum.schema¶

class luqum.elasticsearch.schema.SchemaAnalyzer(schema)¶

A helper that analyzes an Elasticsearch schema, to give you suitable options to use when transforming queries.

Parameters:	schema (dict) – the index settings as a dict.

sub_fields()¶: return all known subfields

query_builder_options()¶: return options suitable for luqum.elasticsearch.visitor.ElasticsearchQueryBuilder

luqum.elasticsearch¶

class luqum.elasticsearch.visitor.ElasticsearchQueryBuilder(default_operator='should', default_field='text', not_analyzed_fields=None, nested_fields=None, object_fields=None, sub_fields=None, field_options=None, match_word_as_phrase=False)¶

Query builder to convert a tree into an Elasticsearch query DSL (JSON)

Warning

There are some limitations:

mixing AND and OR on the same level in an expression is not supported, as this leads to unpredictable results (see this article)
for full text fields, the zero_terms_query parameter of match queries is managed as well as possible according to where the terms appear. Lucene would just remove fields with only stop words, while this query builder has to retain all expressions, even those made only of stop words. So when an expression appears in an AND expression, it is set to “all”, while it is set to “none” if it is part of an OR or AND NOT, to avoid influencing the rest of the query. Some edge cases, like having all terms resolve to stop words, may however lead to different results than string_query.

__init__(default_operator='should', default_field='text', not_analyzed_fields=None, nested_fields=None, object_fields=None, sub_fields=None, field_options=None, match_word_as_phrase=False)¶

Parameters:

default_operator – to replace blank operator (MUST or SHOULD)
default_field – the default field to search
not_analyzed_fields – fields that are not analyzed in ES (do not forget to include any sub fields)
nested_fields –
a dict of the fields that are nested in ES. Each nested field contains either a dict of nested fields (if some of them are also nested) or a list of nested fields (this is for convenience).

Example: a record contains multiple authors, each with one name and multiple books. Each book has one title but multiple formats, with one type each:
```
nested_fields = {
    'author': {
        'name': None,
        'book': {
            'format': ['type'],
            'title': None
        }
    },
}
```
object_fields – list containing fully qualified names of object fields. You may also use a spec similar to the one used for nested_fields. None will accept all non nested fields as object fields.
sub_fields – list containing fully qualified names of sub fields. None will accept all non nested fields or object fields as sub fields.
field_options (dict) – allows you to give default options for each field. They will be applied unless overwritten by generated parameters. For match queries, the match_type parameter modifies the type of match query.
match_word_as_phrase (bool) – if True, word expressions are matched using match_phrase instead of match. This option mainly keeps compatibility with version 0.6. It may be removed in the future.

Note

Some of the parameters above can be deduced from the Elasticsearch index configuration. See luqum.elasticsearch.schema.SchemaAnalyzer.query_builder_options()

__call__(tree)¶

Calling the query builder returns the JSON-compatible structure corresponding to the request tree passed as parameter.

Parameters:	tree (luqum.tree.Item) – a luqum parse tree
Return dict:

Utilities¶

luqum.naming: Naming query parts¶

Support for naming expressions

In order to use Elasticsearch named queries, we need to be able to assign names to expressions and retrieve their positions in the query text.

This module adds support for that.

luqum.naming.NAME_ATTR = '_luqum_name'¶: Names are added to tree items via an attribute named _luqum_name

class luqum.naming.TreeAutoNamer¶

generic_visit(node, parents=None, context=None)¶: Default visitor function, called if nothing matches the current node.

visit(node, parents=None, context=None)¶

Basic, recursive traversal of the tree.

Parameters:

parents (list) – the list of parents
context (dict) – a dict of contextual variables, free to use to track state while traversing the tree

luqum.naming.auto_name(tree)¶

Automatically add names to nodes of a parse tree.

We add them to terminal nodes (ranges, phrases and words), as this is where they are useful, but also to operations, to easily grab the group.

class luqum.naming.NameIndexer¶

generic_visit(node, parents=None, context=None)¶: Default visitor function, called if nothing matches the current node.

luqum.naming.name_index(tree)¶

Given a tree with names, give the index of each group in the string representation. Also gives the node type.

Warning

This is not an efficient implementation. It calls the str representation several times on each item, and searches for substrings.

see TreeNameIndexer

Parameters:	tree – a luqum parse tree
Return dict:	mapping each name to a (start position, length) tuple

luqum.naming.extract(expr, name, name_index)¶

Extract the named part of an expression, using name_index

Parameters:

expr (str) – the Lucene expression
name (str) – name of the part to extract
name_index (dict) – the dict obtained from `name_index()`
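
Since the index maps each name to a (start position, length) tuple, extracting a named part amounts to slicing the query string. The sketch below is a standalone illustration of that idea, not luqum's code. The function name `extract_sketch`, the query string and the hand-written index are all made up for the example, and the index ignores the node type that `name_index()` may also provide.

```python
def extract_sketch(expr, name, name_index):
    """Return the part of expr registered under name in name_index."""
    start, length = name_index[name]
    return expr[start:start + length]


# Hypothetical query and index, written by hand for this example.
expr = "foo~2 OR (bar AND baz)"
index = {"a": (0, 5), "b": (10, 11)}

print(extract_sketch(expr, "a", index))  # foo~2
print(extract_sketch(expr, "b", index))  # bar AND baz
```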

luqum.pretty: Pretty printing¶

This module provides a pretty printer for Lucene query trees.

class luqum.pretty.Prettifier(indent=4, max_len=80, inline_ops=False)¶: Class to generate a pretty printer.

luqum.pretty.prettify = <luqum.pretty.Prettifier object>¶: prettify function with default parameters

luqum.check: Checking for validity¶

class luqum.check.CheckNestedFields(nested_fields, object_fields=None, sub_fields=None)¶

Visit the lucene tree to make some checks

In particular to check nested fields.

Parameters:

nested_fields – a dict where keys are names of nested fields, and values are dicts of sub-nested fields or an empty dict for a leaf
object_fields – this is either None, in which case unknown object fields will be accepted, or a dict of sub-nested fields (like nested_fields)

generic_visit(node, parents, context)¶: If nothing matches the current node, visit children

visit_phrase(node, parents, context)¶: On phrase field, verify term is in a final search field

visit_search_field(node, parents, context)¶: On search field node, check nested fields logic

visit_term(node, parents, context)¶: On term field, verify term is in a final search field

class luqum.check.LuceneCheck(zeal=0)¶

Check if a query is consistent

This is intended for use with queries constructed as trees, as well as those parsed by the parser, which is more tolerant.

Parameters:	zeal (int) – if zeal > 0, do extra checks for some pitfalls, depending on zeal level

errors(tree)¶: List all errors

luqum.check.sign()¶

Return a float with the magnitude (absolute value) of x but the sign of y.

On platforms that support signed zeros, copysign(1.0, -0.0) returns -1.0.

luqum.utils: Misc¶

Various utilities for dealing with syntax trees.

Include base classes to implement a visitor pattern.

class luqum.utils.LuceneTreeVisitor¶

Tree Visitor base class, inspired by python’s ast.NodeVisitor.

This class is meant to be subclassed, with the subclass implementing visitor methods for each Node type it is interested in.

By default, those visitor methods should be named 'visit_' + class name of the node, converted to lower_case (i.e. visit_search_node for a SearchNode class).

You can tweak this behaviour by overriding the visitor_method_prefix & generic_visitor_method_name class attributes.

If the goal is to modify the initial tree, use LuceneTreeTransformer instead.

visit(node, parents=None)¶: Basic, recursive traversal of the tree.

generic_visit(node, parents=None)¶: Default visitor function, called if nothing matches the current node.
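
The naming convention described above can be sketched in a few lines. This is a simplified standalone illustration, not luqum's visitor: `visitor_method_name`, `SketchVisitor` and the two node classes are hypothetical stand-ins, and the sketch returns lists where the real class may behave differently.

```python
import re


def visitor_method_name(node, prefix="visit_"):
    """Build 'visit_' + the node's class name converted to lower_case."""
    name = type(node).__name__
    # Insert an underscore before every capital letter except the first.
    return prefix + re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class SketchVisitor:
    def visit(self, node, parents=None):
        """Dispatch on the node class, then recurse into the children."""
        parents = [] if parents is None else parents
        method = getattr(self, visitor_method_name(node), self.generic_visit)
        results = list(method(node, parents))
        for child in node.children:
            results.extend(self.visit(child, parents + [node]))
        return results

    def generic_visit(self, node, parents):
        """Fallback used when no visit_<name> method exists."""
        return []


# Hypothetical stand-ins for tree nodes.
class Word:
    def __init__(self, value):
        self.value, self.children = value, []


class SearchField:
    def __init__(self, name, expr):
        self.name, self.children = name, [expr]


class FieldNames(SketchVisitor):
    def visit_search_field(self, node, parents):
        return [node.name]


print(FieldNames().visit(SearchField("title", Word("foo"))))  # ['title']
```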

class luqum.utils.LuceneTreeTransformer¶

A LuceneTreeVisitor subclass that walks the abstract syntax tree and allows modifications of traversed nodes.

The LuceneTreeTransformer will walk the AST and use the return value of the visitor methods to replace or remove the old node. If the return value of the visitor method is None, the node will be removed from its location, otherwise it is replaced with the return value. The return value may be the original node, in which case no replacement takes place.

generic_visit(node, parent=None)¶: Default visitor function, called if nothing matches the current node.

visit(node, parents=None)¶: Recursively traverses the tree and replace nodes with the appropriate visitor method’s return values.
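
The replace-or-remove rule can be illustrated with a minimal standalone sketch. It uses a hypothetical `Node` class and a plain function as the visitor instead of per-class methods, so it only shows the rule itself (None removes, anything else replaces) and is not luqum's implementation.

```python
class Node:
    """Hypothetical tree node with a name and a list of children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def transform(node, visitor):
    """Apply visitor to node, then to the children of whatever it returns."""
    replacement = visitor(node)
    if replacement is None:
        return None  # the node is removed from its location
    kept = []
    for child in replacement.children:
        new_child = transform(child, visitor)
        if new_child is not None:
            kept.append(new_child)
    replacement.children = kept
    return replacement


def visitor(node):
    if node.name == "drop":
        return None  # remove this node
    if node.name == "old":
        return Node("new", *node.children)  # replace this node
    return node  # keep the original node


tree = Node("root", Node("keep"), Node("drop"), Node("old"))
result = transform(tree, visitor)
print([child.name for child in result.children])  # ['keep', 'new']
```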

class luqum.utils.LuceneTreeVisitorV2¶

V2 of the LuceneTreeVisitor, which allows evaluating the AST

It differs from LuceneTreeVisitor because it is up to the visit method to recursively call children (or not)

This class is meant to be subclassed, with the subclass implementing visitor methods for each Node type it is interested in.

By default, those visitor methods should be named 'visit_' + class name of the node, converted to lower_case (i.e. visit_search_node for a SearchNode class).

You can tweak this behaviour by overriding the visitor_method_prefix & generic_visitor_method_name class attributes.

If the goal is to modify the initial tree, use LuceneTreeTransformer instead.

visit(node, parents=None, context=None)¶

Basic, recursive traversal of the tree.

Parameters:

parents (list) – the list of parents
context (dict) – a dict of contextual variables, free to use to track state while traversing the tree

generic_visit(node, parents=None, context=None)¶: Default visitor function, called if nothing matches the current node.

class luqum.utils.UnknownOperationResolver(resolve_to=None)¶

Transform the UnknownOperation to OR or AND

DEFAULT_OPERATION¶: alias of luqum.tree.AndOperation

luqum.utils.normalize_nested_fields_specs(nested_fields)¶

Normalize a nested_fields specification to only have nested dicts

Parameters:	nested_fields (dict) – a dict of the fields that are nested in ES. Each nested field contains either a dict of nested fields (if some of them are also nested) or a list of nested fields (this is for convenience)

>>> from unittest import TestCase
>>> from luqum.utils import normalize_nested_fields_specs
>>> TestCase().assertDictEqual(
...     normalize_nested_fields_specs(
...         {"author" : {"books": ["name", "ref"], "firstname" : None }}),
...     {"author" : {"books": {"name": {}, "ref": {}}, "firstname" : {} }})
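
To make the normalization rule concrete, here is a standalone sketch that reproduces the result shown in the doctest above. It is an illustration written for this page, not the function from luqum.utils, and the name `normalize_nested_sketch` is made up.

```python
def normalize_nested_sketch(spec):
    """Turn a mixed dict / list / None specification into nested dicts only."""
    if spec is None:
        return {}  # a leaf written as None
    if isinstance(spec, dict):
        return {key: normalize_nested_sketch(value) for key, value in spec.items()}
    # Otherwise assume an iterable of leaf names, e.g. a list.
    return {key: {} for key in spec}


spec = {"author": {"books": ["name", "ref"], "firstname": None}}
print(normalize_nested_sketch(spec))
# {'author': {'books': {'name': {}, 'ref': {}}, 'firstname': {}}}
```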

luqum.utils.flatten_nested_fields_specs(nested_fields)¶

Flatten a fields specification into a simple set of fully qualified names

Parameters:	nested_fields (dict) – contains the fields that are objects in ES, as a series of nested dicts. Lists are accepted as well for conciseness.

>>> from unittest import TestCase
>>> from luqum.utils import flatten_nested_fields_specs
>>> flatten_nested_fields_specs(None)
set()
>>> TestCase().assertEqual(
...     flatten_nested_fields_specs(["author.name", "book.title"]),
...     set(["author.name", "book.title"]))
>>> TestCase().assertEqual(
...     flatten_nested_fields_specs(
...         {"book" : { "author": ["firstname", "lastname"], "title" : None }}),
...     set(["book.author.firstname", "book.author.lastname", "book.title"]))
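
The flattening shown in the doctest above can be sketched as a short recursive function that joins nested keys with dots. This is a standalone illustration under the assumption that empty or None values mark leaves. It is not the luqum.utils implementation, and `flatten_sketch` is a made-up name.

```python
def flatten_sketch(spec, prefix=""):
    """Flatten nested dicts / lists into a set of dotted field names."""
    if spec is None:
        return set()
    if isinstance(spec, dict):
        result = set()
        for key, value in spec.items():
            path = prefix + key
            if value:
                # Non-empty sub-specification: recurse with a longer prefix.
                result |= flatten_sketch(value, path + ".")
            else:
                # None or empty value: this key is a leaf.
                result.add(path)
        return result
    # Otherwise assume an iterable of names, e.g. a list.
    return {prefix + name for name in spec}


spec = {"book": {"author": ["firstname", "lastname"], "title": None}}
print(sorted(flatten_sketch(spec)))
# ['book.author.firstname', 'book.author.lastname', 'book.title']
```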

luqum.utils.normalize_object_fields_specs(object_fields)¶

Normalize an object_fields specification to only have a simple set

Parameters:	object_fields (dict) – contains the fields that are objects in ES, as a series of nested dicts. Lists are accepted as well for conciseness. None, which means no spec, is returned as is.

>>> from unittest import TestCase
>>> from luqum.utils import normalize_object_fields_specs
>>> normalize_object_fields_specs(None) is None
True
>>> TestCase().assertEqual(
...     normalize_object_fields_specs(["author.name", "book.title"]),
...     set(["author.name", "book.title"]))
>>> TestCase().assertEqual(
...     normalize_object_fields_specs(
...         {"book" : { "author": ["firstname", "lastname"], "title" : None }}),
...     set(["book.author.firstname", "book.author.lastname", "book.title"]))
