API¶
Parsing and constructing queries¶
This is the core of the library: a parser and the syntax tree definition.
luqum.parser¶
The Lucene Query DSL parser based on PLY
- luqum.parser.parser = <ply.yacc.LRParser object>¶
This is the parser generated by PLY
Note: The parser by itself is not thread safe (because PLY is not). Use luqum.thread.parse() instead.
luqum.threading¶
- luqum.thread.parse(input=None, lexer=None, debug=False, tracking=False)¶
A (hopefully) thread-safe version of luqum.parser.parse().
PLY is not thread safe because of its lexer state, but by cloning the lexer we can be thread safe. See: https://github.com/jurismarches/luqum/issues/72
luqum.tree¶
Elements that will constitute the parse tree of a query.
You may use these items to build a tree representing a query, or get a tree as the result of parsing a query string.
- class luqum.tree.Item(pos=None, size=None, head='', tail='')¶
Base class for all items that compose the parse tree.
An item is a part of a request.
- Parameters
- clone_item(**kwargs)¶
Clone an item, but not its children!
This is particularly useful for the visitor.TreeTransformer pattern.
- Parameters
kwargs (dict) – these items will be added to the __init__ call. It is a simple way to change some values of the target item.
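The behaviour described for clone_item can be illustrated with a small stand-in class. `SketchItem` is hypothetical and does not reproduce luqum's Item; it only shows the idea of copying an item's own values, overriding some through keyword arguments, and leaving the children out.

```python
class SketchItem:
    """Hypothetical item: own attributes plus a list of children."""

    def __init__(self, name, head="", tail="", children=None):
        self.name = name
        self.head = head
        self.tail = tail
        self.children = list(children or [])

    def clone_item(self, **kwargs):
        # Start from this item's own attributes, let kwargs override them,
        # and deliberately leave the children out of the copy.
        params = {"name": self.name, "head": self.head, "tail": self.tail}
        params.update(kwargs)
        return SketchItem(**params)


leaf = SketchItem("foo")
parent = SketchItem("group", tail=" ", children=[leaf])
clone = parent.clone_item(head=" ")  # same name and tail, new head, no children
```

A transformer can then attach freshly transformed children to the clone instead of reusing the original ones.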
- property children¶
As base of a tree structure, an item may have children
- class luqum.tree.NoneItem(pos=None, size=None, head='', tail='')¶
This Item is a placeholder; think of it as None.
It can be used, e.g., to initialize an element's children until we feed in the real children.
- luqum.tree.NONE_ITEM = NoneItem()¶
An instance of NoneItem, as it is always the same.
- class luqum.tree.SearchField(name, expr, **kwargs)¶
Indicates which field the search expression operates on, e.g. desc in desc:(this OR that)
- Parameters
name (str) – name of the field
expr – the searched expression
- class luqum.tree.BaseGroup(expr, **kwargs)¶
Base class for group of expressions or field values
- Parameters
expr – the expression inside parenthesis
- class luqum.tree.Group(expr, **kwargs)¶
Group sub expressions
- class luqum.tree.FieldGroup(expr, **kwargs)¶
Group values for a query on a field
- class luqum.tree.Range(low, high, include_low=True, include_high=True, **kwargs)¶
A Range
- class luqum.tree.Term(value, **kwargs)¶
Base for terms
- Parameters
value (str) – the value
- is_wildcard()¶
- Return bool
True if value is the wildcard *
- iter_wildcards()¶
list wildcards contained in value and their positions
- split_wildcards()¶
split term on wildcards
- has_wildcard()¶
- Return bool
True if value contains a wildcard
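The four wildcard helpers can be approximated on plain strings. This is only a sketch under stated assumptions: a wildcard is taken to be an unescaped `*` or `?`, the split keeps the wildcards in its output (a choice of this sketch), and an escaped backslash placed before a wildcard is not handled. The function names mirror the methods above but are free functions written for illustration.

```python
import re

# Assumption: "*" and "?" are wildcards unless preceded by a backslash.
WILDCARD = r"(?<!\\)[*?]"


def is_wildcard(value):
    """True if the whole value is the single wildcard "*"."""
    return value == "*"


def iter_wildcards(value):
    """Yield (position, wildcard) pairs for each unescaped wildcard."""
    for match in re.finditer(WILDCARD, value):
        yield match.start(), match.group()


def split_wildcards(value):
    """Split value on wildcards, keeping the wildcards as separate parts."""
    return re.split("(" + WILDCARD + ")", value)


def has_wildcard(value):
    """True if value contains at least one unescaped wildcard."""
    return re.search(WILDCARD, value) is not None
```

For instance, `split_wildcards("fo*b?r")` separates the literal parts from the two wildcards, while an escaped star such as in `"a\\*b"` is not reported.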
- class luqum.tree.Phrase(value, **kwargs)¶
A phrase term, that is a sequence of words enclosed in quotes
- Parameters
value (str) – the value, including the quotes. E.g. '"my phrase"'
- class luqum.tree.Regex(value, **kwargs)¶
A regex term, that is a sequence of words enclosed in slashes
- Parameters
value (str) – the value, including the slashes. E.g. '/my regex/'
- class luqum.tree.BaseApprox(term, degree=None, **kwargs)¶
Base for approximations, that is fuzziness and proximity
- class luqum.tree.Fuzzy(term, degree=None, **kwargs)¶
Fuzzy search on word
- Parameters
term (Word) – the approximated term
degree – the degree, which will be converted to decimal.Decimal.
- class luqum.tree.Proximity(term, degree=None, **kwargs)¶
Proximity search on phrase
- Parameters
term (Phrase) – the approximated phrase
degree – the degree, which will be converted to int().
- class luqum.tree.Boost(expr, force, **kwargs)¶
A term for boosting a value or a group thereof
- Parameters
expr – the boosted expression
force – boosting force, will be converted to decimal.Decimal
- class luqum.tree.BaseOperation(*operands, **kwargs)¶
Parent class for binary operations. Binary operations are used to join expressions, like OR and AND.
- Parameters
operands – expressions to apply operation on
- property children¶
children are left and right expressions
- class luqum.tree.BoolOperation(*operands, **kwargs)¶
Lucene Boolean Query.
This operation assumes that the query builder can utilize a boolean operator with three possible sections, must, should and must_not. If the UnknownOperationResolver is asked to resolve_to this operation, the query builder can utilize this operator directly instead of nested AND/OR. This also makes it possible to correctly support Lucene queries such as: “apples +bananas -vegetables”.
See also
- class luqum.tree.UnknownOperation(*operands, **kwargs)¶
Unknown Boolean operator.
Warning
This is used to represent implicit operations (ie: term:foo term:bar), as we cannot know for sure which operator should be used.
Lucene seems to use whatever operator was used before reaching that one, defaulting to AND, but we cannot know anything about this at parsing time…
See also
the utils.UnknownOperationResolver to resolve those nodes to OR and AND
- class luqum.tree.OrOperation(*operands, **kwargs)¶
OR expression
- class luqum.tree.AndOperation(*operands, **kwargs)¶
AND expression
- luqum.tree.create_operation(cls, a, b, op_tail=' ')¶
Create operation between a and b, merging if a or b is already an operation of same class
- Parameters
a – left operand
b – right operand
op_tail – tail of operation token
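The merging behaviour can be sketched with tiny stand-in classes. `Op`, `AndOp`, `OrOp` and `create_operation_sketch` are hypothetical and ignore op_tail, head and tail handling; they only show how an operand that is already an operation of the same class gets flattened into the new one.

```python
class Op:
    """Hypothetical operation holding a tuple of operands."""

    def __init__(self, *operands):
        self.operands = operands


class AndOp(Op):
    pass


class OrOp(Op):
    pass


def create_operation_sketch(cls, a, b):
    """Join a and b with cls, flattening operands that are already a cls."""
    operands = []
    for operand in (a, b):
        if isinstance(operand, cls):
            # Same class: reuse its operands instead of nesting.
            operands.extend(operand.operands)
        else:
            operands.append(operand)
    return cls(*operands)


flat = create_operation_sketch(AndOp, AndOp("a", "b"), "c")
nested = create_operation_sketch(AndOp, OrOp("a", "b"), "c")
```

Here `flat` holds three operands on one level, while `nested` keeps the OR as a single operand because it is of a different class.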
- class luqum.tree.Unary(a, **kwargs)¶
Parent class for unary operations
- Parameters
a – the expression the operator applies on
- class luqum.tree.UnaryOperator(a, **kwargs)¶
Base class for unary operators
- class luqum.tree.Plus(a, **kwargs)¶
plus, unary operation
- class luqum.tree.Not(a, **kwargs)¶
- class luqum.tree.Prohibit(a, **kwargs)¶
The negation
- class luqum.tree.OpenRange(a, include=True, **kwargs)¶
A range with only one bound.
- Parameters
a – the provided bound value
include (bool) – whether a is included
- class luqum.tree.From(a, include=True, **kwargs)¶
- class luqum.tree.To(a, include=True, **kwargs)¶
Transforming to Elastic Search queries¶
luqum.schema¶
- class luqum.elasticsearch.schema.SchemaAnalyzer(schema)¶
A helper that analyzes an Elasticsearch schema, to give you suitable options to use when transforming queries.
- Parameters
schema (dict) – the index settings as a dict.
- sub_fields()¶
return all known subfields
- query_builder_options()¶
return options suitable for luqum.elasticsearch.visitor.ElasticsearchQueryBuilder
luqum.elasticsearch¶
- class luqum.elasticsearch.visitor.ElasticsearchQueryBuilder(default_operator='should', default_field='text', not_analyzed_fields=None, nested_fields=None, object_fields=None, sub_fields=None, field_options=None, match_word_as_phrase=False)¶
Query builder to convert a tree into an Elasticsearch query DSL (JSON)
Warning
There are some limitations:
a mix of AND and OR on the same level in expressions is not supported, as this leads to unpredictable results (see this article)
for full text fields, the zero_terms_query parameter of match queries is managed on a best-effort basis according to where the terms appear. Lucene would just remove fields with only stop words, while this query builder has to retain all expressions, even if one is only made of stop words. So when an expression appears in an AND expression, it will be set to “all”, while it will be set to “none” if it is part of an OR or an AND NOT, to avoid influencing the rest of the query. Some edge cases, like having all terms resolve to stop words, may however lead to different results than string_query.
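The zero_terms_query rule above can be made concrete with a small helper. This is a sketch only: `match_clause` and its `parent_operation` argument are hypothetical, and the exact structure produced by the query builder is not taken from the source. In Elasticsearch, "all" makes a clause whose terms were all removed by the analyzer behave like match_all, and "none" makes it match nothing.

```python
def match_clause(field, text, parent_operation):
    """Build a match clause; parent_operation is "and", "or" or "and_not"."""
    # Inside an AND, a clause reduced to stop words should not restrict the
    # result ("all"); inside an OR or AND NOT it should add nothing ("none").
    zero_terms_query = "all" if parent_operation == "and" else "none"
    return {
        "match": {
            field: {"query": text, "zero_terms_query": zero_terms_query}
        }
    }


in_and = match_clause("text", "the", "and")
in_or = match_clause("text", "the", "or")
```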
- __init__(default_operator='should', default_field='text', not_analyzed_fields=None, nested_fields=None, object_fields=None, sub_fields=None, field_options=None, match_word_as_phrase=False)¶
- Parameters
default_operator – to replace blank operator (MUST or SHOULD)
default_field – to search
not_analyzed_fields – fields that are not analyzed in ES (do not forget to include any sub fields)
nested_fields –
a dict containing fields that are nested in ES. Each nested field contains either a dict of nested fields (if some of them are also nested) or a list of nested fields (this is for convenience).
Example, where a record contains multiple authors, each with one name and multiple books. Each book has one title but multiple formats with one type each:
'author': { 'name': None, 'book': { 'format': ['type'], 'title': None } },
object_fields – list containing fully qualified names of object fields. You may also use a spec similar to the one used for nested_fields. None will accept all non nested fields as object fields.
sub_fields – list containing fully qualified names of sub fields. None will accept all non nested fields or object fields as sub fields.
field_options (dict) – allows you to give default options for each field. They will be applied unless overwritten by generated parameters. For match queries, the match_type parameter modifies the type of match query.
match_word_as_phrase (bool) – if True, word expressions are matched using match_phrase instead of match. This option mainly keeps compatibility with version 0.6. It may be removed in the future.
Note
Some of the parameters above can be deduced from the Elasticsearch index configuration. See luqum.elasticsearch.schema.SchemaAnalyzer.query_builder_options()
- __call__(tree)¶
Calling the query builder returns the JSON-compatible structure corresponding to the request tree passed as parameter
- Parameters
tree (luqum.tree.Item) – a luqum parse tree
- Return dict
Naming and explaining matches¶
luqum.naming¶
Support for naming expressions
In order to use Elasticsearch named queries, we need to be able to assign names to expressions and retrieve their positions in the query text.
This module adds support for that.
- luqum.naming.NAME_ATTR = '_luqum_name'¶
Names are added to tree items via an attribute named _luqum_name
- class luqum.naming.TreeAutoNamer(track_parents=False)¶
Helper for auto_name()
- next_name(name)¶
Given name, return next name
- ::
>>> tan = TreeAutoNamer()
>>> tan.next_name(None)
'a'
>>> tan.next_name('aZ')
'aZa'
>>> tan.next_name('azb')
'azc'
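One way to obtain the three results shown above is to increment the last letter and append a new one when the alphabet is exhausted. The sketch below is a free function written for illustration; the alphabet (lowercase letters followed by uppercase letters) is an assumption inferred from the examples, not taken from the library's code.

```python
import string

# Assumption: names use "a".."z" followed by "A".."Z", so "Z" is the last letter.
LETTERS = string.ascii_letters


def next_name_sketch(name):
    """Return the name following `name` in the assumed ordering."""
    if not name:
        return LETTERS[0]
    position = LETTERS.index(name[-1])
    if position + 1 < len(LETTERS):
        # Increment the last letter.
        return name[:-1] + LETTERS[position + 1]
    # Last letter exhausted: keep the name and append the first letter.
    return name + LETTERS[0]
```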
- visit_base_operation(node, context)¶
name is to be set on children of operations
- visit(node)¶
visit the tree and add names to nodes while tracking their path
- luqum.naming.auto_name(tree, targets=None, all_names=False)¶
Automatically add names to nodes of a parse tree, in order to be able to track matching.
We add them to top nodes under operations as this is where it is useful for ES named queries
- Return dict
association of each name with the path (as a tuple) to the corresponding child
- luqum.naming.matching_from_names(names, name_to_path)¶
Utility to convert a list of names and the result of auto_name to the matching parameter for MatchingPropagator
- luqum.naming.element_from_path(tree, path)¶
Given a tree, retrieve element corresponding to path
- Parameters
tree (luqum.tree.Item) – luqum expression tree
path (tuple) – tuple representing top down access to a child
- Return luqum.tree.Item
target item
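A path, as used here, is a tuple of child indexes read from the top of the tree downwards. The sketch below shows that lookup on a hypothetical `Node` class exposing a `children` list; it is an illustration of the idea, not the library function itself.

```python
class Node:
    """Hypothetical tree node with a label and a list of children."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)


def element_from_path_sketch(tree, path):
    """Follow each index of path through successive children lists."""
    node = tree
    for index in path:
        node = node.children[index]
    return node


tree = Node("and", Node("a"), Node("or", Node("b"), Node("c")))
found = element_from_path_sketch(tree, (1, 0))  # second child, then its first child
```

An empty path designates the root itself.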
- class luqum.naming.MatchingPropagator(default_operation=<class 'luqum.tree.OrOperation'>)¶
Class propagating matching to upper elements based on known base element matching
- Parameters
default_operation (luqum.tree.Item) – tells how to treat UnknownOperation. Choose between luqum.tree.OrOperation and luqum.tree.AndOperation
- NEGATION_NODES = (<class 'luqum.tree.Not'>, <class 'luqum.tree.Prohibit'>)¶
A tuple of node types considered as NOT operations
- NO_CHILDREN_PROPAGATE = (<class 'luqum.tree.Range'>, <class 'luqum.tree.BaseApprox'>)¶
A tuple of nodes for which propagation is of no use
- OR_NODES = (<class 'luqum.tree.OrOperation'>,)¶
A tuple of node types considered as OR operations
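A simplified reading of these three tuples is that matching bubbles up with ordinary boolean semantics: OR nodes match when any child matches, negation nodes invert their child, and other operations need all children to match. The sketch below encodes only that reading on nested tuples; the representation and the `propagate` function are hypothetical and much simpler than the class documented here.

```python
def propagate(node, matched_leaves):
    """Return True if node matches, given the set of matching leaf names."""
    if isinstance(node, str):
        # Leaf: its matching status is known from the search results.
        return node in matched_leaves
    kind, *children = node
    results = [propagate(child, matched_leaves) for child in children]
    if kind == "not":
        return not results[0]
    if kind == "or":
        return any(results)
    return all(results)  # treated as AND


# a AND (b OR c) AND NOT d
expression = ("and", "a", ("or", "b", "c"), ("not", "d"))
```

With leaves `a` and `c` matching, the whole expression matches; adding `d` to the matching leaves makes it fail.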
- class luqum.naming.ExpressionMarker(track_new_parents=False, **kwargs)¶
A visitor to mark a tree based on elements belonging to a path or not
One intended usage is to add markers around nodes matching a request, by altering the tail and head of elements
- mark_node(node, path, *info)¶
implement this in your own code, maybe altering the head / tail arguments
- generic_visit(node, context)¶
Default visitor function, called if nothing matches the current node.
It simply clones the node and its children.
Utilities¶
luqum.visitor: Manipulating trees¶
Base classes to implement a visitor pattern.
- class luqum.visitor.TreeVisitor(track_parents=False)¶
Tree Visitor base class.
This class is meant to be subclassed, with the subclass implementing visitor methods for each Node type it is interested in.
By default, those visitor methods should be named 'visit_' + the class name of the node, converted to lower_case (i.e. visit_search_node for a SearchNode class).
It's up to the visit method of each node to recursively call children (or not). It may be done simply by calling the generic_visit method.
By default, generic_visit simply triggers the visit of subnodes, yielding no information.
If the goal is to modify the initial tree, to get a new modified copy, use TreeTransformer instead.
- Parameters
track_parents (bool) – if True the context will contain parents of current node as a list. It’s up to you to maintain this list in your own methods.
- visit(tree, context=None)¶
Traversal of tree
- Parameters
tree (luqum.tree.Item) – a tree representing a lucene expression
context (dict) – a dict with initial values for context
Note
The values in context are not guaranteed to move up the hierarchy, because we copy the context for children so that they have specific values.
A trick you can use if you need values to move up the hierarchy is to set a “global” key containing a dict, where you can store values.
- visit_iter(node, context)¶
Basic, recursive traversal of the tree.
- child_context(node, child, context, **kwargs)¶
Generate a context for children.
The context of children is distinct from the parent context, so that a visit in one branch does not affect others.
Note
If you need global parameters, a trick is to put them in a dict in a “global” entry, as we do a shallow copy of the context, and not a deep one.
- Parameters
node (luqum.tree.Item) – parent node
child (luqum.tree.Item) – child node
context (dict) – parent context
- Return dict
child context
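The shallow-copy trick mentioned in the note can be seen with plain dicts. `child_context_sketch` is a hypothetical helper written for this illustration: keys set on the child stay local, while the dict stored under "global" is shared because only the outer dict is copied.

```python
def child_context_sketch(context, **kwargs):
    """Return a shallow copy of context, updated with kwargs."""
    child = dict(context)  # shallow copy: nested objects are shared
    child.update(kwargs)
    return child


parent = {"depth": 0, "global": {"seen": []}}
child = child_context_sketch(parent, depth=1)

child["note"] = "only visible in this branch"  # does not reach the parent
child["global"]["seen"].append("foo")          # visible everywhere
```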
- generic_visit(node, context)¶
Default visitor function, called if nothing matches the current node.
It simply visits children.
- Parameters
node (luqum.tree.Item) – current node
context (dict) – context (aka local parameters received from parents)
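The naming convention for visitor methods can be sketched in a few lines. Everything below is hypothetical and self-contained: `SketchVisitor` is not luqum's TreeVisitor, and `Group`, `SearchField` and `Word` are local stand-ins that merely share names with tree classes. The sketch dispatches on the exact class name only and returns values directly rather than yielding them.

```python
import re


def camel_to_snake(name):
    """Convert a CamelCase class name to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class SketchVisitor:
    def visit(self, node, context=None):
        context = {} if context is None else context
        method_name = "visit_" + camel_to_snake(type(node).__name__)
        # Fall back to generic_visit when no specific method exists.
        method = getattr(self, method_name, self.generic_visit)
        return method(node, context)

    def generic_visit(self, node, context):
        for child in node.children:
            self.visit(child, context)


class Group:
    def __init__(self, *children):
        self.children = list(children)


class SearchField(Group):
    def __init__(self, name, *children):
        super().__init__(*children)
        self.name = name


class Word(Group):
    pass


class FieldCollector(SketchVisitor):
    """Collect the names of all search fields found in a tree."""

    def __init__(self):
        self.fields = []

    def visit_search_field(self, node, context):
        self.fields.append(node.name)
        self.generic_visit(node, context)  # keep descending


collector = FieldCollector()
collector.visit(Group(SearchField("title", Word()), SearchField("body", Word())))
```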
- class luqum.visitor.TreeTransformer(track_new_parents=False, **kwargs)¶
A version of TreeVisitor that is aimed at obtaining a transformed copy of tree.
Note
It is far better to build a transformed copy, than to modify in place the original tree, as it is less error prone.
- Parameters
track_new_parents (bool) – whether to track new parents in the context
- visit(tree, context=None)¶
Visit the tree, by default building a copy and returning it.
- Parameters
tree (luqum.tree.Item) – luqum expression tree
context – optional initial context
- child_context(node, child, context, **kwargs)¶
Generate a context for children.
The context of children is distinct from the parent context, so that a visit in one branch does not affect others.
Note
If you need global parameters, a trick is to put them in a dict in a “global” entry, as we do a shallow copy of the context, and not a deep one.
- Parameters
node (luqum.tree.Item) – parent node
child (luqum.tree.Item) – child node
context (dict) – parent context
- Return dict
child context
- generic_visit(node, context)¶
Default visitor function, called if nothing matches the current node.
It simply clones the node and its children.
- clone_children(node, new_node, context)¶
Helper to clone children.
Note
A child may generate more than one child, or none, for flexibility, but it's up to the transformer to ensure everything is ok
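The idea that visiting a child may produce zero, one or several new children can be sketched with a generator. `Node` and `transform` are hypothetical names for this illustration; the sketch builds a transformed copy and leaves the original tree untouched, in the spirit of the transformer described above.

```python
class Node:
    """Hypothetical tree node with a label and a list of children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def transform(node, drop_label):
    """Yield a copy of node without subtrees labelled drop_label."""
    if node.label == drop_label:
        return  # this node generates no replacement at all
    new_node = Node(node.label)
    for child in node.children:
        # Each child contributes zero or more nodes to the copy.
        new_node.children.extend(transform(child, drop_label))
    yield new_node


original = Node("and", [Node("a"), Node("b"), Node("c")])
(copy_without_b,) = transform(original, "b")
```

After the call, `copy_without_b` has two children while `original` still has three.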
- class luqum.visitor.PathTrackingMixin¶
It can be useful to compute the path of an element (as a tuple of indexes in the parent's children)
This mixin provides base components
- child_context(node, child, context, **kwargs)¶
Thanks to “path” and “position” in kwargs, we add the path of children
- visit(node, context=None)¶
visit the tree while tracking the path of each node
- class luqum.visitor.PathTrackingVisitor(track_parents=False)¶
Path tracking version of TreeVisitor
- generic_visit(node, context)¶
Default visitor function, called if nothing matches the current node.
It simply visits children.
- Parameters
node (luqum.tree.Item) – current node
context (dict) – context (aka local parameters received from parents)
- class luqum.visitor.PathTrackingTransformer(track_new_parents=False, **kwargs)¶
Path tracking version of TreeTransformer
- clone_children(node, new_node, context)¶
Helper to clone children.
Note
A child may generate more than one child, or none, for flexibility, but it's up to the transformer to ensure everything is ok
luqum.auto_head_tail: Automatic addition of spaces¶
It can be tedious to add spaces in a tree you generate programmatically.
This module provides a utility to transform a tree so that it contains the necessary head/tail for the expression to be printable.
- class luqum.auto_head_tail.AutoHeadTail(track_new_parents=False, **kwargs)¶
This class implements a transformer so that a hand-built tree can have reasonable values for head and tail on its items, in order for the expression to be printable.
- luqum.auto_head_tail.auto_head_tail = <luqum.auto_head_tail.AutoHeadTail object>¶
method to auto add head and tail to items of a lucene tree so that it is printable
luqum.pretty: Pretty printing¶
This module provides a pretty printer for Lucene query trees.
- class luqum.pretty.Prettifier(indent=4, max_len=80, inline_ops=False)¶
Class to generate a pretty printer.
- luqum.pretty.prettify = <luqum.pretty.Prettifier object>¶
prettify function with default parameters
luqum.check: Checking for validity¶
- class luqum.check.CheckNestedFields(nested_fields, object_fields=None, sub_fields=None)¶
Visit the lucene tree to make some checks
In particular to check nested fields.
- Parameters
nested_fields – a dict where keys are names of nested fields, and values are dicts of sub-nested fields or an empty dict for a leaf
object_fields – this is either None, in which case unknown object fields will be accepted, or a dict of sub-nested fields (like nested_fields)
- visit_phrase(node, context)¶
On phrase field, verify term is in a final search field
- visit_search_field(node, context)¶
On search field node, check nested fields logic
- visit_term(node, context)¶
On term field, verify term is in a final search field
- class luqum.check.LuceneCheck(zeal=0)¶
Check if a query is consistent
This is intended for use with queries constructed as trees, as well as those parsed by the parser, which is more tolerant.
- Parameters
zeal (int) – if zeal > 0 do extra check of some pitfalls, depending on zeal level
- errors(tree)¶
List all errors
luqum.utils: Misc¶
Various utilities for dealing with syntax trees.
Includes base classes to implement a visitor pattern.
- class luqum.utils.UnknownOperationResolver(resolve_to=None, add_head=' ')¶
Transform the UnknownOperation to OR or AND
- DEFAULT_OPERATION¶
alias of AndOperation
- class luqum.utils.OpenRangeTransformer(merge_ranges=False, add_head=' ')¶
Transforms open ranges to normal Range objects, i.e.
>=foo -> [foo TO *]
<bar -> [* TO bar}
When merge_ranges is set, this also merges open ranges in AND clauses:
>foo AND <=bar -> {foo TO bar]
[foo TO *] AND [* TO bar] -> [foo TO bar]
The merging of open ranges is performed by collecting all open ranges, and merging any open range into any previously collected open range with an open bound. In other words, any open bounds are always merged into the first suitable range. Additionally, matching is always done from left to right, so that this holds:
[a TO *] AND [b TO *] AND [* TO y] AND [* TO z] -> [a TO y] AND [b TO z]
Open ranges in OR and unknown clauses are not adjusted. Use UnknownOperationResolver to make sure that unknown operations are resolved first.
Ranges with none of the bounds set are left unadjusted. Additionally, the ranges must be direct siblings of the same parent. Ranges such as [foo TO *]^2 AND [* TO bar]^2 are therefore not merged (though ([foo TO *] AND [* TO bar])^2 would).
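The left-to-right merging rule can be sketched on plain `(low, high)` pairs, where `"*"` stands for an unset bound. This is an illustration of the rule as described, not the transformer's code: `merge_open_ranges` is a hypothetical function, and bound inclusivity is deliberately not modelled.

```python
OPEN = "*"


def is_half_open(low, high):
    """True if exactly one of the two bounds is set."""
    return (low == OPEN) != (high == OPEN)


def merge_open_ranges(ranges):
    """Merge half-open ranges into the first earlier range lacking that bound."""
    merged = []
    for low, high in ranges:
        target = None
        if is_half_open(low, high):
            for candidate in merged:
                if not is_half_open(*candidate):
                    continue  # complete or fully open ranges are left alone
                if low == OPEN and candidate[1] == OPEN:
                    candidate[1] = high  # supply the missing upper bound
                    target = candidate
                    break
                if high == OPEN and candidate[0] == OPEN:
                    candidate[0] = low   # supply the missing lower bound
                    target = candidate
                    break
        if target is None:
            merged.append([low, high])
    return [tuple(bounds) for bounds in merged]
```

Applied to the four ranges of the example above, the sketch pairs `a` with `y` and `b` with `z`, because each upper bound goes to the first range still missing one.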
- luqum.utils.normalize_nested_fields_specs(nested_fields)¶
normalize nested_fields specification to only have nested dicts
- Parameters
nested_fields (dict) – a dict containing fields that are nested in ES. Each nested field contains either a dict of nested fields (if some of them are also nested) or a list of nested fields (this is for convenience)
- ::
>>> from unittest import TestCase
>>> TestCase().assertDictEqual(
...     normalize_nested_fields_specs(
...         {"author" : {"books": ["name", "ref"], "firstname" : None }}),
...     {"author" : {"books": {"name": {}, "ref": {}}, "firstname" : {} }})
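The normalization shown in the example can be reproduced by a short recursive function. `normalize_nested_sketch` is a hypothetical re-implementation written to match the documented example; it is not the library's code, and its handling of inputs beyond dicts, lists and None is an assumption.

```python
def normalize_nested_sketch(spec):
    """Turn lists and None values into nested dicts."""
    if spec is None:
        return {}  # leaf field
    if isinstance(spec, dict):
        return {key: normalize_nested_sketch(value) for key, value in spec.items()}
    # Assumption: anything else is an iterable of leaf field names.
    return {name: {} for name in spec}


normalized = normalize_nested_sketch(
    {"author": {"books": ["name", "ref"], "firstname": None}}
)
```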
- luqum.utils.flatten_nested_fields_specs(nested_fields)¶
normalize object_fields specification to only have a simple set
- Parameters
nested_fields (dict) – contains fields that are objects in ES, as a series of nested dicts. Lists are accepted as well for conciseness.
- ::
>>> from unittest import TestCase
>>> flatten_nested_fields_specs(None)
set()
>>> TestCase().assertEqual(
...     flatten_nested_fields_specs(["author.name", "book.title"]),
...     set(["author.name", "book.title"]))
>>> TestCase().assertEqual(
...     flatten_nested_fields_specs(
...         {"book" : { "author": ["firstname", "lastname"], "title" : None }}),
...     set(["book.author.firstname", "book.author.lastname", "book.title"]))
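Flattening a spec into dotted names amounts to a depth-first walk that joins keys with dots. The function below is a hypothetical sketch consistent with the three documented examples, not the library implementation; treating an empty dict or list as a leaf is a choice of this sketch.

```python
def flatten_sketch(spec, prefix=""):
    """Return the set of dotted leaf names described by spec."""
    if spec is None:
        return set()
    if isinstance(spec, dict):
        names = set()
        for key, value in spec.items():
            path = prefix + key
            if value:
                # Non-empty sub-spec: descend with an extended prefix.
                names |= flatten_sketch(value, path + ".")
            else:
                names.add(path)  # None or empty: this key is a leaf
        return names
    # Assumption: anything else is an iterable of leaf names.
    return {prefix + name for name in spec}


flattened = flatten_sketch(
    {"book": {"author": ["firstname", "lastname"], "title": None}}
)
```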
- luqum.utils.normalize_object_fields_specs(object_fields)¶
normalize object_fields specification to only have a simple set
- Parameters
object_fields (dict) – contains fields that are objects in ES, as a series of nested dicts. Lists are accepted as well for conciseness. None, which means no spec, is returned as is.
- ::
>>> from unittest import TestCase
>>> normalize_object_fields_specs(None) is None
True
>>> TestCase().assertEqual(
...     normalize_object_fields_specs(["author.name", "book.title"]),
...     set(["author.name", "book.title"]))
>>> TestCase().assertEqual(
...     normalize_object_fields_specs(
...         {"book" : { "author": ["firstname", "lastname"], "title" : None }}),
...     set(["book.author.firstname", "book.author.lastname", "book.title"]))