Search Modules¶

Indexing¶

class app.indexing.BaseDocumentPreprocessor(config: IndexConfig)[source]¶

Base class for document preprocessors.

Classes referenced in index configuration preprocess field, has to be derived from it.

abstract preprocess(document: dict[str, Any]) → FetcherResult[source]¶

Preprocess the document before data ingestion in Elasticsearch.

This can be used to make document schema compatible with the project schema or to add custom fields.

Returns:: a FetcherResult object:

the status can be used to pilot wether to index or not the document (even delete it)
the document is the transformed document

class app.indexing.BaseTaxonomyPreprocessor(config: IndexConfig)[source]¶

Base class for taxonomy entries preprocessors.

Classes referenced in index configuration preprocess field, has to be derived from it.

abstract preprocess(taxonomy: Taxonomy, node: TaxonomyNode) → TaxonomyNodeResult[source]¶

Preprocess the taxonomy entry before ingestion in Elasticsearch, and before synonyms generation

This can be used to make document schema compatible with the project schema or to add custom fields.

Returns:: a TaxonomyNodeResult object:

the status can be used to pilot wether to index or not the entry (even delete it)
the entry is the transformed entry

class app.indexing.DocumentProcessor(config: IndexConfig)[source]¶

DocumentProcessor is responsible of converting an item to index into a dict that is ready to be indexed by Elasticsearch.

from_result(result: FetcherResult) → FetcherResult[source]¶

Generate an item ready to be indexed by elasticsearch-dsl from a fetcher result.

Parameters:

result – the input data

Returns:

a new result with transformed data, ready to be indexed or removed or skipped.

In case of indexing or removal, the document always contains an id_ item

inputs_from_data(id_, processed_data: dict[str, Any]) → dict[str, Any][source]¶: Generate a dict with the data to be indexed in ES

app.indexing.generate_dsl_field(field: FieldConfig, supported_langs: Iterable[str]) → Field[source]¶

Generate Elasticsearch DSL field from a FieldConfig.

This will be used to generate the Elasticsearch mapping.

This is an important part, because it will define the behavior of each field.

Parameters:

field – the field to use as input
supported_langs – an iterable of languages (2-letter codes), used to know which sub-fields to create for text_lang and taxonomy field types

Returns:

the elasticsearch_dsl field

app.indexing.generate_index_object(index_name: str, config: IndexConfig) → Index[source]¶: Index configuration for project index, that will contain the data

app.indexing.generate_mapping_object(config: IndexConfig) → Mapping[source]¶: ES Mapping for project index, that will contain the data

app.indexing.generate_taxonomy_index_object(index_name: str, config: IndexConfig) → Index[source]¶: Index configuration for indexes containing taxonomies entries

app.indexing.generate_taxonomy_mapping_object(config: IndexConfig) → Mapping[source]¶: ES Mapping for indexes containing taxonomies entries

app.indexing.process_taxonomy_field(data: dict[str, Any], field: FieldConfig, taxonomy_config: TaxonomyConfig, split_separator: str) → dict[str, Any] | None[source]¶

Process data for a taxonomy field type.

There is not much to be done here, as the magic of synonyms etc. happens by ES itself, thanks to our mapping definition, and a bit at query time.

Parameters:

data – input data, as a dict
field – the field config
split_separator – the separator used to split the input field value, in case of multi-valued input (if field.split is True)

Returns:

the processed value

app.indexing.process_text_lang_field(data: dict[str, Any], input_field: str, split: bool, lang_separator: str, split_separator: str, supported_langs: set[str]) → dict[str, Any] | None[source]¶

Process data for a text_lang field type.

Generates a dict ready to be indexed by Elasticsearch, with a subfield for each language.

Parameters:

data – input data, as a dict
input_field – the name of the field to use as input
split – whether to split the input field value, using split_separator as separator
lang_separator – the separator used to separate the language code from the field name
split_separator – the separator used to split the input field value, in case of multi-valued input (if split is True)
supported_langs – a set of supported languages (2-letter codes), used to know which sub-fields to create

Returns:

the processed data, as a dict

Query¶

app.query.add_languages_suffix(analysis: QueryAnalysis, langs: list[str], config: IndexConfig) → QueryAnalysis[source]¶

Add correct languages suffixes to fields of type text_lang or taxonomy

This match in a langage OR another

app.query.boost_phrases(analysis: QueryAnalysis, boost: float, proximity: int | None) → QueryAnalysis[source]¶: Boost all phrases in the query

app.query.build_completion_query(q: str, taxonomy_names: list[str], langs: list[str], size: int, config: IndexConfig, fuzziness: int | None = 2)[source]¶

Build an elasticsearch_dsl completion Query.

Parameters:

q – the user autocomplete query
taxonomy_names – a list of taxonomies we want to search in
langs – the languages we want search in
size – number of results to return
config – the index configuration to use
fuzziness – fuzziness parameter for completion query

Returns:

the built Query

app.query.build_elasticsearch_query_builder(config: IndexConfig) → ElasticsearchQueryBuilder[source]¶: Create the ElasticsearchQueryBuilder object according to our configuration

app.query.build_search_query(params: SearchParameters, es_query_builder: ElasticsearchQueryBuilder) → QueryAnalysis[source]¶

Build an elasticsearch_dsl Query.

Parameters:

params – SearchParameters containing all search parameters
es_query_builder – the builder to transform the luqum tree to an elasticsearch query

Returns:

the built Search query

app.query.check_query(params: SearchParameters, analysis: QueryAnalysis) → None[source]¶: Run some sanity checks on the luqum query

app.query.compute_facets_filters(q: QueryAnalysis) → QueryAnalysis[source]¶

Extract facets filters from the query

For now it only handles SearchField under a top AND operation, which expression is a bare term or a OR operation of bare terms.

We do not verify if the field is an aggregation field or not, that can be done at a later stage

Returns:: a new QueryAnalysis with facets_filters attribute as a dictionary of field names and list of values to filter on

app.query.create_aggregation_clauses(config: IndexConfig, fields: set[str] | list[str] | None) → dict[str, Agg][source]¶: Create term bucket aggregation clauses for all fields corresponding to facets, as defined in the config

app.query.parse_query(q: str | None) → QueryAnalysis[source]¶: Begin query analysis by parsing the query.

app.query.parse_sort_by_field(sort_by: str | None, config: IndexConfig) → str | None[source]¶

Parse sort_by parameter, special handling is performed for text_lang subfield.

Parameters:

sort_by – the raw sort_by value
config – the index configuration to use

Returns:

None if sort_by is not provided or the final value otherwise

app.query.parse_sort_by_script(sort_by: str, params: dict[str, Any] | None, config: IndexConfig, index_id: str) → dict[str, Any][source]¶: Create the ES sort expression to sort by a script

app.query.resolve_open_ranges(analysis: QueryAnalysis) → QueryAnalysis[source]¶: We need to resolve open ranges to closed ranges before using elasticsearch query builder

app.query.resolve_unknown_operation(analysis: QueryAnalysis) → QueryAnalysis[source]¶: Resolve unknown operations in the query to a AND

Search¶

app.search.search(params: SearchParameters) → ErrorSearchResponse | SuccessSearchResponse[source]¶: Run a search

Charts¶

app.charts.build_charts(search_result: SuccessSearchResponse, index_config: IndexConfig, requested_charts: list[DistributionChart | ScatterChart] | None) → dict[str, dict[str, Any]][source]¶: Build and return vega charts representations for the given requested charts

app.charts.build_distribution_chart(chart: DistributionChart, values, index_config: IndexConfig)[source]¶: Return the vega structure for a Bar Chart Inspiration: https://vega.github.io/vega/examples/bar-chart/

app.charts.build_scatter_chart(chart_option: ScatterChart, search_result, index_config: IndexConfig)[source]¶: Build a scatter plot only for values from search_results (only values in the current page) TODO: use values from the whole search? Inspiration: https://vega.github.io/vega/examples/scatter-plot/

app.charts.empty_chart(chart_name)[source]¶

Return a responsive vega chart using signals and auto-size https://gist.github.com/donghaoren/023b2246569e8f0615017507b473e55e

Vega is used as a JSON visualization grammar Doc: https://vega.github.io/vega/docs/ It would have been possible to use higher lever vega-lite API, which is able to write vega specifications but it’s probably too much for our usage Inspired by: https://vega.github.io/vega/examples/bar-chart/

ES Scripts¶

Module to manage ES scripts that can be used for personalized sorting

app.es_scripts.get_script_id(index_id: str, script_id: str)[source]¶: We prefix scripts specific to an index with the index_id.

app.es_scripts.get_script_prefix(index_id: str)[source]¶: We prefix scripts specific to an index with the index_id.

app.es_scripts.sync_scripts(index_id: str, index_config: IndexConfig) → dict[str, int][source]¶: Resync the scripts between configuration and elasticsearch.

Taxonomy ES¶

Operations on taxonomies in ElasticSearch

Analyzers (Utils)¶

Defines some analyzers for the elesaticsearch fields.

app.utils.analyzers.get_autocomplete_analyzer(lang: str) → CustomAnalysis[source]¶: Return the search analyzer to use for the autocomplete field

app.utils.analyzers.get_taxonomy_indexing_analyzer(taxonomy: str, lang: str) → CustomAnalysis[source]¶: We want to index taxonomies terms as keywords (as we only store the id), but with a specific tweak: transform hyphens into underscores,

app.utils.analyzers.get_taxonomy_search_analyzer(taxonomy: str, lang: str, with_synonyms: bool) → CustomAnalysis[source]¶

Return the search analyzer to use for the taxonomized field

Parameters:

taxonomy – the taxonomy name
lang – the language code
with_synonyms – whether to add the synonym filter

app.utils.analyzers.get_taxonomy_stop_words_filter(taxonomy: str, lang: str) → TokenFilter | None[source]¶

Return the stop words filter to use for the taxonomized field analyzer

IMPORTANT: de-activated for now ! If we want to handle them, we have to remove them in synonyms, so we need the list.

app.utils.analyzers.get_taxonomy_synonym_filter(taxonomy: str, lang: str) → TokenFilter[source]¶: Return the synonym filter to use for the taxonomized field analyzer

app.utils.analyzers.number_of_fields(mapping: Mapping | dict[str, dict[str, Any]]) → int[source]¶: Return the number of fields in the mapping

Connection (Utils)¶

app.utils.connection.current_es_client()[source]¶: Return ElasticSearch default connection

Search Modules¶

Indexing¶

Query¶

Search¶

Facets¶

Charts¶

ES Scripts¶

Taxonomy ES¶

Analyzers (Utils)¶

Connection (Utils)¶

Search-a-licious

Navigation

Related Topics