Configuration of indices.
A Search-a-licious instance has only one configuration file,
but it is capable of serving multiple datasets.
It provides a section for each index you want to create (corresponding to a dataset).
The key is the ID of the index, which can be referenced at query time.
One index corresponds to a specific set of documents and can be queried independently.
If you have multiple indices, one of them must be designated as the default one
(see default_index).
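For illustration, a minimal sketch of such a configuration file serving two datasets (the index IDs and the exact key names, such as indices, are assumptions here, shown only to illustrate the layout):

```yaml
# Hypothetical sketch: one configuration file, two datasets.
# "products" and "articles" are arbitrary index IDs, referenced at query time.
indices:
  products:
    # ... configuration of the "products" index (see below) ...
  articles:
    # ... configuration of the "articles" index (see below) ...
# With several indices, one of them must be designated as the default one.
default_index: products
```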
Each additional property must conform to the following schema
Type: object
This object gives configuration for one index.
One index usually corresponds to one dataset.
This is the configuration for the main index containing the data.
It's used to create the index in Elasticsearch and to configure its mappings
(along with the *fields* config).
Name of the index alias to use.
Search-a-licious will create an index using this name and an import date,
but alias will always point to the latest index.
The alias must not already exist in your Elasticsearch instance.
Number of shards to use for the index.
Shards are useful to distribute the load on your cluster.
(see index settings)
Number of replicas to use for the index.
More replicas mean more resiliency, but also more disk space and memory.
(see index settings)
Name of the field to use for _id.
It is mandatory to provide one.
If your dataset does not have an identifier field,
you should use a document preprocessor to compute one (see preprocessor).
Name of the field containing the date of last modification,
in your indexed objects.
This is used for incremental updates using Redis queues.
The field value must be an int/float representing the timestamp.
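A minimal, hedged sketch of this sub-section, assuming it is named index and that the key names mirror the descriptions above (check the schema for exact spellings):

```yaml
index:
  # Alias name; Search-a-licious creates a dated index behind it
  # and keeps the alias pointing at the latest import.
  name: products
  number_of_shards: 4          # distribute the load on your cluster
  number_of_replicas: 1        # more resiliency, more disk space and memory
  # Field used as the Elasticsearch _id (mandatory).
  id_field_name: code
  # Field holding the last-modification timestamp (int/float),
  # used for incremental updates through Redis queues.
  last_modified_field_name: last_modified_t
```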
Configuration of all fields we need to store in the index.
Keys are field names,
values contain the field configuration.
This is a very important part of the configuration.
Most of the Elasticsearch mapping will depend on it.
Elasticsearch will also use this configuration
to provide the intended behaviour.
(see also Explain Configuration)
If you change those settings you will have to re-index all the data.
(But you can do so in the background).
Each additional property must conform to the following schema
Type: object
Name of the field (must be unique)
Type of the field
Supported field types in Search-a-Licious are:
* keyword: string values that won't be interpreted (tokenized).
Good for things like tags, serial numbers, property values, etc.
* date: Date fields
* double, float, half_float, scaled_float:
different ways of storing floats with different capacity
* short, integer, long, unsigned_long:
integers (with different capacities: 16 / 32 / 64 bits)
* bool: boolean (true / false) values
* text: a text which is tokenized to enable full text search
* text_lang: like text, but with different values in different languages.
Tokenization will use analyzers specific to each language.
* taxonomy: a field akin to keyword but
with support for matching using taxonomy synonyms and translations
(and in fact also a text mapping possibility)
* disabled: a field that is not stored nor searchable
(see [Elasticsearch help])
* object: this field contains a dict with sub-fields.
If required=True, the field is required in the input data.
An entry that does not contain a value for this field will be rejected.
Name of the input field to use when importing data.
By default, Search-a-licious uses the same name as the field name.
This is useful to index the same field using different types or configurations.
Whether to split the input field using split_separator.
This is useful if you have some text fields that contain lists of values
(for example, a comma-separated list of values, like apple,banana,carrot).
You must set split_separator to the character that separates the values in the dataset.
Whether this field is included in the default full-text search.
If false, the field is only used during search
when filters involving this field are provided
(as opposed to full text search expressions without any explicit field).
Whether to add a bucket aggregation to the Elasticsearch query for this field.
It is used to return a 'faceted-view' with the number of results for each facet value,
or to generate bar charts.
Only valid for keyword, taxonomy or numeric field types.
The name of the taxonomy associated with this field.
It must only be provided for fields of type taxonomy.
Separator to use when splitting values, for fields that have split=True.
For text_lang FieldType, the separator between the name of the field
and the language code, ex: product_name_it if lang_separator="_".
Used for Vega charts. Use a CSS color code.
Used for Vega charts. Should be a CSS color code.
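A hedged sketch of a fields section combining the options above; the field names and option spellings (full_text_search, bucket_agg, input_field, ...) are assumptions based on the descriptions and should be checked against the schema:

```yaml
fields:
  code:
    type: keyword
    required: true             # entries without this field are rejected
  product_name:
    type: text_lang            # one sub-field per language, e.g. product_name_it
    full_text_search: true     # included in the default full-text search
  categories:
    type: taxonomy
    taxonomy_name: category    # a taxonomy declared in the taxonomy section
    bucket_agg: true           # facet counts / bar charts for this field
  brands:
    type: keyword
    input_field: brands_raw    # import from a differently named source field
    split: true                # split values like apple,banana,carrot
# siblings of fields in this sketch:
split_separator: ","           # used by fields with split=True
lang_separator: "_"            # text_lang fields are stored as product_name_it, etc.
```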
Configuration of taxonomies,
that is, collections of entries with synonyms in multiple languages.
See [Explain taxonomies](../explain-taxonomies)
Fields may be linked to taxonomies.
It enables enriching search with synonyms,
as well as providing suggestions,
or informative facets.
Note: if you define taxonomies, you must import them using the
[import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies).
Configurations of taxonomies that this project will use.
Configuration on how to fetch a particular taxonomy.
Name of the taxonomy
This is the name you will use in the configuration (and the API)
to reference this taxonomy
URL of the taxonomy.
The target file must be in JSON format
and follow the Open Food Facts JSON taxonomy format.
This is a dict where each key corresponds to a taxonomy entry id
and values are dicts describing the entry.
Other keys correspond to properties associated with this entry (e.g. wikidata id).
The taxonomy name must be at least 1 character long;
the URL must be between 1 and 2083 characters long.
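A hedged sketch of taxonomy source declarations (key names follow the descriptions above; the URLs are only illustrative):

```yaml
taxonomy:
  sources:
    # Each source has a name (referenced by fields of type taxonomy)
    # and a URL to a JSON file in the Open Food Facts taxonomy format.
    - name: category
      url: https://static.openfoodfacts.org/data/taxonomies/categories.full.json
    - name: label
      url: https://static.openfoodfacts.org/data/taxonomies/labels.full.json
```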
This is the configuration of
the ElasticSearch index storing the taxonomies.
All taxonomies are stored within the same index.
It enables functions like auto-completion or field suggestions,
as well as enrichment of requests with synonyms.
Name of the index alias to use.
Search-a-licious will create an index using this name and an import date,
but alias will always point to the latest index.
The alias must not already exist in your Elasticsearch instance.
Number of shards to use for the index.
Shards are useful to distribute the load on your cluster.
(see index settings)
Number of replicas to use for the index.
More replicas mean more resiliency, but also more disk space and memory.
(see index settings)
The fully qualified reference to the preprocessor
to use before taxonomy entry import.
This class must inherit app.indexing.BaseTaxonomyPreprocessor
and specialize the preprocess
method.
This is used to adapt the taxonomy schema
or to add specific fields for example.
Example: app.openfoodfacts.TaxonomyPreprocessor
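A hedged sketch of the taxonomy index and preprocessor settings, assuming key names that mirror the descriptions above:

```yaml
taxonomy:
  # ... sources as above ...
  index:
    name: products_taxonomy   # alias; all taxonomies share this single index
    number_of_shards: 4
    number_of_replicas: 1
  # Optional preprocessor applied to each taxonomy entry before import.
  preprocessor: app.openfoodfacts.TaxonomyPreprocessor
```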
A list of all supported languages; it is used to build the index mapping.
Example: ['en', 'fr', 'it']
The fully qualified reference to the document fetcher,
i.e. the class responsible for fetching the document
using the document ID present in the Redis stream.
It should inherit app._import.BaseDocumentFetcher
and specialize the fetch_document
method.
To keep things light, you generally include only a few fields
in the event stream payload.
This class will fetch the full document using your application API.
Example: app.openfoodfacts.DocumentFetcher
The fully qualified reference to the preprocessor
to use before data import.
This class must inherit app.indexing.BaseDocumentPreprocessor
and specialize the preprocess
method.
This is used to adapt the data schema
or to add search-a-licious specific fields
for example.
Example: app.openfoodfacts.DocumentPreprocessor
The fully qualified reference to the Elasticsearch result processor
to use after the search query to Elasticsearch.
This class must inherit app.postprocessing.BaseResultProcessor
and specialize the process_after method.
This can be used to add custom fields computed from index content.
Example: app.openfoodfacts.ResultProcessor
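Putting these pipeline settings together, a hedged sketch (key names assumed from the descriptions above):

```yaml
supported_langs: ["en", "fr", "it"]
# Fetches the full document from your API, using the ID read from the Redis stream.
document_fetcher: app.openfoodfacts.DocumentFetcher
# Adapts each document before indexing (e.g. computes an identifier or extra fields).
preprocessor: app.openfoodfacts.DocumentPreprocessor
# Post-processes Elasticsearch results (e.g. adds fields computed from index content).
result_processor: app.openfoodfacts.ResultProcessor
```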
You can add scripts that can be used for sorting results.
Each key is a script name, with its configuration.
Each additional property must conform to the following schema
Type: object
Scripts can be used to sort results of a search.
This uses Elasticsearch's internal script capabilities.
The script language, as supported by Elasticsearch
The source of the script
Params for the scripts. We need this to retrieve and validate parameters
Additional params for the scripts that can't be supplied by the API (constants)
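A hedged sketch of a scripts entry; the key names (lang, source, params, static_params) follow the descriptions above, and the script name, field and Painless source are purely illustrative:

```yaml
scripts:
  # "popularity" is an arbitrary script name that can then be used to sort results.
  popularity:
    lang: painless                # script language, as supported by Elasticsearch
    source: |
      doc['scans_n'].value * params.boost
    params:
      boost: 1.0                  # parameters supplied (and validated) through the API
    static_params:
      some_constant: 42           # constants that cannot be supplied by the API
```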
How much we boost exact matches on consecutive words.
That is, if you search "Dark Chocolate",
it will boost entries that have the "Dark Chocolate" phrase (in the same field).
It only applies to free text search.
This only makes sense when using
"boost_phrase" request parameters and "best match" order.
Note: this field accepts a float or a string,
because using a float might generate rounding problems.
The string must represent a float.
How much proximity we allow for match_phrase_boost.
If unspecified, we will just match word to word.
Otherwise it will allow some gap between matching words.
This only makes sense when using
"boost_phrase" request parameters and "best match" order.
List of document IDs to ignore.
Use this to skip some documents at indexing time.
All items must be unique.
Name of the Redis stream to read from when listening to document updates.
If not provided, document updates won't be listened to for this index.
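A hedged sketch of these last properties (key names such as document_denylist and redis_stream_name are assumed from the descriptions above):

```yaml
# Documents to skip at indexing time.
document_denylist:
  - "3000000000000"
# Redis stream to read from when listening to document updates
# (omit it to disable update listening for this index).
redis_stream_name: product_updates
```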
The default index to use when no index is specified in the query.