Configuration¶
Config Module¶
- pydantic model app.config.BaseESIndexConfig[source]¶
Base class for configuring ElasticSearch indexes
Show JSON schema
{ "title": "BaseESIndexConfig", "description": "Base class for configuring ElasticSearch indexes", "type": "object", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" } }, "required": [ "name" ] }
- field name: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.')] [Required][source]¶
Name of the index alias to use.
Search-a-licious will create an index using this name and an import date, but alias will always point to the latest index.
The alias must not already exist in your ElasticSearch instance.
- field number_of_replicas: Annotated[int, FieldInfo(annotation=NoneType, required=True, description='Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))')] = 1[source]¶
Number of replicas to use for the index.
More replicas mean more resiliency, but also more disk space and memory.
(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))
- field number_of_shards: Annotated[int, FieldInfo(annotation=NoneType, required=True, description='Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))')] = 4[source]¶
Number of shards to use for the index.
Shards are useful to distribute the load on your cluster. (see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))
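As an illustration of the alias scheme described for the name field, here is a minimal sketch of how a dated index name could be derived from the alias, and how the latest one could be picked. The name format and both helper functions are assumptions made for this example; this page does not state the format Search-a-licious actually uses.

```python
from datetime import datetime, timezone


def versioned_index_name(alias: str, import_date: datetime) -> str:
    """Build a dated index name from the alias (hypothetical format)."""
    return f"{alias}-{import_date.strftime('%Y-%m-%d-%H-%M-%S')}"


def latest_index(names: list[str]) -> str:
    """Pick the most recent index.

    Zero-padded timestamps sort lexicographically in chronological order,
    so max() is enough as long as all names share the same alias prefix.
    """
    return max(names)


older = versioned_index_name("products", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
newer = versioned_index_name("products", datetime(2024, 6, 1, 0, 0, 0, tzinfo=timezone.utc))
# The alias "products" would be repointed to this index after the import.
alias_target = latest_index([older, newer])
```

Queries always go through the alias, so a re-import can build a fresh index in the background and switch over once it is complete.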
- pydantic model app.config.Config[source]¶
Search-a-licious server configuration.
The configuration is loaded from a YAML file that must satisfy this schema.
Validations will be performed while we load it.
Show JSON schema
{ "title": "Config", "description": "Search-a-licious server configuration.\n\nThe configuration is loaded from a YAML file,\nthat must satisfy this schema.\n\nValidations will be performed while we load it.", "type": "object", "properties": { "indices": { "additionalProperties": { "$ref": "#/$defs/IndexConfig" }, "description": "configuration of indices.\n\n\nA Search-a-licious instance only have one configuration file,\nbut is capable of serving multiple datasets\n\nIt provides a section for each index you want to create (corresponding to a dataset).\n\nThe key is the ID of the index that can be referenced at query time.\nOne index corresponds to a specific set of documents and can be queried independently.\n\nIf you have multiple indexes, one of those index must be designed as the default one,\nsee `default_index`.\n", "title": "Indices", "type": "object" }, "default_index": { "description": "the default index to use when no index is specified in the query", "title": "Default Index", "type": "string" } }, "$defs": { "ESIndexConfig": { "description": "This is the configuration for the main index containing the data.\n\nIt's used to create the index in ElasticSearch, and configure its mappings\n(along with the *fields* config)", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore 
replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" }, "id_field_name": { "description": "Name of the field to use for `_id`.\nit is mandatory to provide one.\n\nIf your dataset does not have an identifier field,\nyou should use a document preprocessor to compute one (see `preprocessor`).", "title": "Id Field Name", "type": "string" }, "last_modified_field_name": { "description": "Name of the field containing the date of last modification,\nin your indexed objects.\n\nThis is used for incremental updates using Redis queues.\n\nThe field value must be an int/float representing the timestamp.", "title": "Last Modified Field Name", "type": "string" } }, "required": [ "name", "id_field_name", "last_modified_field_name" ], "title": "ESIndexConfig", "type": "object" }, "FieldConfig": { "properties": { "name": { "default": "", "description": "name of the field (must be unique", "title": "Name", "type": "string" }, "type": { "allOf": [ { "$ref": "#/$defs/FieldType" } ], "description": "Type of the field\n\nSupported field types in Search-a-Licious are:\n\n * keyword: string values that won't be interpreted (tokenized).\n Good for things like tags, serial, property values, etc.\n * date: Date fields\n * double, float, half_float, scaled_float:\n different ways of storing floats with different capacity\n * short, integer, long, unsigned_long :\n integers (with different capacity: 8 / 16 / 32 bits)\n * bool: boolean (true / false) values\n * text: a text which is tokenized to enable full text search\n * text_lang: like text, but with different values in different languages.\n Tokenization will use analyzers specific to each languages.\n * taxonomy: a field akin to keyword but\n with support for matching using taxonomy synonyms and translations\n (and in fact also a text mapping 
possibility)\n * disabled: a field that is not stored nor searchable\n (see [Elasticsearch help])\n * object: this field contains a dict with sub-fields.\n \n\n[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html" }, "required": { "default": false, "description": "if required=True, the field is required in the input data\n\nAn entry that does not contains a value for this field will be rejected.", "title": "Required", "type": "boolean" }, "input_field": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "name of the input field to use when importing data\n\nBy default, Search-a-licious use the same name as the field name.\n\nThis is useful to index the same field using different types or configurations.", "title": "Input Field" }, "split": { "default": false, "description": "do we split the input field with `split_separator` ?\n\nThis is useful if you have some text fields that contains list of values,\n(for example a comma separated list of values, like apple,banana,carrot).\n\nYou must set split_separator to the character that separates the values in the dataset.", "title": "Split", "type": "boolean" }, "full_text_search": { "default": false, "description": "Wether this field in included on default full text search.\n\nIf `false`, the field is only used during search\nwhen filters involving this field are provided\n(as opposed to full text search expressions without any explicit field).", "title": "Full Text Search", "type": "boolean" }, "bucket_agg": { "default": false, "description": "do we add an bucket aggregation to the elasticsearch query for this field.\n\nIt is used to return a 'faceted-view' with the number of results for each facet value,\nor to generate bar charts.\n\nOnly valid for keyword, taxonomy or numeric field types.", "title": "Bucket Agg", "type": "boolean" }, "taxonomy_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, 
"description": "the name of the taxonomy associated with this field.\n\nIt must only be provided for taxonomy field type.", "title": "Taxonomy Name" } }, "required": [ "type" ], "title": "FieldConfig", "type": "object" }, "FieldType": { "description": "Supported field types in Search-a-Licious are:\n\n * keyword: string values that won't be interpreted (tokenized).\n Good for things like tags, serial, property values, etc.\n * date: Date fields\n * double, float, half_float, scaled_float:\n different ways of storing floats with different capacity\n * short, integer, long, unsigned_long :\n integers (with different capacity: 8 / 16 / 32 bits)\n * bool: boolean (true / false) values\n * text: a text which is tokenized to enable full text search\n * text_lang: like text, but with different values in different languages.\n Tokenization will use analyzers specific to each languages.\n * taxonomy: a field akin to keyword but\n with support for matching using taxonomy synonyms and translations\n (and in fact also a text mapping possibility)\n * disabled: a field that is not stored nor searchable\n (see [Elasticsearch help])\n * object: this field contains a dict with sub-fields.\n \n\n[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html", "enum": [ "keyword", "date", "half_float", "scaled_float", "float", "double", "integer", "short", "long", "unsigned_long", "bool", "text", "text_lang", "taxonomy", "disabled", "object" ], "title": "FieldType", "type": "string" }, "IndexConfig": { "description": "This object gives configuration for one index.\n\nOne index usually correspond to one dataset.", "properties": { "index": { "allOf": [ { "$ref": "#/$defs/ESIndexConfig" } ], "description": "This is the configuration for the main index containing the data.\n\n It's used to create the index in ElasticSearch, and configure its mappings\n (along with the *fields* config)\n " }, "fields": { "additionalProperties": { "$ref": 
"#/$defs/FieldConfig" }, "description": "Configuration of all fields we need to store in the index.\n\nKeys are field names,\nvalues contain the field configuration.\n\nThis is a very important part of the configuration.\n\nMost of the ElasticSearch mapping will depends on it.\nElasticSearch will also use this configuration\nto provide intended behaviour.\n\n(see also [Explain Configuration](./explain_configuration.md#fields))\n\nIf you change those settings you will have to re-index all the data.\n(But you can do so in the background).", "title": "Fields", "type": "object" }, "split_separator": { "default": ",", "description": "separator to use when splitting values, for fields that have split=True", "title": "Split Separator", "type": "string" }, "lang_separator": { "default": "_", "description": "for `text_lang` FieldType, the separator between the name of the field and the language code, ex: product_name_it if lang_separator=\"_\"", "title": "Lang Separator", "type": "string" }, "primary_color": { "default": "#aaa", "description": "Used for vega charts. Use CSS color code.", "title": "Primary Color", "type": "string" }, "accent_color": { "default": "#222", "description": "Used for vega. 
Should be CSS color code.", "title": "Accent Color", "type": "string" }, "taxonomy": { "allOf": [ { "$ref": "#/$defs/TaxonomyConfig" } ], "description": "Configuration of taxonomies,\n that is collections of entries with synonyms in multiple languages.\n\n See [Explain taxonomies](../explain-taxonomies)\n\n Field may be linked to taxonomies.\n\n It enables enriching search with synonyms,\n as well as providing suggestions,\n or informative facets.\n\n Note: if you define taxonomies, you must import them using\n [import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)\n " }, "supported_langs": { "description": "A list of all supported languages, it is used to build index mapping", "examples": [ [ "en", "fr", "it" ] ], "items": { "type": "string" }, "title": "Supported Langs", "type": "array" }, "document_fetcher": { "description": "The full qualified reference to the document fetcher,\ni.e. the class responsible from fetching the document.\nusing the document ID present in the Redis Stream.\n\nIt should inherit `app._import.BaseDocumentFetcher`\nand specialize the `fetch_document` method.\n\nTo keep things sleek,\nyou generally have few item fields in the event stream payload.\nThis class will fetch the full document using your application API.", "examples": [ "app.openfoodfacts.DocumentFetcher" ], "title": "Document Fetcher", "type": "string" }, "preprocessor": { "anyOf": [ { "description": "The full qualified reference to the preprocessor\nto use before data import.\n\nThis class must inherit `app.indexing.BaseDocumentPreprocessor`\nand specialize the `preprocess` method.\n\nThis is used to adapt the data schema\nor to add search-a-licious specific fields\nfor example.", "examples": [ "app.openfoodfacts.DocumentPreprocessor" ], "type": "string" }, { "type": "null" } ], "default": null, "title": "Preprocessor" }, "result_processor": { "anyOf": [ { "description": "The full qualified reference to the elasticsearch result processor\n to use 
after search query to Elasticsearch.\n\n) This class must inherit `app.postprocessing.BaseResultProcessor`\n and specialize the `process_after`\n\n This is can be used to add custom fields computed from index content.\n ", "examples": [ "app.openfoodfacts.ResultProcessor" ], "type": "string" }, { "type": "null" } ], "default": null, "title": "Result Processor" }, "scripts": { "anyOf": [ { "additionalProperties": { "$ref": "#/$defs/ScriptConfig" }, "description": "You can add scripts that can be used for sorting results.\n\nEach key is a script name, with it's configuration.", "type": "object" }, { "type": "null" } ], "default": null, "title": "Scripts" }, "match_phrase_boost": { "default": 2.0, "description": "How much we boost exact matches on consecutive words\n\nThat is, if you search \"Dark Chocolate\",\nit will boost entries that have the \"Dark Chocolate\" phrase (in the same field).\n\nIt only applies to free text search.\n\nThis only makes sense when using\n\"boost_phrase\" request parameters and \"best match\" order.\n\nNote: this field accept float of string,\nbecause using float might generate rounding problems.\nThe string must represent a float.", "title": "Match Phrase Boost", "type": "number" }, "match_phrase_boost_proximity": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "How much we allow proximity for `match_phrase_boost`.\n\nIf unspecified we will just match word to word.\nOtherwise it will allow some gap between words matching\n\nThis only makes sense when using\n\"boost_phrase\" request parameters and \"best match\" order.", "title": "Match Phrase Boost Proximity" }, "document_denylist": { "description": "list of documents IDs to ignore.\n\nUse this to skip some documents at indexing time.", "items": { "type": "string" }, "title": "Document Denylist", "type": "array", "uniqueItems": true }, "redis_stream_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": 
"Name of the Redis stream to read from when listening to document updates.\n\nIf not provided, document updates won't be listened to for this index.", "title": "Redis Stream Name" } }, "required": [ "index", "fields", "taxonomy", "supported_langs", "document_fetcher" ], "title": "IndexConfig", "type": "object" }, "ScriptConfig": { "description": "Scripts can be used to sort results of a search.\n\nThis use ElasticSearch internal capabilities", "properties": { "lang": { "allOf": [ { "$ref": "#/$defs/ScriptType" } ], "default": "expression", "description": "The script language, as supported by Elasticsearch" }, "source": { "description": "The source of the script", "title": "Source", "type": "string" }, "params": { "anyOf": [ { "description": "Params for the scripts. We need this to retrieve and validate parameters", "type": "object" }, { "type": "null" } ], "title": "Params" }, "static_params": { "anyOf": [ { "description": "Additional params for the scripts that can't be supplied by the API (constants)", "type": "object" }, { "type": "null" } ], "title": "Static Params" } }, "required": [ "source", "params", "static_params" ], "title": "ScriptConfig", "type": "object" }, "ScriptType": { "enum": [ "expression", "painless" ], "title": "ScriptType", "type": "string" }, "TaxonomyConfig": { "description": "Configuration of taxonomies,\nthat is collections of entries with synonyms in multiple languages.\n\nSee [Explain taxonomies](../explain-taxonomies)\n\nField may be linked to taxonomies.\n\nIt enables enriching search with synonyms,\nas well as providing suggestions,\nor informative facets.\n\nNote: if you define taxonomies, you must import them using\n[import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)", "properties": { "sources": { "description": "Configurations of taxonomies that this project will use.", "items": { "$ref": "#/$defs/TaxonomySourceConfig" }, "title": "Sources", "type": "array" }, "index": { "allOf": [ { "$ref": 
"#/$defs/TaxonomyIndexConfig" } ], "description": "This is the configuration of\n the ElasticSearch index storing the taxonomies.\n\n All taxonomies are stored within the same index.\n\n It enables functions like auto-completion, or field suggestions\n as well as enrichment of requests with synonyms.\n " }, "preprocessor": { "anyOf": [ { "description": "The full qualified reference to the preprocessor\nto use before taxonomy entry import.\n\nThis class must inherit `app.indexing.BaseTaxonomyPreprocessor`\nand specialize the `preprocess` method.\n\nThis is used to adapt the taxonomy schema\nor to add specific fields for example.", "examples": [ "app.openfoodfacts.TaxonomyPreprocessor" ], "type": "string" }, { "type": "null" } ], "default": null, "title": "Preprocessor" } }, "required": [ "sources", "index" ], "title": "TaxonomyConfig", "type": "object" }, "TaxonomyIndexConfig": { "description": "This is the configuration of\nthe ElasticSearch index storing the taxonomies.\n\nAll taxonomies are stored within the same index.\n\nIt enables functions like auto-completion, or field suggestions\nas well as enrichment of requests with synonyms.", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index 
settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" } }, "required": [ "name" ], "title": "TaxonomyIndexConfig", "type": "object" }, "TaxonomySourceConfig": { "description": "Configuration on how to fetch a particular taxonomy.", "properties": { "name": { "description": "Name of the taxonomy\n\nThis is the name you will use in the configuration (and the API)\nto reference this taxonomy", "title": "Name", "type": "string" }, "url": { "anyOf": [ { "format": "uri", "minLength": 1, "type": "string" }, { "format": "uri", "maxLength": 2083, "minLength": 1, "type": "string" } ], "description": "URL of the taxonomy.\n\nThe target file must be in JSON format\nand follows Open Food Facts JSON taxonomy format.\n\nThis is a dict where each key correspond to a taxonomy entry id,\nvalues are dict with following properties:\n\n* name: contains a dict giving the name (string) for this entry\n in various languages (keys are language codes)\n* synonyms: contains a dict giving a list of synonyms by language code\n* parents: contains a list of direct parent ids (taxonomy is a directed acyclic graph)\n\nOther keys correspond to properties associated to this entry (eg. wikidata id).", "title": "Url" } }, "required": [ "name", "url" ], "title": "TaxonomySourceConfig", "type": "object" } }, "required": [ "indices", "default_index" ] }
- Fields:
- Validators:
defaut_index_must_exist » all fields
redis_stream_name_should_be_unique » all fields
- field default_index: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='the default index to use when no index is specified in the query')] [Required][source]¶
The default index to use when no index is specified in the query.
- Validated by:
- field indices: dict[str, IndexConfig] [Required][source]¶
Configuration of indices.
A Search-a-licious instance has only one configuration file, but it can serve multiple datasets.
It provides a section for each index you want to create (corresponding to a dataset).
The key is the ID of the index that can be referenced at query time. One index corresponds to a specific set of documents and can be queried independently.
If you have multiple indexes, one of them must be designated as the default one, see default_index.
- Validated by:
- validator defaut_index_must_exist » all fields[source]¶
Validator that checks that default_index exists.
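The two validators listed for this model can be pictured with the following sketch, which works on a plain dict rather than on the pydantic model. The function name check_config is hypothetical, and the uniqueness rule is inferred only from the validator name redis_stream_name_should_be_unique; the real validators may behave differently in detail.

```python
from collections import Counter


def check_config(config: dict) -> None:
    """Raise ValueError if the config breaks one of the two rules sketched here."""
    indices = config["indices"]
    default_index = config["default_index"]

    # Rule 1: default_index must be one of the keys of `indices`.
    if default_index not in indices:
        raise ValueError(
            f"default_index {default_index!r} is not one of {sorted(indices)}"
        )

    # Rule 2 (assumed from the validator name): a Redis stream name may be
    # used by at most one index. Indices without a stream are ignored.
    streams = Counter(
        index["redis_stream_name"]
        for index in indices.values()
        if index.get("redis_stream_name") is not None
    )
    duplicates = sorted(name for name, count in streams.items() if count > 1)
    if duplicates:
        raise ValueError(f"redis_stream_name shared by several indices: {duplicates}")
```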
- get_index_config(index_id: str | None) tuple[str, IndexConfig] [source]¶
Return a (index_id, IndexConfig) for the given index_id.
If no index_id is provided, the default index is used. If the index_id is not found, (index_id, None) is returned.
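The lookup rule described for get_index_config can be sketched as a standalone function. This is not the method's actual source: the indices and default_index parameters stand in for the attributes of Config, and the index configurations are plain dicts here.

```python
from typing import Optional


def get_index_config(indices: dict, default_index: str, index_id: Optional[str]):
    """Return (index_id, config); config is None when index_id is unknown."""
    if index_id is None:
        # No index requested: fall back to the default one.
        index_id = default_index
    # dict.get returns None for an unknown ID instead of raising KeyError.
    return index_id, indices.get(index_id)


indices = {"off": {"fields": {}}, "obf": {"fields": {}}}
default_result = get_index_config(indices, "off", None)
unknown_result = get_index_config(indices, "off", "nope")
```

Callers therefore need to check the second element for None before using it, even though the signature advertises an IndexConfig.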
- class app.config.ConfigGenerateJsonSchema(by_alias: bool = True, ref_template: str = '#/$defs/{model}')[source]¶
Config to add fields to generated JSON schema for Config.
- generate(schema, mode='validation')[source]¶
Generates a JSON schema for a specified schema in a specified mode.
- Args:
schema: A Pydantic model.
mode: The mode in which to generate the schema. Defaults to ‘validation’.
- Returns:
A JSON schema representing the specified schema.
- Raises:
PydanticUserError: If the JSON schema generator has already been used to generate a JSON schema.
- pydantic model app.config.ESIndexConfig[source]¶
This is the configuration for the main index containing the data.
It’s used to create the index in ElasticSearch, and configure its mappings (along with the fields config)
Show JSON schema
{ "title": "ESIndexConfig", "description": "This is the configuration for the main index containing the data.\n\nIt's used to create the index in ElasticSearch, and configure its mappings\n(along with the *fields* config)", "type": "object", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" }, "id_field_name": { "description": "Name of the field to use for `_id`.\nit is mandatory to provide one.\n\nIf your dataset does not have an identifier field,\nyou should use a document preprocessor to compute one (see `preprocessor`).", "title": "Id Field Name", "type": "string" }, "last_modified_field_name": { "description": "Name of the field containing the date of last modification,\nin your indexed objects.\n\nThis is used for incremental updates using Redis queues.\n\nThe field value must be an int/float representing the timestamp.", "title": "Last Modified Field Name", "type": "string" } }, "required": [ "name", "id_field_name", "last_modified_field_name" ] }
- Fields:
id_field_name (str)
- field id_field_name: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='Name of the field to use for `_id`.\nit is mandatory to provide one.\n\nIf your dataset does not have an identifier field,\nyou should use a document preprocessor to compute one (see `preprocessor`).')] [Required][source]¶
Name of the field to use for _id. It is mandatory to provide one.
If your dataset does not have an identifier field, you should use a document preprocessor to compute one (see preprocessor).
- field last_modified_field_name: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='Name of the field containing the date of last modification,\nin your indexed objects.\n\nThis is used for incremental updates using Redis queues.\n\nThe field value must be an int/float representing the timestamp.')] [Required][source]¶
Name of the field containing the date of last modification, in your indexed objects.
This is used for incremental updates using Redis queues.
The field value must be an int/float representing the timestamp.
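When a dataset has neither a ready-made identifier nor a numeric timestamp, both can be computed before indexing. The sketch below is a plain function, not the preprocessor class interface; the input keys (source, code, updated_at) and the output field names (id, last_modified_t) are invented for the example and would be whatever id_field_name and last_modified_field_name point to in your configuration.

```python
from datetime import datetime, timezone


def preprocess(document: dict) -> dict:
    """Return a copy of the document with an identifier and a numeric timestamp."""
    doc = dict(document)

    # Compute an identifier from existing fields (hypothetical rule).
    doc["id"] = f"{doc['source']}:{doc['code']}"

    # Convert an ISO 8601 date string to a float Unix timestamp.
    modified = datetime.fromisoformat(doc["updated_at"])
    if modified.tzinfo is None:
        # Assume UTC when the input carries no timezone information.
        modified = modified.replace(tzinfo=timezone.utc)
    doc["last_modified_t"] = modified.timestamp()
    return doc


example = preprocess(
    {"source": "shop", "code": "42", "updated_at": "2024-01-01T00:00:00+00:00"}
)
```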
- pydantic model app.config.FieldConfig[source]¶
Show JSON schema
{ "title": "FieldConfig", "type": "object", "properties": { "name": { "default": "", "description": "name of the field (must be unique", "title": "Name", "type": "string" }, "type": { "allOf": [ { "$ref": "#/$defs/FieldType" } ], "description": "Type of the field\n\nSupported field types in Search-a-Licious are:\n\n * keyword: string values that won't be interpreted (tokenized).\n Good for things like tags, serial, property values, etc.\n * date: Date fields\n * double, float, half_float, scaled_float:\n different ways of storing floats with different capacity\n * short, integer, long, unsigned_long :\n integers (with different capacity: 8 / 16 / 32 bits)\n * bool: boolean (true / false) values\n * text: a text which is tokenized to enable full text search\n * text_lang: like text, but with different values in different languages.\n Tokenization will use analyzers specific to each languages.\n * taxonomy: a field akin to keyword but\n with support for matching using taxonomy synonyms and translations\n (and in fact also a text mapping possibility)\n * disabled: a field that is not stored nor searchable\n (see [Elasticsearch help])\n * object: this field contains a dict with sub-fields.\n \n\n[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html" }, "required": { "default": false, "description": "if required=True, the field is required in the input data\n\nAn entry that does not contains a value for this field will be rejected.", "title": "Required", "type": "boolean" }, "input_field": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "name of the input field to use when importing data\n\nBy default, Search-a-licious use the same name as the field name.\n\nThis is useful to index the same field using different types or configurations.", "title": "Input Field" }, "split": { "default": false, "description": "do we split the input field with `split_separator` ?\n\nThis is useful if you 
have some text fields that contains list of values,\n(for example a comma separated list of values, like apple,banana,carrot).\n\nYou must set split_separator to the character that separates the values in the dataset.", "title": "Split", "type": "boolean" }, "full_text_search": { "default": false, "description": "Wether this field in included on default full text search.\n\nIf `false`, the field is only used during search\nwhen filters involving this field are provided\n(as opposed to full text search expressions without any explicit field).", "title": "Full Text Search", "type": "boolean" }, "bucket_agg": { "default": false, "description": "do we add an bucket aggregation to the elasticsearch query for this field.\n\nIt is used to return a 'faceted-view' with the number of results for each facet value,\nor to generate bar charts.\n\nOnly valid for keyword, taxonomy or numeric field types.", "title": "Bucket Agg", "type": "boolean" }, "taxonomy_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "the name of the taxonomy associated with this field.\n\nIt must only be provided for taxonomy field type.", "title": "Taxonomy Name" } }, "$defs": { "FieldType": { "description": "Supported field types in Search-a-Licious are:\n\n * keyword: string values that won't be interpreted (tokenized).\n Good for things like tags, serial, property values, etc.\n * date: Date fields\n * double, float, half_float, scaled_float:\n different ways of storing floats with different capacity\n * short, integer, long, unsigned_long :\n integers (with different capacity: 8 / 16 / 32 bits)\n * bool: boolean (true / false) values\n * text: a text which is tokenized to enable full text search\n * text_lang: like text, but with different values in different languages.\n Tokenization will use analyzers specific to each languages.\n * taxonomy: a field akin to keyword but\n with support for matching using taxonomy synonyms and translations\n (and in fact 
also a text mapping possibility)\n * disabled: a field that is not stored nor searchable\n (see [Elasticsearch help])\n * object: this field contains a dict with sub-fields.\n \n\n[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html", "enum": [ "keyword", "date", "half_float", "scaled_float", "float", "double", "integer", "short", "long", "unsigned_long", "bool", "text", "text_lang", "taxonomy", "disabled", "object" ], "title": "FieldType", "type": "string" } }, "required": [ "type" ] }
- Fields:
- Validators:
- field bucket_agg: Annotated[bool, FieldInfo(annotation=NoneType, required=True, description="do we add an bucket aggregation to the elasticsearch query for this field.\n\nIt is used to return a 'faceted-view' with the number of results for each facet value,\nor to generate bar charts.\n\nOnly valid for keyword, taxonomy or numeric field types.")] = False[source]¶
Whether to add a bucket aggregation to the Elasticsearch query for this field.
It is used to return a ‘faceted-view’ with the number of results for each facet value, or to generate bar charts.
Only valid for keyword, taxonomy or numeric field types.
- field full_text_search: Annotated[bool, FieldInfo(annotation=NoneType, required=True, description='Wether this field in included on default full text search.\n\nIf `false`, the field is only used during search\nwhen filters involving this field are provided\n(as opposed to full text search expressions without any explicit field).')] = False[source]¶
Whether this field is included in the default full text search.
If false, the field is only used during search when filters involving this field are provided (as opposed to full text search expressions without any explicit field).
- field input_field: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, description='name of the input field to use when importing data\n\nBy default, Search-a-licious use the same name as the field name.\n\nThis is useful to index the same field using different types or configurations.')] = None[source]¶
Name of the input field to use when importing data.
By default, Search-a-licious uses the same name as the field name.
This is useful to index the same field using different types or configurations.
- field name: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='name of the field (must be unique')] = ''[source]¶
Name of the field (must be unique).
- field required: Annotated[bool, FieldInfo(annotation=NoneType, required=True, description='if required=True, the field is required in the input data\n\nAn entry that does not contains a value for this field will be rejected.')] = False[source]¶
If required=True, the field is required in the input data.
An entry that does not contain a value for this field will be rejected.
- field split: Annotated[bool, FieldInfo(annotation=NoneType, required=True, description='do we split the input field with `split_separator` ?\n\nThis is useful if you have some text fields that contains list of values,\n(for example a comma separated list of values, like apple,banana,carrot).\n\nYou must set split_separator to the character that separates the values in the dataset.')] = False[source]¶
Whether to split the input field using split_separator.
This is useful if you have text fields that contain a list of values (for example a comma-separated list of values, like apple,banana,carrot).
You must set split_separator to the character that separates the values in the dataset.
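As a rough illustration of what splitting means here, the sketch below turns a raw input value into a list. It is not the library's import code: the helper name split_values is hypothetical, and trimming whitespace and dropping empty items are assumptions.

```python
def split_values(raw: str, split_separator: str = ",") -> list[str]:
    """Split a raw input value into a list of values.

    Assumption: surrounding whitespace is trimmed and empty items are dropped.
    """
    return [part.strip() for part in raw.split(split_separator) if part.strip()]


# The example value used in the description above.
print(split_values("apple,banana,carrot"))  # ['apple', 'banana', 'carrot']
```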
- field taxonomy_name: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, description='the name of the taxonomy associated with this field.\n\nIt must only be provided for taxonomy field type.')] = None[source]¶
The name of the taxonomy associated with this field.
It must only be provided for the taxonomy field type.
- field type: Annotated[FieldType, FieldInfo(annotation=NoneType, required=True, description="Type of the field\n\nSupported field types in Search-a-Licious are:\n\n * keyword: string values that won't be interpreted (tokenized).\n Good for things like tags, serial, property values, etc.\n * date: Date fields\n * double, float, half_float, scaled_float:\n different ways of storing floats with different capacity\n * short, integer, long, unsigned_long :\n integers (with different capacity: 8 / 16 / 32 bits)\n * bool: boolean (true / false) values\n * text: a text which is tokenized to enable full text search\n * text_lang: like text, but with different values in different languages.\n Tokenization will use analyzers specific to each languages.\n * taxonomy: a field akin to keyword but\n with support for matching using taxonomy synonyms and translations\n (and in fact also a text mapping possibility)\n * disabled: a field that is not stored nor searchable\n (see [Elasticsearch help])\n * object: this field contains a dict with sub-fields.\n \n\n[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html")] [Required][source]¶
Type of the field
Supported field types in Search-a-Licious are:
keyword: string values that won’t be interpreted (tokenized). Good for things like tags, serial, property values, etc.
date: Date fields
double, float, half_float, scaled_float: different ways of storing floats with different capacity
short, integer, long, unsigned_long: integers (with different capacities: 16 / 32 / 64 bits)
bool: boolean (true / false) values
text: a text which is tokenized to enable full text search
text_lang: like text, but with different values in different languages. Tokenization will use analyzers specific to each language.
taxonomy: a field akin to keyword but with support for matching using taxonomy synonyms and translations (and in fact also a text mapping possibility)
disabled: a field that is neither stored nor searchable (see [Elasticsearch help])
object: this field contains a dict with sub-fields.
[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
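The constraints stated for FieldConfig can be sketched as a small check over plain dictionaries. This is only an illustration under assumptions: the function check_field and the sample field names are hypothetical, and the real model enforces its rules through pydantic validators that may differ in detail.

```python
# Type names copied from the FieldType enum in the JSON schema.
NUMERIC_TYPES = {
    "half_float", "scaled_float", "float", "double",
    "integer", "short", "long", "unsigned_long",
}
FIELD_TYPES = NUMERIC_TYPES | {
    "keyword", "date", "bool", "text", "text_lang",
    "taxonomy", "disabled", "object",
}


def check_field(name: str, config: dict) -> list[str]:
    """Return a list of problems found in one field configuration."""
    problems = []
    field_type = config.get("type")
    if field_type not in FIELD_TYPES:
        problems.append(f"{name}: unknown type {field_type!r}")
    # bucket_agg is only valid for keyword, taxonomy or numeric types.
    if config.get("bucket_agg", False) and field_type not in NUMERIC_TYPES | {"keyword", "taxonomy"}:
        problems.append(f"{name}: bucket_agg is not valid for type {field_type!r}")
    # taxonomy_name must only be provided for the taxonomy type.
    if config.get("taxonomy_name") is not None and field_type != "taxonomy":
        problems.append(f"{name}: taxonomy_name requires type 'taxonomy'")
    return problems


# Hypothetical field definitions, keyed by field name.
fields = {
    "code": {"type": "keyword", "required": True},
    "product_name": {"type": "text_lang", "full_text_search": True},
    "categories": {"type": "taxonomy", "taxonomy_name": "category", "bucket_agg": True},
    "quantity": {"type": "text", "bucket_agg": True},  # invalid on purpose
}

for field_name, field_config in fields.items():
    for problem in check_field(field_name, field_config):
        print(problem)
```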
- class app.config.FieldType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Supported field types in Search-a-Licious are:
keyword: string values that won’t be interpreted (tokenized). Good for things like tags, serial, property values, etc.
date: Date fields
double, float, half_float, scaled_float: different ways of storing floats with different capacity
short, integer, long, unsigned_long: integers (with different capacities: 16 / 32 / 64 bits)
bool: boolean (true / false) values
text: a text which is tokenized to enable full text search
text_lang: like text, but with different values in different languages. Tokenization will use analyzers specific to each language.
taxonomy: a field akin to keyword but with support for matching using taxonomy synonyms and translations (and in fact also a text mapping possibility)
disabled: a field that is neither stored nor searchable (see [Elasticsearch help])
object: this field contains a dict with sub-fields.
[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
- pydantic model app.config.IndexConfig[source]¶
This object gives configuration for one index.
One index usually corresponds to one dataset.
Show JSON schema
{ "title": "IndexConfig", "description": "This object gives configuration for one index.\n\nOne index usually correspond to one dataset.", "type": "object", "properties": { "index": { "allOf": [ { "$ref": "#/$defs/ESIndexConfig" } ], "description": "This is the configuration for the main index containing the data.\n\n It's used to create the index in ElasticSearch, and configure its mappings\n (along with the *fields* config)\n " }, "fields": { "additionalProperties": { "$ref": "#/$defs/FieldConfig" }, "description": "Configuration of all fields we need to store in the index.\n\nKeys are field names,\nvalues contain the field configuration.\n\nThis is a very important part of the configuration.\n\nMost of the ElasticSearch mapping will depends on it.\nElasticSearch will also use this configuration\nto provide intended behaviour.\n\n(see also [Explain Configuration](./explain_configuration.md#fields))\n\nIf you change those settings you will have to re-index all the data.\n(But you can do so in the background).", "title": "Fields", "type": "object" }, "split_separator": { "default": ",", "description": "separator to use when splitting values, for fields that have split=True", "title": "Split Separator", "type": "string" }, "lang_separator": { "default": "_", "description": "for `text_lang` FieldType, the separator between the name of the field and the language code, ex: product_name_it if lang_separator=\"_\"", "title": "Lang Separator", "type": "string" }, "primary_color": { "default": "#aaa", "description": "Used for vega charts. Use CSS color code.", "title": "Primary Color", "type": "string" }, "accent_color": { "default": "#222", "description": "Used for vega. 
Should be CSS color code.", "title": "Accent Color", "type": "string" }, "taxonomy": { "allOf": [ { "$ref": "#/$defs/TaxonomyConfig" } ], "description": "Configuration of taxonomies,\n that is collections of entries with synonyms in multiple languages.\n\n See [Explain taxonomies](../explain-taxonomies)\n\n Field may be linked to taxonomies.\n\n It enables enriching search with synonyms,\n as well as providing suggestions,\n or informative facets.\n\n Note: if you define taxonomies, you must import them using\n [import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)\n " }, "supported_langs": { "description": "A list of all supported languages, it is used to build index mapping", "examples": [ [ "en", "fr", "it" ] ], "items": { "type": "string" }, "title": "Supported Langs", "type": "array" }, "document_fetcher": { "description": "The full qualified reference to the document fetcher,\ni.e. the class responsible from fetching the document.\nusing the document ID present in the Redis Stream.\n\nIt should inherit `app._import.BaseDocumentFetcher`\nand specialize the `fetch_document` method.\n\nTo keep things sleek,\nyou generally have few item fields in the event stream payload.\nThis class will fetch the full document using your application API.", "examples": [ "app.openfoodfacts.DocumentFetcher" ], "title": "Document Fetcher", "type": "string" }, "preprocessor": { "anyOf": [ { "description": "The full qualified reference to the preprocessor\nto use before data import.\n\nThis class must inherit `app.indexing.BaseDocumentPreprocessor`\nand specialize the `preprocess` method.\n\nThis is used to adapt the data schema\nor to add search-a-licious specific fields\nfor example.", "examples": [ "app.openfoodfacts.DocumentPreprocessor" ], "type": "string" }, { "type": "null" } ], "default": null, "title": "Preprocessor" }, "result_processor": { "anyOf": [ { "description": "The full qualified reference to the elasticsearch result processor\n to use 
after search query to Elasticsearch.\n\n) This class must inherit `app.postprocessing.BaseResultProcessor`\n and specialize the `process_after`\n\n This is can be used to add custom fields computed from index content.\n ", "examples": [ "app.openfoodfacts.ResultProcessor" ], "type": "string" }, { "type": "null" } ], "default": null, "title": "Result Processor" }, "scripts": { "anyOf": [ { "additionalProperties": { "$ref": "#/$defs/ScriptConfig" }, "description": "You can add scripts that can be used for sorting results.\n\nEach key is a script name, with it's configuration.", "type": "object" }, { "type": "null" } ], "default": null, "title": "Scripts" }, "match_phrase_boost": { "default": 2.0, "description": "How much we boost exact matches on consecutive words\n\nThat is, if you search \"Dark Chocolate\",\nit will boost entries that have the \"Dark Chocolate\" phrase (in the same field).\n\nIt only applies to free text search.\n\nThis only makes sense when using\n\"boost_phrase\" request parameters and \"best match\" order.\n\nNote: this field accept float of string,\nbecause using float might generate rounding problems.\nThe string must represent a float.", "title": "Match Phrase Boost", "type": "number" }, "match_phrase_boost_proximity": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "How much we allow proximity for `match_phrase_boost`.\n\nIf unspecified we will just match word to word.\nOtherwise it will allow some gap between words matching\n\nThis only makes sense when using\n\"boost_phrase\" request parameters and \"best match\" order.", "title": "Match Phrase Boost Proximity" }, "document_denylist": { "description": "list of documents IDs to ignore.\n\nUse this to skip some documents at indexing time.", "items": { "type": "string" }, "title": "Document Denylist", "type": "array", "uniqueItems": true }, "redis_stream_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": 
"Name of the Redis stream to read from when listening to document updates.\n\nIf not provided, document updates won't be listened to for this index.", "title": "Redis Stream Name" } }, "$defs": { "ESIndexConfig": { "description": "This is the configuration for the main index containing the data.\n\nIt's used to create the index in ElasticSearch, and configure its mappings\n(along with the *fields* config)", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" }, "id_field_name": { "description": "Name of the field to use for `_id`.\nit is mandatory to provide one.\n\nIf your dataset does not have an identifier field,\nyou should use a document preprocessor to compute one (see `preprocessor`).", "title": "Id Field Name", "type": "string" }, "last_modified_field_name": { "description": "Name of the field containing the date of last modification,\nin your indexed objects.\n\nThis is used for incremental updates using Redis queues.\n\nThe field value must be an int/float representing the timestamp.", "title": "Last Modified Field Name", 
"type": "string" } }, "required": [ "name", "id_field_name", "last_modified_field_name" ], "title": "ESIndexConfig", "type": "object" }, "FieldConfig": { "properties": { "name": { "default": "", "description": "name of the field (must be unique", "title": "Name", "type": "string" }, "type": { "allOf": [ { "$ref": "#/$defs/FieldType" } ], "description": "Type of the field\n\nSupported field types in Search-a-Licious are:\n\n * keyword: string values that won't be interpreted (tokenized).\n Good for things like tags, serial, property values, etc.\n * date: Date fields\n * double, float, half_float, scaled_float:\n different ways of storing floats with different capacity\n * short, integer, long, unsigned_long :\n integers (with different capacity: 8 / 16 / 32 bits)\n * bool: boolean (true / false) values\n * text: a text which is tokenized to enable full text search\n * text_lang: like text, but with different values in different languages.\n Tokenization will use analyzers specific to each languages.\n * taxonomy: a field akin to keyword but\n with support for matching using taxonomy synonyms and translations\n (and in fact also a text mapping possibility)\n * disabled: a field that is not stored nor searchable\n (see [Elasticsearch help])\n * object: this field contains a dict with sub-fields.\n \n\n[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html" }, "required": { "default": false, "description": "if required=True, the field is required in the input data\n\nAn entry that does not contains a value for this field will be rejected.", "title": "Required", "type": "boolean" }, "input_field": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "name of the input field to use when importing data\n\nBy default, Search-a-licious use the same name as the field name.\n\nThis is useful to index the same field using different types or configurations.", "title": "Input Field" }, "split": { 
"default": false, "description": "do we split the input field with `split_separator` ?\n\nThis is useful if you have some text fields that contains list of values,\n(for example a comma separated list of values, like apple,banana,carrot).\n\nYou must set split_separator to the character that separates the values in the dataset.", "title": "Split", "type": "boolean" }, "full_text_search": { "default": false, "description": "Wether this field in included on default full text search.\n\nIf `false`, the field is only used during search\nwhen filters involving this field are provided\n(as opposed to full text search expressions without any explicit field).", "title": "Full Text Search", "type": "boolean" }, "bucket_agg": { "default": false, "description": "do we add an bucket aggregation to the elasticsearch query for this field.\n\nIt is used to return a 'faceted-view' with the number of results for each facet value,\nor to generate bar charts.\n\nOnly valid for keyword, taxonomy or numeric field types.", "title": "Bucket Agg", "type": "boolean" }, "taxonomy_name": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "the name of the taxonomy associated with this field.\n\nIt must only be provided for taxonomy field type.", "title": "Taxonomy Name" } }, "required": [ "type" ], "title": "FieldConfig", "type": "object" }, "FieldType": { "description": "Supported field types in Search-a-Licious are:\n\n * keyword: string values that won't be interpreted (tokenized).\n Good for things like tags, serial, property values, etc.\n * date: Date fields\n * double, float, half_float, scaled_float:\n different ways of storing floats with different capacity\n * short, integer, long, unsigned_long :\n integers (with different capacity: 8 / 16 / 32 bits)\n * bool: boolean (true / false) values\n * text: a text which is tokenized to enable full text search\n * text_lang: like text, but with different values in different languages.\n Tokenization 
will use analyzers specific to each languages.\n * taxonomy: a field akin to keyword but\n with support for matching using taxonomy synonyms and translations\n (and in fact also a text mapping possibility)\n * disabled: a field that is not stored nor searchable\n (see [Elasticsearch help])\n * object: this field contains a dict with sub-fields.\n \n\n[Elasticsearch help]: https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html", "enum": [ "keyword", "date", "half_float", "scaled_float", "float", "double", "integer", "short", "long", "unsigned_long", "bool", "text", "text_lang", "taxonomy", "disabled", "object" ], "title": "FieldType", "type": "string" }, "ScriptConfig": { "description": "Scripts can be used to sort results of a search.\n\nThis use ElasticSearch internal capabilities", "properties": { "lang": { "allOf": [ { "$ref": "#/$defs/ScriptType" } ], "default": "expression", "description": "The script language, as supported by Elasticsearch" }, "source": { "description": "The source of the script", "title": "Source", "type": "string" }, "params": { "anyOf": [ { "description": "Params for the scripts. 
We need this to retrieve and validate parameters", "type": "object" }, { "type": "null" } ], "title": "Params" }, "static_params": { "anyOf": [ { "description": "Additional params for the scripts that can't be supplied by the API (constants)", "type": "object" }, { "type": "null" } ], "title": "Static Params" } }, "required": [ "source", "params", "static_params" ], "title": "ScriptConfig", "type": "object" }, "ScriptType": { "enum": [ "expression", "painless" ], "title": "ScriptType", "type": "string" }, "TaxonomyConfig": { "description": "Configuration of taxonomies,\nthat is collections of entries with synonyms in multiple languages.\n\nSee [Explain taxonomies](../explain-taxonomies)\n\nField may be linked to taxonomies.\n\nIt enables enriching search with synonyms,\nas well as providing suggestions,\nor informative facets.\n\nNote: if you define taxonomies, you must import them using\n[import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)", "properties": { "sources": { "description": "Configurations of taxonomies that this project will use.", "items": { "$ref": "#/$defs/TaxonomySourceConfig" }, "title": "Sources", "type": "array" }, "index": { "allOf": [ { "$ref": "#/$defs/TaxonomyIndexConfig" } ], "description": "This is the configuration of\n the ElasticSearch index storing the taxonomies.\n\n All taxonomies are stored within the same index.\n\n It enables functions like auto-completion, or field suggestions\n as well as enrichment of requests with synonyms.\n " }, "preprocessor": { "anyOf": [ { "description": "The full qualified reference to the preprocessor\nto use before taxonomy entry import.\n\nThis class must inherit `app.indexing.BaseTaxonomyPreprocessor`\nand specialize the `preprocess` method.\n\nThis is used to adapt the taxonomy schema\nor to add specific fields for example.", "examples": [ "app.openfoodfacts.TaxonomyPreprocessor" ], "type": "string" }, { "type": "null" } ], "default": null, "title": "Preprocessor" } }, 
"required": [ "sources", "index" ], "title": "TaxonomyConfig", "type": "object" }, "TaxonomyIndexConfig": { "description": "This is the configuration of\nthe ElasticSearch index storing the taxonomies.\n\nAll taxonomies are stored within the same index.\n\nIt enables functions like auto-completion, or field suggestions\nas well as enrichment of requests with synonyms.", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" } }, "required": [ "name" ], "title": "TaxonomyIndexConfig", "type": "object" }, "TaxonomySourceConfig": { "description": "Configuration on how to fetch a particular taxonomy.", "properties": { "name": { "description": "Name of the taxonomy\n\nThis is the name you will use in the configuration (and the API)\nto reference this taxonomy", "title": "Name", "type": "string" }, "url": { "anyOf": [ { "format": "uri", "minLength": 1, "type": "string" }, { "format": "uri", "maxLength": 2083, "minLength": 1, "type": "string" } ], "description": "URL of the taxonomy.\n\nThe target file must be in JSON format\nand follows Open Food 
Facts JSON taxonomy format.\n\nThis is a dict where each key correspond to a taxonomy entry id,\nvalues are dict with following properties:\n\n* name: contains a dict giving the name (string) for this entry\n in various languages (keys are language codes)\n* synonyms: contains a dict giving a list of synonyms by language code\n* parents: contains a list of direct parent ids (taxonomy is a directed acyclic graph)\n\nOther keys correspond to properties associated to this entry (eg. wikidata id).", "title": "Url" } }, "required": [ "name", "url" ], "title": "TaxonomySourceConfig", "type": "object" } }, "required": [ "index", "fields", "taxonomy", "supported_langs", "document_fetcher" ] }
- field accent_color: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='Used for vega. Should be CSS color code.')] = '#222'[source]¶
Used for Vega charts. Should be a CSS color code.
- field document_denylist: Annotated[set[str], FieldInfo(annotation=NoneType, required=True, description='list of documents IDs to ignore.\n\nUse this to skip some documents at indexing time.')] [Optional][source]¶
List of document IDs to ignore.
Use this to skip some documents at indexing time.
- field document_fetcher: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The full qualified reference to the document fetcher,\ni.e. the class responsible from fetching the document.\nusing the document ID present in the Redis Stream.\n\nIt should inherit `app._import.BaseDocumentFetcher`\nand specialize the `fetch_document` method.\n\nTo keep things sleek,\nyou generally have few item fields in the event stream payload.\nThis class will fetch the full document using your application API.', examples=['app.openfoodfacts.DocumentFetcher'])] [Required][source]¶
The fully qualified reference to the document fetcher, i.e. the class responsible for fetching the document using the document ID present in the Redis stream.
It should inherit from app._import.BaseDocumentFetcher and specialize the fetch_document method.
To keep things lean, the event stream payload generally carries only a few item fields. This class fetches the full document using your application API.
- field fields: Annotated[dict[str, FieldConfig], FieldInfo(annotation=NoneType, required=True, description='Configuration of all fields we need to store in the index.\n\nKeys are field names,\nvalues contain the field configuration.\n\nThis is a very important part of the configuration.\n\nMost of the ElasticSearch mapping will depends on it.\nElasticSearch will also use this configuration\nto provide intended behaviour.\n\n(see also [Explain Configuration](./explain_configuration.md#fields))\n\nIf you change those settings you will have to re-index all the data.\n(But you can do so in the background).')] [Required][source]¶
Configuration of all fields we need to store in the index.
Keys are field names, values contain the field configuration.
This is a very important part of the configuration.
Most of the ElasticSearch mapping depends on it. ElasticSearch will also use this configuration to provide the intended behaviour.
(see also [Explain Configuration](./explain_configuration.md#fields))
If you change those settings you will have to re-index all the data. (But you can do so in the background).
- field index: Annotated[ESIndexConfig, FieldInfo(annotation=NoneType, required=True, description="This is the configuration for the main index containing the data.\n\n It's used to create the index in ElasticSearch, and configure its mappings\n (along with the *fields* config)\n ")] [Required][source]¶
This is the configuration for the main index containing the data.
It’s used to create the index in ElasticSearch, and configure its mappings (along with the fields config)
- field lang_separator: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='for `text_lang` FieldType, the separator between the name of the field and the language code, ex: product_name_it if lang_separator="_"')] = '_'[source]¶
For the text_lang FieldType, the separator between the name of the field and the language code, e.g. product_name_it if lang_separator="_".
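The naming convention can be sketched as follows. The helper lang_field_names is hypothetical and only shows how a field name, lang_separator and a language code combine; how Search-a-licious actually builds its mapping from these names is not shown here.

```python
def lang_field_names(
    field_name: str, supported_langs: list[str], lang_separator: str = "_"
) -> list[str]:
    """Build one per-language field name for each supported language."""
    return [f"{field_name}{lang_separator}{lang}" for lang in supported_langs]


# Using the supported_langs example value from the schema.
print(lang_field_names("product_name", ["en", "fr", "it"]))
# ['product_name_en', 'product_name_fr', 'product_name_it']
```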
- field match_phrase_boost: Annotated[float, FieldInfo(annotation=NoneType, required=True, description='How much we boost exact matches on consecutive words\n\nThat is, if you search "Dark Chocolate",\nit will boost entries that have the "Dark Chocolate" phrase (in the same field).\n\nIt only applies to free text search.\n\nThis only makes sense when using\n"boost_phrase" request parameters and "best match" order.\n\nNote: this field accept float of string,\nbecause using float might generate rounding problems.\nThe string must represent a float.')] = 2.0[source]¶
How much we boost exact matches on consecutive words
That is, if you search “Dark Chocolate”, it will boost entries that have the “Dark Chocolate” phrase (in the same field).
It only applies to free text search.
This only makes sense when using “boost_phrase” request parameters and “best match” order.
Note: this field accepts a float or a string, because using a float might cause rounding problems. The string must represent a float.
- field match_phrase_boost_proximity: Annotated[int | None, FieldInfo(annotation=NoneType, required=True, description='How much we allow proximity for `match_phrase_boost`.\n\nIf unspecified we will just match word to word.\nOtherwise it will allow some gap between words matching\n\nThis only makes sense when using\n"boost_phrase" request parameters and "best match" order.')] = None[source]¶
How much we allow proximity for match_phrase_boost.
If unspecified, we will just match word to word. Otherwise it will allow some gap between matching words.
This only makes sense when using “boost_phrase” request parameters and “best match” order.
- field preprocessor: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The full qualified reference to the preprocessor\nto use before data import.\n\nThis class must inherit `app.indexing.BaseDocumentPreprocessor`\nand specialize the `preprocess` method.\n\nThis is used to adapt the data schema\nor to add search-a-licious specific fields\nfor example.', examples=['app.openfoodfacts.DocumentPreprocessor'])] | None = None[source]¶
- field primary_color: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='Used for vega charts. Use CSS color code.')] = '#aaa'[source]¶
Used for Vega charts. Use a CSS color code.
- field redis_stream_name: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, description="Name of the Redis stream to read from when listening to document updates.\n\nIf not provided, document updates won't be listened to for this index.")] = None[source]¶
Name of the Redis stream to read from when listening to document updates.
If not provided, document updates won’t be listened to for this index.
- field result_processor: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The full qualified reference to the elasticsearch result processor\n to use after search query to Elasticsearch.\n\n) This class must inherit `app.postprocessing.BaseResultProcessor`\n and specialize the `process_after`\n\n This is can be used to add custom fields computed from index content.\n ', examples=['app.openfoodfacts.ResultProcessor'])] | None = None[source]¶
- field scripts: Annotated[dict[str, ScriptConfig], FieldInfo(annotation=NoneType, required=True, description="You can add scripts that can be used for sorting results.\n\nEach key is a script name, with it's configuration.")] | None = None[source]¶
- field split_separator: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='separator to use when splitting values, for fields that have split=True')] = ','[source]¶
separator to use when splitting values, for fields that have split=True
- field supported_langs: Annotated[list[str], FieldInfo(annotation=NoneType, required=True, description='A list of all supported languages, it is used to build index mapping', examples=[['en', 'fr', 'it']])] [Required][source]¶
A list of all supported languages. It is used to build the index mapping.
- field taxonomy: Annotated[TaxonomyConfig, FieldInfo(annotation=NoneType, required=True, description='Configuration of taxonomies,\n that is collections of entries with synonyms in multiple languages.\n\n See [Explain taxonomies](../explain-taxonomies)\n\n Field may be linked to taxonomies.\n\n It enables enriching search with synonyms,\n as well as providing suggestions,\n or informative facets.\n\n Note: if you define taxonomies, you must import them using\n [import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)\n ')] [Required][source]¶
Configuration of taxonomies, that is, collections of entries with synonyms in multiple languages.
See [Explain taxonomies](../explain-taxonomies)
Fields may be linked to taxonomies.
This enables enriching search with synonyms, as well as providing suggestions or informative facets.
Note: if you define taxonomies, you must import them using the [import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)
- validator add_field_name_to_each_field » fields[source]¶
Adds the name of each field to its field definition, where it is handy to have.
- validator ensure_no_fields_use_reserved_name » fields[source]¶
Verify that no field name clashes with a reserved name
- validator field_references_must_exist_and_be_valid » all fields[source]¶
Validator that checks that every field reference in ESIndexConfig refers to an existing field and is valid.
- validator taxonomy_name_should_be_defined » all fields[source]¶
Validator that checks that, if taxonomy_type is defined for a field, it refers to a taxonomy defined in taxonomy.sources.
- property full_text_fields: dict[str, FieldConfig][source]¶
Fully qualified name of fields that are part of default full text search
- property lang_fields: dict[str, FieldConfig][source]¶
Fully qualified name of fields that are translated
- property text_lang_fields: dict[str, FieldConfig][source]¶
List all text_lang fields in an efficient way
- class app.config.LoggingLevel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Accepted logging levels
NOTSET - means no logs
DEBUG / INFO / WARNING / ERROR / CRITICAL - match standard Python logging levels
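Since the accepted names match the standard Python logging levels, each one can be mapped to its numeric value through the `logging` module. The sketch below only illustrates that correspondence; the logger name is made up.

```python
import logging

LEVEL_NAMES = ["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

# Each accepted name is also a constant of the standard logging module.
numeric_levels = {name: getattr(logging, name) for name in LEVEL_NAMES}

logger = logging.getLogger("search_a_licious_example")  # hypothetical logger name
logger.setLevel(numeric_levels["WARNING"])
```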
- pydantic model app.config.ScriptConfig[source]¶
Scripts can be used to sort results of a search.
This uses ElasticSearch's internal scripting capabilities.
Show JSON schema
{ "title": "ScriptConfig", "description": "Scripts can be used to sort results of a search.\n\nThis use ElasticSearch internal capabilities", "type": "object", "properties": { "lang": { "allOf": [ { "$ref": "#/$defs/ScriptType" } ], "default": "expression", "description": "The script language, as supported by Elasticsearch" }, "source": { "description": "The source of the script", "title": "Source", "type": "string" }, "params": { "anyOf": [ { "description": "Params for the scripts. We need this to retrieve and validate parameters", "type": "object" }, { "type": "null" } ], "title": "Params" }, "static_params": { "anyOf": [ { "description": "Additional params for the scripts that can't be supplied by the API (constants)", "type": "object" }, { "type": "null" } ], "title": "Static Params" } }, "$defs": { "ScriptType": { "enum": [ "expression", "painless" ], "title": "ScriptType", "type": "string" } }, "required": [ "source", "params", "static_params" ] }
- Fields:
- field lang: Annotated[ScriptType, FieldInfo(annotation=NoneType, required=True, description='The script language, as supported by Elasticsearch')] = ScriptType.expression[source]¶
The script language, as supported by Elasticsearch
- field params: Annotated[dict[str, Any], FieldInfo(annotation=NoneType, required=True, description='Params for the scripts. We need this to retrieve and validate parameters')] | None [Required][source]¶
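To make the shape of a script entry concrete, here is a sketch of a stand-in class that mirrors the properties listed in the JSON schema above (`lang`, `source`, `params`, `static_params`). It is not the real pydantic model: `ScriptSketch`, the `popularity` field and the `boost` parameter are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

ALLOWED_LANGS = ("expression", "painless")


@dataclass
class ScriptSketch:
    """Stand-in mirroring the ScriptConfig schema; not the real pydantic model."""

    source: str
    params: Optional[Dict[str, Any]]
    static_params: Optional[Dict[str, Any]]
    lang: str = "expression"  # same default as in the schema

    def __post_init__(self) -> None:
        if self.lang not in ALLOWED_LANGS:
            raise ValueError(f"lang must be one of {ALLOWED_LANGS}, got {self.lang!r}")


# Hypothetical sort script: field name and parameter are assumptions.
by_popularity = ScriptSketch(
    source="doc['popularity'].value * boost",
    params={"boost": 1.0},
    static_params=None,
)
```

Here `params` holds values a caller may supply, while `static_params` would hold constants that cannot be supplied through the API.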
- class app.config.ScriptType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
- pydantic settings app.config.Settings[source]¶
Settings for Search-a-licious
The most important setting is config_path.
These settings can be overridden through environment variables, using the setting name in capital letters. If you use docker compose, a good way to do that is to set those values in your .env file.
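The naming rule can be sketched with the standard library alone. This is only an illustration of the upper-case convention, not the real `Settings` class: `resolve_settings` is a hypothetical helper, the defaults are copied from the fields documented here, and `es.internal` is a made-up host name.

```python
import os
from typing import Dict, Mapping

# Defaults taken from the documented fields, kept as strings for simplicity.
DEFAULTS = {
    "elasticsearch_url": "http://localhost:9200",
    "redis_host": "localhost",
    "redis_port": "6379",
}


def resolve_settings(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return the defaults, overridden by variables named after the setting in capitals."""
    return {name: environ.get(name.upper(), default) for name, default in DEFAULTS.items()}


# Explicit mapping for a predictable example, then the real process environment.
example = resolve_settings({"ELASTICSEARCH_URL": "http://es.internal:9200"})
current = resolve_settings(os.environ)
```

With docker compose, a line such as `ELASTICSEARCH_URL=...` in the .env file would play the role of the explicit mapping above.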
- Fields:
- field config_path: Annotated[Path | None, FieldInfo(annotation=NoneType, required=True, description='Path to the search-a-licious yaml configuration file.\n\nSee [Explain configuration file](../explain-configuration/) for more information')] = None[source]¶
Path to the search-a-licious yaml configuration file.
See [Explain configuration file](../explain-configuration/) for more information
- field elasticsearch_url: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='URL to the ElasticSearch instance\n\nBear in mind this is from inside the container.')] = 'http://localhost:9200'[source]¶
URL to the ElasticSearch instance
Bear in mind this is from inside the container.
- field log_level: Annotated[LoggingLevel, FieldInfo(annotation=NoneType, required=True, description='Log level. Accepted logging levels\n\n * NOTSET - means no logs\n * DEBUG / INFO / WARNING / ERROR / CRITICAL\n - match standard Python logging levels\n ')] = LoggingLevel.INFO[source]¶
Log level. Accepted logging levels
NOTSET - means no logs
DEBUG / INFO / WARNING / ERROR / CRITICAL - match standard Python logging levels
- field redis_host: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='Host for the Redis instance containing event stream\n\nBear in mind this is from inside the container.')] = 'localhost'[source]¶
Host for the Redis instance containing event stream
Bear in mind this is from inside the container.
- field redis_port: Annotated[int, FieldInfo(annotation=NoneType, required=True, description='Port for the redis host instance containing event stream')] = 6379[source]¶
Port for the redis host instance containing event stream
- field redis_reader_timeout: Annotated[int, FieldInfo(annotation=NoneType, required=True, description='timeout in seconds to read redis event stream')] = 5[source]¶
Timeout, in seconds, when reading the Redis event stream
- field sentry_dns: Annotated[str | None, FieldInfo(annotation=NoneType, required=True, description='Sentry DNS to report incident, if None no incident is reported')] = None[source]¶
Sentry DSN (Data Source Name) used to report incidents; if None, no incident is reported
- field synonyms_path: Annotated[Path, FieldInfo(annotation=NoneType, required=True, description='Path of the directory that will contain synonyms for ElasticSearch instances')] = PosixPath('/opt/search/synonyms')[source]¶
Path of the directory that will contain synonyms for ElasticSearch instances
- class app.config.SettingsGenerateJsonSchema(by_alias: bool = True, ref_template: str = '#/$defs/{model}')[source]¶
Config to add fields to generated JSON schema for Settings.
- generate(schema, mode='validation')[source]¶
Generates a JSON schema for a specified schema in a specified mode.
- Args:
schema: A Pydantic model.
mode: The mode in which to generate the schema. Defaults to ‘validation’.
- Returns:
A JSON schema representing the specified schema.
- Raises:
PydanticUserError: If the JSON schema generator has already been used to generate a JSON schema.
- pydantic model app.config.TaxonomyConfig[source]¶
Configuration of taxonomies, that is, collections of entries with synonyms in multiple languages.
See [Explain taxonomies](../explain-taxonomies)
Fields may be linked to taxonomies.
This enables enriching search with synonyms, as well as providing suggestions or informative facets.
Note: if you define taxonomies, you must import them using the [import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)
Show JSON schema
{ "title": "TaxonomyConfig", "description": "Configuration of taxonomies,\nthat is collections of entries with synonyms in multiple languages.\n\nSee [Explain taxonomies](../explain-taxonomies)\n\nField may be linked to taxonomies.\n\nIt enables enriching search with synonyms,\nas well as providing suggestions,\nor informative facets.\n\nNote: if you define taxonomies, you must import them using\n[import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)", "type": "object", "properties": { "sources": { "description": "Configurations of taxonomies that this project will use.", "items": { "$ref": "#/$defs/TaxonomySourceConfig" }, "title": "Sources", "type": "array" }, "index": { "allOf": [ { "$ref": "#/$defs/TaxonomyIndexConfig" } ], "description": "This is the configuration of\n the ElasticSearch index storing the taxonomies.\n\n All taxonomies are stored within the same index.\n\n It enables functions like auto-completion, or field suggestions\n as well as enrichment of requests with synonyms.\n " }, "preprocessor": { "anyOf": [ { "description": "The full qualified reference to the preprocessor\nto use before taxonomy entry import.\n\nThis class must inherit `app.indexing.BaseTaxonomyPreprocessor`\nand specialize the `preprocess` method.\n\nThis is used to adapt the taxonomy schema\nor to add specific fields for example.", "examples": [ "app.openfoodfacts.TaxonomyPreprocessor" ], "type": "string" }, { "type": "null" } ], "default": null, "title": "Preprocessor" } }, "$defs": { "TaxonomyIndexConfig": { "description": "This is the configuration of\nthe ElasticSearch index storing the taxonomies.\n\nAll taxonomies are stored within the same index.\n\nIt enables functions like auto-completion, or field suggestions\nas well as enrichment of requests with synonyms.", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to 
the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" } }, "required": [ "name" ], "title": "TaxonomyIndexConfig", "type": "object" }, "TaxonomySourceConfig": { "description": "Configuration on how to fetch a particular taxonomy.", "properties": { "name": { "description": "Name of the taxonomy\n\nThis is the name you will use in the configuration (and the API)\nto reference this taxonomy", "title": "Name", "type": "string" }, "url": { "anyOf": [ { "format": "uri", "minLength": 1, "type": "string" }, { "format": "uri", "maxLength": 2083, "minLength": 1, "type": "string" } ], "description": "URL of the taxonomy.\n\nThe target file must be in JSON format\nand follows Open Food Facts JSON taxonomy format.\n\nThis is a dict where each key correspond to a taxonomy entry id,\nvalues are dict with following properties:\n\n* name: contains a dict giving the name (string) for this entry\n in various languages (keys are language codes)\n* synonyms: contains a dict giving a list of synonyms by language code\n* parents: contains a list of direct parent ids (taxonomy is a directed acyclic graph)\n\nOther keys correspond to properties associated to this entry (eg. 
wikidata id).", "title": "Url" } }, "required": [ "name", "url" ], "title": "TaxonomySourceConfig", "type": "object" } }, "required": [ "sources", "index" ] }
- Fields:
- field index: Annotated[TaxonomyIndexConfig, FieldInfo(annotation=NoneType, required=True, description='This is the configuration of\n the ElasticSearch index storing the taxonomies.\n\n All taxonomies are stored within the same index.\n\n It enables functions like auto-completion, or field suggestions\n as well as enrichment of requests with synonyms.\n ')] [Required][source]¶
This is the configuration of the ElasticSearch index storing the taxonomies.
All taxonomies are stored within the same index.
It enables functions like auto-completion, or field suggestions as well as enrichment of requests with synonyms.
- field preprocessor: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='The full qualified reference to the preprocessor\nto use before taxonomy entry import.\n\nThis class must inherit `app.indexing.BaseTaxonomyPreprocessor`\nand specialize the `preprocess` method.\n\nThis is used to adapt the taxonomy schema\nor to add specific fields for example.', examples=['app.openfoodfacts.TaxonomyPreprocessor'])] | None = None[source]¶
- field sources: Annotated[list[TaxonomySourceConfig], FieldInfo(annotation=NoneType, required=True, description='Configurations of taxonomies that this project will use.')] [Required][source]¶
Configurations of taxonomies that this project will use.
- pydantic model app.config.TaxonomyIndexConfig[source]¶
This is the configuration of the ElasticSearch index storing the taxonomies.
All taxonomies are stored within the same index.
It enables functions like auto-completion, or field suggestions as well as enrichment of requests with synonyms.
Show JSON schema
{ "title": "TaxonomyIndexConfig", "description": "This is the configuration of\nthe ElasticSearch index storing the taxonomies.\n\nAll taxonomies are stored within the same index.\n\nIt enables functions like auto-completion, or field suggestions\nas well as enrichment of requests with synonyms.", "type": "object", "properties": { "name": { "description": "Name of the index alias to use.\n\nSearch-a-licious will create an index using this name and an import date,\nbut alias will always point to the latest index.\n\nThe alias must not already exists in your ElasticSearch instance.", "title": "Name", "type": "string" }, "number_of_shards": { "default": 4, "description": "Number of shards to use for the index.\n\nShards are useful to distribute the load on your cluster.\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Shards", "type": "integer" }, "number_of_replicas": { "default": 1, "description": "Number of replicas to use for the index.\n\nMore replica means more resiliency but also more disk space and memory.\n\n(see [index settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings))", "title": "Number Of Replicas", "type": "integer" } }, "required": [ "name" ] }
- Fields:
- pydantic model app.config.TaxonomySourceConfig[source]¶
Configuration on how to fetch a particular taxonomy.
Show JSON schema
{ "title": "TaxonomySourceConfig", "description": "Configuration on how to fetch a particular taxonomy.", "type": "object", "properties": { "name": { "description": "Name of the taxonomy\n\nThis is the name you will use in the configuration (and the API)\nto reference this taxonomy", "title": "Name", "type": "string" }, "url": { "anyOf": [ { "format": "uri", "minLength": 1, "type": "string" }, { "format": "uri", "maxLength": 2083, "minLength": 1, "type": "string" } ], "description": "URL of the taxonomy.\n\nThe target file must be in JSON format\nand follows Open Food Facts JSON taxonomy format.\n\nThis is a dict where each key correspond to a taxonomy entry id,\nvalues are dict with following properties:\n\n* name: contains a dict giving the name (string) for this entry\n in various languages (keys are language codes)\n* synonyms: contains a dict giving a list of synonyms by language code\n* parents: contains a list of direct parent ids (taxonomy is a directed acyclic graph)\n\nOther keys correspond to properties associated to this entry (eg. wikidata id).", "title": "Url" } }, "required": [ "name", "url" ] }
- field name: Annotated[str, FieldInfo(annotation=NoneType, required=True, description='Name of the taxonomy\n\nThis is the name you will use in the configuration (and the API)\nto reference this taxonomy')] [Required][source]¶
Name of the taxonomy
This is the name you will use in the configuration (and the API) to reference this taxonomy
- field url: Annotated[Annotated[Url, UrlConstraints(max_length=None, allowed_schemes=['file'], host_required=None, default_host=None, default_port=None, default_path=None)] | Annotated[Url, UrlConstraints(max_length=2083, allowed_schemes=['http', 'https'], host_required=None, default_host=None, default_port=None, default_path=None)], FieldInfo(annotation=NoneType, required=True, description='URL of the taxonomy.\n\nThe target file must be in JSON format\nand follows Open Food Facts JSON taxonomy format.\n\nThis is a dict where each key correspond to a taxonomy entry id,\nvalues are dict with following properties:\n\n* name: contains a dict giving the name (string) for this entry\n in various languages (keys are language codes)\n* synonyms: contains a dict giving a list of synonyms by language code\n* parents: contains a list of direct parent ids (taxonomy is a directed acyclic graph)\n\nOther keys correspond to properties associated to this entry (eg. wikidata id).')] [Required][source]¶
URL of the taxonomy.
The target file must be in JSON format and follow the Open Food Facts JSON taxonomy format.
It is a dict where each key corresponds to a taxonomy entry id, and each value is a dict with the following properties:
* name: contains a dict giving the name (string) for this entry in various languages (keys are language codes)
* synonyms: contains a dict giving a list of synonyms by language code
* parents: contains a list of direct parent ids (the taxonomy is a directed acyclic graph)
Other keys correspond to properties associated with this entry (e.g. wikidata id).
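The format described above can be sketched as a small Python dict, together with a traversal of the `parents` links. The entry ids, names and synonyms below are made up for illustration, and `ancestors` is a hypothetical helper, not part of Search-a-licious.

```python
import json
from typing import Dict, Set

# Made-up entries following the described shape.
taxonomy: Dict[str, dict] = {
    "en:beverages": {
        "name": {"en": "Beverages", "fr": "Boissons"},
        "synonyms": {"en": ["Beverages", "Drinks"], "fr": ["Boissons"]},
        "parents": [],
    },
    "en:teas": {
        "name": {"en": "Teas", "fr": "Thés"},
        "synonyms": {"en": ["Teas", "Tea"], "fr": ["Thés"]},
        "parents": ["en:beverages"],
    },
    "en:green-teas": {
        "name": {"en": "Green teas"},
        "synonyms": {"en": ["Green teas", "Green tea"]},
        "parents": ["en:teas"],
    },
}


def ancestors(entry_id: str, entries: Dict[str, dict]) -> Set[str]:
    """Collect all direct and indirect parents of an entry in the graph."""
    seen: Set[str] = set()
    stack = list(entries[entry_id].get("parents", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(entries.get(parent, {}).get("parents", []))
    return seen


# The file served at the taxonomy URL would contain JSON like this.
serialized = json.dumps(taxonomy, ensure_ascii=False)
green_tea_ancestors = ancestors("en:green-teas", taxonomy)
```

Tracking visited ids in `seen` keeps the traversal correct when an entry is reachable through several parents, which a directed acyclic graph allows.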