JSON schema for search-a-licious configuration file

root indices IndexConfig index

ESIndexConfig

Type: object

This is the configuration for the main index containing the data.

It's used to create the index in ElasticSearch, and configure its mappings
(along with the *fields* config)

root indices IndexConfig index name

Name

Type: string

Name of the index alias to use.

Search-a-licious will create an index using this name and an import date,
but the alias will always point to the latest index.

The alias must not already exist in your ElasticSearch instance.
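The sketch below illustrates the alias pattern. It assumes a hypothetical naming scheme of alias plus import timestamp, which may not match the format Search-a-licious actually uses. It builds the body of an Elasticsearch _aliases request that detaches the alias from older indices and attaches it to the new one in a single atomic call. The function names are illustrative only.

```python
from datetime import datetime
from typing import List


def timestamped_index_name(alias: str, import_date: datetime) -> str:
    """Concrete index name for one import (hypothetical "<alias>-<date>" format)."""
    return f"{alias}-{import_date:%Y-%m-%d-%H-%M-%S}"


def alias_switch_body(alias: str, new_index: str, old_indices: List[str]) -> dict:
    """Body for POST /_aliases. All actions are applied atomically, so searches
    through the alias always have a target index."""
    actions = [{"remove": {"index": old, "alias": alias}} for old in old_indices]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


new_index = timestamped_index_name("products", datetime(2024, 5, 1, 12, 30, 0))
print(new_index)  # products-2024-05-01-12-30-00
print(alias_switch_body("products", new_index, ["products-2024-04-01-08-00-00"]))
```

Elasticsearch does not allow an alias and an index to share the same name. This is one plausible reason why the alias name must be free before the first import.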

root indices IndexConfig index number_of_shards

Number Of Shards

Type: integer Default: 4

Number of shards to use for the index.

Shards are useful to distribute the load on your cluster.
(see index settings)

root indices IndexConfig index number_of_replicas

Number Of Replicas

Type: integer Default: 1

Number of replicas to use for the index.

More replicas mean more resiliency, but also more disk space and memory.

(see index settings)
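Both settings end up in the settings section of the Elasticsearch create-index request. The following is a minimal sketch. The helper name and the sanity checks are assumptions made for illustration.

```python
def create_index_body(number_of_shards: int = 4, number_of_replicas: int = 1) -> dict:
    """Body for PUT /<index>; the defaults mirror this configuration's defaults."""
    if number_of_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas cannot be negative")
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


print(create_index_body())
# {'settings': {'number_of_shards': 4, 'number_of_replicas': 1}}
```

In Elasticsearch, number_of_replicas can be changed on a live index. In contrast, number_of_shards is fixed when the index is created, short of using the split or shrink APIs.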

root indices IndexConfig index id_field_name

Id Field Name

Type: string

Name of the field to use for _id.
It is mandatory to provide one.

If your dataset does not have an identifier field,
you should use a document preprocessor to compute one (see preprocessor).

root indices IndexConfig index last_modified_field_name

Last Modified Field Name

Type: string

Name of the field containing the date of last modification
in your indexed objects.

This is used for incremental updates using Redis queues.

The field value must be an int/float representing the timestamp.

root indices IndexConfig fields

Fields

Type: object

Configuration of all fields we need to store in the index.

Keys are field names,
values contain the field configuration.

This is a very important part of the configuration.

Most of the ElasticSearch mapping depends on it.
ElasticSearch will also use this configuration
to provide intended behaviour.

(see also Explain Configuration)

If you change those settings you will have to re-index all the data.
(But you can do so in the background).

Each additional property must conform to the following schema

root indices IndexConfig fields FieldConfig

FieldConfig

Type: object

root indices IndexConfig fields FieldConfig name

Name

Type: string Default: ""

Name of the field (must be unique).

root indices IndexConfig fields FieldConfig type

FieldType

Type: enum (of string)

Type of the field

Supported field types in Search-a-licious are:

* keyword: string values that won't be interpreted (tokenized).
  Good for things like tags, serial, property values, etc.
* date: Date fields
* double, float, half_float, scaled_float:
  different ways of storing floats with different capacity
* short, integer, long, unsigned_long:
  integers with different capacities (16 / 32 / 64 bits, unsigned 64 bits for unsigned_long)
* bool: boolean (true / false) values
* text: a text which is tokenized to enable full text search
* text_lang: like text, but with different values in different languages.
  Tokenization will use analyzers specific to each language.
* taxonomy: a field akin to keyword but
  with support for matching using taxonomy synonyms and translations
  (and in fact also a text mapping possibility)
* disabled: a field that is neither stored nor searchable
  (see [Elasticsearch help])
* object: this field contains a dict with sub-fields.
* nested: this field contains an array of objects.

Must be one of:

"keyword"
"date"
"half_float"
"scaled_float"
"float"
"double"
"integer"
"short"
"long"
"unsigned_long"
"bool"
"text"
"text_lang"
"taxonomy"
"disabled"
"object"
"nested"

root indices IndexConfig fields FieldConfig required

Required

Type: boolean Default: false

If required=True, the field is required in the input data.

An entry that does not contain a value for this field will be rejected.
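Here is a minimal sketch of that check. It assumes field configurations are plain dicts and that a missing key or a None value counts as "no value". Both are assumptions, and the actual rule may differ. The input key honours input_field and falls back to the field name.

```python
from typing import Dict, List


def missing_required_fields(document: dict, fields: Dict[str, dict]) -> List[str]:
    """Names of required fields for which the input document has no value."""
    missing = []
    for name, config in fields.items():
        if not config.get("required", False):
            continue
        input_key = config.get("input_field") or name  # input_field defaults to the field name
        if document.get(input_key) is None:
            missing.append(name)
    return missing


fields = {
    "code": {"type": "keyword", "required": True},
    "brands": {"type": "keyword"},
}
print(missing_required_fields({"brands": "Acme"}, fields))  # ['code']
```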

root indices IndexConfig fields FieldConfig input_field

Input Field

Default: null

Name of the input field to use when importing data.

By default, Search-a-licious uses the same name as the field name.

This is useful to index the same field using different types or configurations.
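For instance, the same input value can be indexed twice, once as a keyword and once as text. The following is a minimal sketch using hypothetical field names, with field configurations represented as plain dicts.

```python
from typing import Dict


def build_indexed_document(document: dict, fields: Dict[str, dict]) -> dict:
    """Copy input values into index fields. Each value is read from the field's
    input_field, or from the field's own name when input_field is unset."""
    indexed = {}
    for name, config in fields.items():
        input_key = config.get("input_field") or name
        if input_key in document:
            indexed[name] = document[input_key]
    return indexed


fields = {
    "brands": {"type": "keyword"},                             # exact-match facet
    "brands_text": {"type": "text", "input_field": "brands"},  # full text search
}
print(build_indexed_document({"brands": "Acme"}, fields))
# {'brands': 'Acme', 'brands_text': 'Acme'}
```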

root indices IndexConfig fields FieldConfig input_field anyOf item 0

Type: string

root indices IndexConfig fields FieldConfig input_field anyOf item 1

Type: null

root indices IndexConfig fields FieldConfig split

Split

Type: boolean Default: false

Whether to split the input field using split_separator.

This is useful if you have text fields that contain lists of values
(for example a comma-separated list of values, like apple,banana,carrot).

You must set split_separator to the character that separates the values in the dataset.
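Here is a minimal sketch of the splitting step, using a hypothetical function name. Dropping empty items (for example, those produced by a trailing separator) is an assumption of this sketch, not documented behaviour.

```python
from typing import List, Union


def split_input_value(raw: str, split: bool = False, split_separator: str = ",") -> Union[str, List[str]]:
    """Return the raw string unchanged, or the list of values when split is enabled."""
    if not split:
        return raw
    # str.split keeps empty strings ("a,,b" -> ["a", "", "b"]), so filter them out
    return [part for part in raw.split(split_separator) if part]


print(split_input_value("apple,banana,carrot", split=True))  # ['apple', 'banana', 'carrot']
```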

root indices IndexConfig fields FieldConfig full_text_search

Full Text Search

Type: boolean Default: false

Whether this field is included in the default full text search.

If false, the field is only used during search
when filters involving this field are provided
(as opposed to full text search expressions without any explicit field).
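As an illustration, a free text expression could be sent as an Elasticsearch multi_match query restricted to the fields flagged with full_text_search. This is a simplified sketch: it ignores the per-language sub-fields of text_lang fields and does not reproduce the query Search-a-licious actually builds.

```python
from typing import Dict


def full_text_query(text: str, fields: Dict[str, dict]) -> dict:
    """multi_match query over the fields that have full_text_search=True."""
    searchable = [
        name for name, config in fields.items() if config.get("full_text_search", False)
    ]
    return {"multi_match": {"query": text, "fields": searchable}}


fields = {
    "product_name": {"type": "text", "full_text_search": True},
    "brands": {"type": "keyword"},  # only reachable through explicit filters
}
print(full_text_query("dark chocolate", fields))
# {'multi_match': {'query': 'dark chocolate', 'fields': ['product_name']}}
```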

root indices IndexConfig fields FieldConfig bucket_agg

Bucket Agg

Type: boolean Default: false

Whether to add a bucket aggregation to the Elasticsearch query for this field.

It is used to return a 'faceted-view' with the number of results for each facet value,
or to generate bar charts.

Only valid for keyword, taxonomy or numeric field types.
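Here is a minimal sketch of how such aggregations could be expressed with the Elasticsearch terms aggregation. The size value and function name are hypothetical. In practice, a taxonomy field may also map to a different Elasticsearch field name.

```python
from typing import Dict

# Field types for which bucket_agg is documented as valid
BUCKET_AGG_TYPES = {
    "keyword", "taxonomy",
    "half_float", "scaled_float", "float", "double",
    "short", "integer", "long", "unsigned_long",
}


def facet_aggregations(fields: Dict[str, dict], size: int = 20) -> dict:
    """One terms aggregation per field with bucket_agg=True."""
    aggs = {}
    for name, config in fields.items():
        if not config.get("bucket_agg", False):
            continue
        if config["type"] not in BUCKET_AGG_TYPES:
            raise ValueError(f"bucket_agg is not valid for {name!r} of type {config['type']!r}")
        aggs[name] = {"terms": {"field": name, "size": size}}
    return {"aggs": aggs}
```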

root indices IndexConfig fields FieldConfig taxonomy_name

Taxonomy Name

Default: null

the name of the taxonomy associated with this field.

It must only be provided for taxonomy field type.

Any of

root indices IndexConfig fields FieldConfig taxonomy_name anyOf item 0

Type: string

root indices IndexConfig fields FieldConfig taxonomy_name anyOf item 1

Type: null

root indices IndexConfig fields FieldConfig fields

Fields

Default: null

Sub fields configuration

This is valid only for "object" and "nested" fields,
and must be provided in this case.

Keys are field names,
values contain the field configuration.

Note: although dynamic fields are supported in Elasticsearch,
we don't support them in Search-a-licious,
because they lead to nasty bugs, and are not meant for production use.

Any of

root indices IndexConfig fields FieldConfig fields anyOf item 0

Type: object

Each additional property must conform to the following schema

root indices IndexConfig fields FieldConfig fields anyOf item 0 FieldConfig

FieldConfig

Type: object
Same definition as FieldConfig

root indices IndexConfig fields FieldConfig fields anyOf item 1

Type: null

root indices IndexConfig split_separator

Split Separator

Type: string Default: ","

Separator to use when splitting values, for fields that have split=True.

root indices IndexConfig lang_separator

Lang Separator

Type: string Default: "_"

For the text_lang FieldType, the separator between the name of the field and the language code, e.g. product_name_it if lang_separator="_".
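Here is a minimal sketch of how the per-language values of a text_lang field could be gathered from an input document using lang_separator and supported_langs. The function name and sample values are hypothetical.

```python
from typing import Dict, List


def collect_lang_values(
    document: dict, field_name: str, supported_langs: List[str], lang_separator: str = "_"
) -> Dict[str, str]:
    """Map language code -> value, reading keys like "<field><separator><lang>"."""
    values = {}
    for lang in supported_langs:
        key = f"{field_name}{lang_separator}{lang}"  # e.g. "product_name_it"
        if key in document:
            values[lang] = document[key]
    return values


document = {
    "product_name_en": "Dark chocolate",
    "product_name_it": "Cioccolato fondente",
    "product_name_de": "Zartbitterschokolade",  # ignored: "de" is not supported below
}
print(collect_lang_values(document, "product_name", ["en", "fr", "it"]))
# {'en': 'Dark chocolate', 'it': 'Cioccolato fondente'}
```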

root indices IndexConfig primary_color

Primary Color

Type: string Default: "#aaa"

Used for Vega charts. Use a CSS color code.

root indices IndexConfig accent_color

Accent Color

Type: string Default: "#222"

Used for Vega charts. Should be a CSS color code.

root indices IndexConfig taxonomy

TaxonomyConfig

Type: object

Configuration of taxonomies,
that is, collections of entries with synonyms in multiple languages.

See [Explain taxonomies](../explain-taxonomies)

Fields may be linked to taxonomies.

It enables enriching search with synonyms,
as well as providing suggestions,
or informative facets.

Note: if you define taxonomies, you must import them using
[import-taxonomies command](../ref-python/cli.html#python3-m-app-import-taxonomies)

root indices IndexConfig taxonomy sources

Sources

Type: array

Configurations of taxonomies that this project will use.

No Additional Items

Each item of this array must be:

root indices IndexConfig taxonomy sources TaxonomySourceConfig

TaxonomySourceConfig

Type: object

Configuration of how to fetch a particular taxonomy.

root indices IndexConfig taxonomy sources TaxonomySourceConfig name

Name

Type: string

Name of the taxonomy

This is the name you will use in the configuration (and the API)
to reference this taxonomy

root indices IndexConfig taxonomy sources TaxonomySourceConfig url

Url

URL of the taxonomy.

The target file must be in JSON format
and follow the Open Food Facts JSON taxonomy format.

This is a dict where each key corresponds to a taxonomy entry id,
and values are dicts with the following properties:

* name: contains a dict giving the name (string) for this entry
  in various languages (keys are language codes)
* synonyms: contains a dict giving a list of synonyms by language code
* parents: contains a list of direct parent ids (the taxonomy is a directed acyclic graph)

Other keys correspond to properties associated with this entry (e.g. a Wikidata id).
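To make the format concrete, here is a tiny hand-written sample, with invented ids and translations, and a sketch that walks the parents links to collect all ancestors of an entry.

```python
from typing import Dict, Set

# Illustrative sample in the format described above (ids and values are made up)
taxonomy: Dict[str, dict] = {
    "en:dark-chocolates": {
        "name": {"en": "Dark chocolates", "fr": "Chocolats noirs"},
        "synonyms": {"en": ["Dark chocolates", "Bitter chocolates"]},
        "parents": ["en:chocolates"],
    },
    "en:chocolates": {
        "name": {"en": "Chocolates", "fr": "Chocolats"},
        "synonyms": {"en": ["Chocolates"]},
        "parents": ["en:snacks"],
    },
    "en:snacks": {
        "name": {"en": "Snacks"},
        "synonyms": {"en": ["Snacks"]},
        "parents": [],
    },
}


def ancestors(taxonomy: Dict[str, dict], entry_id: str) -> Set[str]:
    """All transitive parents of entry_id. The seen-set avoids revisiting shared
    ancestors, which a DAG (unlike a tree) can have."""
    seen: Set[str] = set()
    stack = list(taxonomy[entry_id].get("parents", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(taxonomy.get(parent, {}).get("parents", []))
    return seen


print(sorted(ancestors(taxonomy, "en:dark-chocolates")))  # ['en:chocolates', 'en:snacks']
```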

Any of

root indices IndexConfig taxonomy sources TaxonomySourceConfig url anyOf item 0

Type: string Format: uri

Must be at least 1 character long

root indices IndexConfig taxonomy sources TaxonomySourceConfig url anyOf item 1

Type: string Format: uri

Must be at least 1 character long

Must be at most 2083 characters long

root indices IndexConfig taxonomy index

TaxonomyIndexConfig

Type: object

This is the configuration of
the ElasticSearch index storing the taxonomies.

All taxonomies are stored within the same index.

It enables features like auto-completion or field suggestions,
as well as enrichment of requests with synonyms.

root indices IndexConfig taxonomy index name

Name

Type: string

Name of the index alias to use.

Search-a-licious will create an index using this name and an import date,
but the alias will always point to the latest index.

The alias must not already exist in your ElasticSearch instance.

root indices IndexConfig taxonomy index number_of_shards

Number Of Shards

Type: integer Default: 4

Number of shards to use for the index.

Shards are useful to distribute the load on your cluster.
(see index settings)

root indices IndexConfig taxonomy index number_of_replicas

Number Of Replicas

Type: integer Default: 1

Number of replicas to use for the index.

More replicas mean more resiliency, but also more disk space and memory.

(see index settings)

root indices IndexConfig taxonomy preprocessor

Preprocessor

Default: null

Any of

root indices IndexConfig taxonomy preprocessor anyOf item 0

Type: string

The fully qualified reference to the preprocessor
to use before taxonomy entry import.

This class must inherit from app.indexing.BaseTaxonomyPreprocessor
and specialize the preprocess method.

This is used, for example, to adapt the taxonomy schema
or to add specific fields.

Example:

app.openfoodfacts.TaxonomyPreprocessor
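A fully qualified reference is a dotted module path followed by a class name. The generic Python sketch below shows how such a string can be resolved with importlib. This is not necessarily how Search-a-licious loads it. The example uses standard-library classes because the app package is not assumed to be installed.

```python
import importlib


def load_class(qualified_name: str) -> type:
    """Resolve "package.module.ClassName" to the class object."""
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {qualified_name!r}")
    module = importlib.import_module(module_name)  # imports the module if needed
    return getattr(module, class_name)  # AttributeError if the class is missing


print(load_class("collections.OrderedDict"))  # <class 'collections.OrderedDict'>
```

Before instantiating the class, a loader would typically also check something like issubclass(cls, BaseTaxonomyPreprocessor).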

root indices IndexConfig taxonomy preprocessor anyOf item 1

Type: null

root indices IndexConfig supported_langs

Supported Langs

Type: array of string

A list of all supported languages. It is used to build the index mapping.

No Additional Items

Each item of this array must be:

root indices IndexConfig supported_langs supported_langs items

Type: string

Example:

['en', 'fr', 'it']

root indices IndexConfig document_fetcher

Document Fetcher

Type: string

The fully qualified reference to the document fetcher,
i.e. the class responsible for fetching the document
using the document ID present in the Redis Stream.

It should inherit from app._import.BaseDocumentFetcher
and specialize the fetch_document method.

To keep things lean,
the event stream payload generally contains only a few item fields.
This class will fetch the full document using your application API.

Example:

app.openfoodfacts.DocumentFetcher
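The following sketch shows the shape of such a class. It uses a local stand-in for the base class because app._import is not assumed to be importable here. The real base class, the fetch_document signature, the API URL and the get_json helper are all assumptions.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class BaseDocumentFetcher(ABC):
    """Local stand-in for app._import.BaseDocumentFetcher; the real base class
    and its fetch_document signature may differ."""

    @abstractmethod
    def fetch_document(self, document_id: str) -> Optional[dict]:
        """Return the full document, or None if it no longer exists."""


class ApiDocumentFetcher(BaseDocumentFetcher):
    """Hypothetical fetcher that looks documents up through an API client."""

    def __init__(self, get_json: Callable[[str], Optional[dict]], url_template: str):
        self.get_json = get_json  # e.g. a thin wrapper around an HTTP GET returning parsed JSON
        self.url_template = url_template

    def fetch_document(self, document_id: str) -> Optional[dict]:
        return self.get_json(self.url_template.format(id=document_id))


# A dict lookup stands in for real HTTP calls in this example
fake_api = {"https://example.org/api/product/123": {"code": "123", "product_name": "Dark chocolate"}}
fetcher = ApiDocumentFetcher(fake_api.get, "https://example.org/api/product/{id}")
print(fetcher.fetch_document("123"))
```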

root indices IndexConfig preprocessor

Preprocessor

Default: null

Any of

root indices IndexConfig preprocessor anyOf item 0

Type: string

The fully qualified reference to the preprocessor
to use before data import.

This class must inherit from app.indexing.BaseDocumentPreprocessor
and specialize the preprocess method.

This is used, for example, to adapt the data schema
or to add Search-a-licious specific fields.

Example:

app.openfoodfacts.DocumentPreprocessor
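A preprocessor is one way to provide the identifier required by id_field_name when the dataset has none. In this sketch, id_field_name would be set to "id". It uses a local stand-in for the base class, and the preprocess signature, the source/code fields and the "skip by returning None" convention are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseDocumentPreprocessor(ABC):
    """Local stand-in for app.indexing.BaseDocumentPreprocessor; the real base
    class and its preprocess signature may differ."""

    @abstractmethod
    def preprocess(self, document: dict) -> Optional[dict]:
        """Return the document to index, or None to skip it."""


class CompositeIdPreprocessor(BaseDocumentPreprocessor):
    """Hypothetical preprocessor that computes an "id" field by combining two
    existing fields."""

    def preprocess(self, document: dict) -> Optional[dict]:
        source, code = document.get("source"), document.get("code")
        if source is None or code is None:
            return None  # no way to build an identifier: skip the document
        result = dict(document)  # shallow copy, the input is left untouched
        result["id"] = f"{source}:{code}"
        return result


print(CompositeIdPreprocessor().preprocess({"source": "off", "code": "123"}))
# {'source': 'off', 'code': '123', 'id': 'off:123'}
```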

root indices IndexConfig preprocessor anyOf item 1

Type: null

root indices IndexConfig result_processor

Result Processor

Default: null

Any of

root indices IndexConfig result_processor anyOf item 0

Type: string

The fully qualified reference to the Elasticsearch result processor
to use after the search query to Elasticsearch.

This class must inherit from app.postprocessing.BaseResultProcessor
and specialize the process_after method.

This can be used to add custom fields computed from index content.

Example:

app.openfoodfacts.ResultProcessor
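The following sketch illustrates such a processor with a local stand-in for the base class. The process_after signature and the {"hits": [...]} result shape are assumptions, and the display_name field is hypothetical.

```python
from abc import ABC, abstractmethod


class BaseResultProcessor(ABC):
    """Local stand-in for app.postprocessing.BaseResultProcessor; the real base
    class, the process_after signature and the result shape may differ."""

    @abstractmethod
    def process_after(self, result: dict) -> dict:
        """Return the search result, possibly enriched."""


class DisplayNameProcessor(BaseResultProcessor):
    """Hypothetical processor adding a display_name computed from indexed fields."""

    def process_after(self, result: dict) -> dict:
        hits = []
        for hit in result.get("hits", []):
            hit = dict(hit)  # copy so the original hit is not mutated
            name = hit.get("product_name", "")
            brands = hit.get("brands")
            hit["display_name"] = f"{name} ({brands})" if brands else name
            hits.append(hit)
        return {**result, "hits": hits}


result = {"hits": [{"product_name": "Dark chocolate", "brands": "Acme"}, {"product_name": "Milk"}]}
print([hit["display_name"] for hit in DisplayNameProcessor().process_after(result)["hits"]])
# ['Dark chocolate (Acme)', 'Milk']
```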

root indices IndexConfig result_processor anyOf item 1

Type: null

root indices IndexConfig scripts

Scripts

Default: null

Any of

root indices IndexConfig scripts anyOf item 0

Type: object

You can add scripts that can be used for sorting results.

Each key is a script name, with its configuration.

Each additional property must conform to the following schema

root indices IndexConfig scripts anyOf item 0 ScriptConfig

ScriptConfig

Type: object

Scripts can be used to sort results of a search.

This uses Elasticsearch internal scripting capabilities.

root indices IndexConfig scripts anyOf item 0 ScriptConfig lang

ScriptType

Type: enum (of string) Default: "expression"

The script language, as supported by Elasticsearch

Must be one of:

"expression"
"painless"

root indices IndexConfig scripts anyOf item 0 ScriptConfig source

Source

Type: string

The source of the script

root indices IndexConfig scripts anyOf item 0 ScriptConfig params

Params

Any of

root indices IndexConfig scripts anyOf item 0 ScriptConfig params anyOf item 0

Type: object

Params for the scripts. We need this to retrieve and validate parameters

root indices IndexConfig scripts anyOf item 0 ScriptConfig params anyOf item 1

Type: null

root indices IndexConfig scripts anyOf item 0 ScriptConfig static_params

Static Params

Any of

root indices IndexConfig scripts anyOf item 0 ScriptConfig static_params anyOf item 0

Type: object

Additional params for the scripts that can't be supplied by the API (constants)

root indices IndexConfig scripts anyOf item 0 ScriptConfig static_params anyOf item 1

Type: null

root indices IndexConfig scripts anyOf item 1

Type: null
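For illustration, a script configuration could end up in an Elasticsearch script-based sort clause like the one built below. Several details are assumptions of this sketch: the field name unique_scans, treating the keys of params as the parameter names callers may send, and letting static_params win on a name clash.

```python
from typing import Optional


def script_sort(script: dict, api_params: Optional[dict] = None, order: str = "desc") -> dict:
    """Build an Elasticsearch "_script" sort clause from a ScriptConfig-like dict."""
    declared = script.get("params") or {}
    api_params = api_params or {}
    unknown = set(api_params) - set(declared)  # reject parameters the config does not declare
    if unknown:
        raise ValueError(f"unexpected script params: {sorted(unknown)}")
    params = {**api_params, **(script.get("static_params") or {})}  # constants added last
    return {
        "_script": {
            "type": "number",
            "script": {
                "lang": script.get("lang", "expression"),
                "source": script["source"],
                "params": params,
            },
            "order": order,
        }
    }


popularity = {
    "lang": "expression",
    "source": "doc['unique_scans'].value * weight + offset",
    "params": {"weight": "number"},
    "static_params": {"offset": 1},
}
print(script_sort(popularity, {"weight": 2}))
```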

root indices IndexConfig match_phrase_boost

Match Phrase Boost

Type: number Default: 2.0

How much we boost exact matches on consecutive words

That is, if you search "Dark Chocolate",
it will boost entries that have the "Dark Chocolate" phrase (in the same field).

It only applies to free text search.

This only makes sense when using the
"boost_phrase" request parameter and "best match" order.

Note: this field accepts a float or a string,
because using a float might generate rounding problems.
The string must represent a float.
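Here is a simplified sketch of the idea. It assumes a bool query where a multi_match clause decides which documents match, while boosted match_phrase clauses only raise the score of documents containing the exact phrase. This is not the query Search-a-licious actually builds. The string form of the boost is converted with float() when the query is built.

```python
from typing import List, Union


def phrase_boosted_query(text: str, fields: List[str], match_phrase_boost: Union[float, str] = 2.0) -> dict:
    """bool query: "must" selects matches, "should" rewards exact phrases."""
    boost = float(match_phrase_boost)  # accepts 2.0 or "2.0"
    return {
        "bool": {
            "must": [{"multi_match": {"query": text, "fields": fields}}],
            "should": [
                {"match_phrase": {field: {"query": text, "boost": boost}}}
                for field in fields
            ],
        }
    }
```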

root indices IndexConfig match_phrase_boost_proximity

Match Phrase Boost Proximity

Default: null

How much proximity we allow for match_phrase_boost.

If unspecified, words must match consecutively.
Otherwise, some gap is allowed between matching words.

This only makes sense when using the
"boost_phrase" request parameter and "best match" order.

Any of

root indices IndexConfig match_phrase_boost_proximity anyOf item 0

Type: integer

root indices IndexConfig match_phrase_boost_proximity anyOf item 1

Type: null
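In Elasticsearch, this kind of tolerance is expressed with the slop parameter of the match_phrase query. Here is a minimal sketch. It assumes match_phrase_boost_proximity is passed through as slop and omitted when null, and the helper name is hypothetical.

```python
from typing import Optional


def phrase_clause(field: str, text: str, boost: float, proximity: Optional[int] = None) -> dict:
    """match_phrase clause; slop tolerates matching words being a few positions apart."""
    params = {"query": text, "boost": boost}
    if proximity is not None:  # null: keep the Elasticsearch default (slop 0, consecutive words)
        params["slop"] = proximity
    return {"match_phrase": {field: params}}
```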

root indices IndexConfig document_denylist

Document Denylist

Type: array of string

List of document IDs to ignore.

Use this to skip some documents at indexing time.

All items must be unique
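Here is a minimal sketch of skipping denylisted documents at indexing time. It assumes the id is read from the id_field_name field, and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def skip_denylisted(documents: Iterable[dict], id_field_name: str, denylist: Iterable[str]) -> Iterator[dict]:
    """Yield only the documents whose id is not in the denylist."""
    denied = frozenset(denylist)  # set lookup keeps the check O(1) per document
    for document in documents:
        if document.get(id_field_name) not in denied:
            yield document
```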

No Additional Items

Each item of this array must be:

root indices IndexConfig document_denylist document_denylist items

Type: string

root indices IndexConfig redis_stream_name

Redis Stream Name

Default: null

Name of the Redis stream to read from when listening to document updates.

If not provided, document updates won't be listened to for this index.

Any of

root indices IndexConfig redis_stream_name anyOf item 0

Type: string

root indices IndexConfig redis_stream_name anyOf item 1

Type: null

Schema outline (properties marked "required" must be provided):

* indices (required): Indices; each additional property is an IndexConfig
  * index (required): ESIndexConfig
    * name (required)
    * number_of_shards
    * number_of_replicas
    * id_field_name (required)
    * last_modified_field_name (required)
  * fields (required): Fields; each additional property is a FieldConfig
    * name
    * type (required): FieldType, must be one of the listed values
    * required
    * input_field (any of)
    * split
    * full_text_search
    * bucket_agg
    * taxonomy_name (any of)
    * fields (any of); each additional property is a FieldConfig
  * split_separator
  * lang_separator
  * primary_color
  * accent_color
  * taxonomy (required): TaxonomyConfig
    * sources (required): each item is a TaxonomySourceConfig
      * name (required)
      * url (required, any of)
    * index (required): TaxonomyIndexConfig
      * name (required)
      * number_of_shards
      * number_of_replicas
    * preprocessor (any of)
  * supported_langs (required): each item is a string
  * document_fetcher (required)