# How to update the index
As you use search-a-licious, you will first import the data, but then you will need to keep the index up to date with the latest data.

There are two strategies to update the index:

- either you push events to the Redis stream, and search-a-licious updates the index continuously,
- or you re-import the whole dataset from time to time.
## First import
The first import populates the Elasticsearch index with all the data. See the initial import section in the tutorial and the import reference documentation.

It's important to note that if you don't use the continuous updates strategy, you need to use the `--skip-updates` option.
## Continuous updates
To have continuous updates, you need to push events to the Redis stream. Normally this is done by your application: on each update, removal, etc., it must push an event containing at least:

- the document id,
- and optionally more information (for example, if you need it to filter out items).

You could also push events from another service if you have another way of detecting changes, but that part is up to you.
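As a minimal sketch, pushing an event with the redis-py client could look like this. The stream key (`searchalicious-updates`) and the payload fields are assumptions: use the stream name and fields your search-a-licious configuration expects.

```python
import redis

# Connect to the same Redis instance the updater reads from.
client = redis.Redis(host="localhost", port=6379)

# Push one event per document change. Stream key and field names here
# are assumptions: check your search-a-licious configuration.
client.xadd(
    "searchalicious-updates",
    {
        "id": "doc-1234",     # the document id (required)
        "action": "updated",  # optional extra info, e.g. to filter out items
    },
)
```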
Then you just have to run the updater container that comes with the docker-compose configuration:

```bash
docker-compose up -d updater
```
This will continuously fetch updates from the event stream and update the index accordingly. On startup, it computes the last update time using the `last_modified_field_name` from the configuration, to know where to start processing the event stream.
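Conceptually, that startup step can be pictured like the following sketch. This is not the actual implementation; the index name, field name, and stream key are all assumptions.

```python
import redis
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
stream = redis.Redis(host="localhost", port=6379)

# Ask Elasticsearch for the most recently modified document, sorting on the
# field named by last_modified_field_name ("last_modified_t" is an assumption).
resp = es.search(index="my-index", size=1, sort=[{"last_modified_t": "desc"}])
last_modified = int(resp["hits"]["hits"][0]["_source"]["last_modified_t"])

# Redis stream entry ids look like "<milliseconds>-<sequence>", so processing
# can resume from the timestamp of the last indexed change (assuming the
# field stores seconds, hence the * 1000).
events = stream.xrange("searchalicious-updates", min=f"{last_modified * 1000}-0")
```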
## Updating the index from time to time with an export
Another way to update the index is to periodically re-import all the data, or only the changed data. This is less responsive for your users, as the index lags behind the data, but it might be the best option if you are using an external database that publishes changes on a regular basis.

For that you can use the `import` command with the `--skip-updates` option, adding the `--partial` option if you are importing only changed data (otherwise it is advised to use the normal import process, which can be rolled back, since it creates a new index).
## Document fetcher and pre-processing
In the configuration, you can define a `document_fetcher` and a `preprocessor` to transform the data. Those are fully qualified dotted names of Python classes.
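A fully qualified dotted name such as `my_package.processors.MyPreprocessor` (a hypothetical example) resolves to a class roughly like this:

```python
import importlib

def load_class(dotted_name: str):
    """Resolve a fully qualified dotted name to a Python class."""
    module_name, class_name = dotted_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Hypothetical example: the class must be importable by the indexer/updater.
# PreprocessorClass = load_class("my_package.processors.MyPreprocessor")
```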
`document_fetcher` is only used on continuous updates, while `preprocessor` is used both on continuous updates and on initial import.
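The exact base classes and method signatures to implement are given in the reference documentation; the sketch below is purely illustrative, with hypothetical method names.

```python
class MyPreprocessor:
    """Hypothetical preprocessor sketch: the real base class and method
    names come from search-a-licious's reference documentation."""

    def preprocess(self, document: dict) -> dict:
        # Transform each document before indexing, e.g. derive an extra
        # searchable field from existing data.
        document["name_lower"] = document.get("name", "").lower()
        return document


class MyDocumentFetcher:
    """Hypothetical document fetcher sketch, used only on continuous
    updates: turn a stream event into the full document to index."""

    def fetch_document(self, event: dict) -> dict | None:
        # Look up the full document (e.g. in your database) from the
        # document id carried by the event.
        raise NotImplementedError
```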