
NAME

ProductOpener::Export - export products data in CSV format

SYNOPSIS

ProductOpener::Export is used to export the data of all populated fields of products matching a given MongoDB search query in Open Food Facts CSV format (UTF-8 encoding, tab separated).

    use ProductOpener::Export qw/:all/;
    export_csv( { filehandle=>*STDOUT,
            query=>{ countries_tags=>"en:france", labels_tags=>"en:organic" } });

Only columns that are not completely empty will be included in the resulting CSV file. This is to avoid generating CSV files with thousands of empty columns (e.g. all possible nutrients and all the language specific fields like ingredients_text_[language code] for all the hundreds of possible languages).

Fields that are computed from other fields and are not directly provided by users or producers are not exported by default. They can be exported by passing a list of extra fields:

        export_csv( { filehandle=>$fh,
                extra_fields=>[qw(nova_group nutrition_grade_fr)] });

It is also possible to restrict the set of fields to be exported:

        export_csv( { filehandle=>$fh,
                fields=>[qw(code ingredients_text_en additives_tags)] });

This module is used in particular to export product data provided by manufacturers on the producers platform so that it can then be imported in the public database.

In the producers platform, the export_csv function is executed through a Minion worker.

It is also used in the scripts/export_csv_file.pl script.

DESCRIPTION

The list of fields from ProductOpener::Config::options{import_export_fields_groups} and the list of nutrients from ProductOpener::Food::nutriments_tables are used to determine which fields need to be exported.

The results of the query are scanned a first time to compute the list of non-empty columns.

The results of the query are scanned a second time to output the CSV file.

This two-phase approach is used to avoid having to store all the products data in memory.

If the fields to export are specified with the fields parameter, the first phase is skipped.
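The two phases described above can be sketched in a few lines of standalone Perl. This is only an illustration under stated assumptions, not the module's code: the @products array stands in for a MongoDB result set that the real module would scan twice, and the names populated_fields, write_tsv and @candidate_fields are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the MongoDB result set: a list of product hashes.
my @products = (
    { code => "0001", product_name_en => "Wheat beer", brands => "" },
    { code => "0002", product_name_en => "", product_name_fr => "Biere blonde" },
);

# Candidate columns, in output order (illustrative names only).
my @candidate_fields = qw(code product_name_en product_name_fr brands quantity);

# Phase 1 helper: returns the candidate fields that are non-empty
# for at least one product, keeping the candidate order.
sub populated_fields {
    my ($products_ref, $fields_ref) = @_;
    my %seen;
    foreach my $product (@$products_ref) {
        foreach my $field (@$fields_ref) {
            my $value = $product->{$field};
            $seen{$field} = 1 if defined $value and $value ne "";
        }
    }
    return grep { $seen{$_} } @$fields_ref;
}

# Phase 2 helper: writes a header line and one tab-separated line per
# product, and returns the number of products written.
sub write_tsv {
    my ($fh, $products_ref, $columns_ref) = @_;
    print {$fh} join("\t", @$columns_ref) . "\n";
    my $count = 0;
    foreach my $product (@$products_ref) {
        print {$fh} join("\t", map { $product->{$_} // "" } @$columns_ref) . "\n";
        $count++;
    }
    return $count;
}

# Phase 1: find the non-empty columns. Phase 2: output the rows.
my @columns = populated_fields(\@products, \@candidate_fields);
open(my $out, '>', \my $tsv) or die "Cannot open in-memory handle: $!";
my $exported = write_tsv($out, \@products, \@columns);
close($out);
print $tsv;
```

In this sketch, an explicit list of fields would simply be assigned to @columns, which makes the call to populated_fields unnecessary.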

FUNCTIONS

export_csv( ARGS_REF )

export_csv() outputs data in CSV format for products matching a query.

Only non-empty columns are included. By default, fields that are computed from other fields are not included, but they can be exported with the extra_fields option.

Arguments

Arguments are passed through a single hash reference with the following keys:

filehandle - required - File handle where the CSV data will be output

The file handle can refer to a file on disk, to STDOUT, etc.
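As an illustration of the kinds of handles that could be passed, the sketch below opens a file on disk and an in-memory buffer. Whether the module applies its own encoding layer is not stated here, so the explicit UTF-8 layer is an assumption, and the export_csv call appears only in a comment because it needs a configured Product Opener installation.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A file on disk, reopened for writing with a UTF-8 layer
# (assumption: the caller is responsible for the encoding).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);
open(my $file_fh, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";

# An in-memory handle, convenient for tests.
open(my $mem_fh, '>:encoding(UTF-8)', \my $buffer) or die "Cannot open buffer: $!";

# Either handle (or *STDOUT) could be passed as the filehandle argument:
#     export_csv({ filehandle => $file_fh, query => { countries_tags => "en:france" } });
# Here a stand-in writes two tab-separated lines to each handle instead.
foreach my $fh ($file_fh, $mem_fh) {
    print {$fh} "code\tproduct_name_fr\n";
    print {$fh} "0001\tCr\x{e8}me fra\x{ee}che\n";
    close($fh);
}
```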

query - optional - MongoDB Query

Hash ref that specifies the query that will be passed to MongoDB. Each key/value pair will be used to filter products with matching field values.

   export_csv( { filehandle=>$fh,
        query => { categories_tags => "en:beers", ingredients_tags => "en:wheat" }});
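To make the filtering semantics concrete, the following sketch mimics in plain Perl how such a query selects products: all key/value pairs must match, and a scalar value matches an array field such as categories_tags when any element equals it, as MongoDB equality matching does. The matches_query helper and the sample products are hypothetical; the real filtering is done by MongoDB, not by the module.

```perl
use strict;
use warnings;

# Returns true if every key/value pair of the query matches the product.
# A scalar query value matches an array field when any element equals it.
sub matches_query {
    my ($product_ref, $query_ref) = @_;
    foreach my $field (keys %$query_ref) {
        my $wanted = $query_ref->{$field};
        my $value  = $product_ref->{$field};
        return 0 if not defined $value;
        my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
        return 0 if not grep { $_ eq $wanted } @values;
    }
    return 1;
}

my $query_ref = { categories_tags => "en:beers", ingredients_tags => "en:wheat" };

# Illustrative products only.
my @products = (
    {
        code             => "0001",
        categories_tags  => ["en:beverages", "en:beers"],
        ingredients_tags => ["en:water", "en:wheat"],
    },
    {
        code             => "0002",
        categories_tags  => ["en:beverages", "en:beers"],
        ingredients_tags => ["en:water", "en:barley"],
    },
);

my @matching_codes = map { $_->{code} } grep { matches_query($_, $query_ref) } @products;
print "@matching_codes\n";
```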

extra_fields - optional - Extra fields to export

Array ref that specifies a list of additional fields to export, including fields that are computed from other fields such as the NOVA group or the Nutri-Score nutritional grade.

Columns for the extra fields will be added after the columns for the populated fields from users and producers.

        export_csv({ filehandle=>$fh,
                extra_fields => [qw(nova_group nutrition_grade_fr)] });
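The resulting column order can be pictured with a short sketch. The variable names are hypothetical, and skipping an extra field that is already among the populated fields is an assumption made for this illustration, not documented behaviour.

```perl
use strict;
use warnings;

# Columns found to be populated (illustrative values).
my @populated_fields = qw(code product_name_en ingredients_text_en);

# Extra fields requested by the caller; "code" is repeated on purpose.
my @extra_fields = qw(nova_group nutrition_grade_fr code);

# Append the extra fields after the populated ones.
# Assumption: a field that is already listed is not added a second time.
my %already_listed = map { $_ => 1 } @populated_fields;
my @columns = (@populated_fields, grep { not $already_listed{$_}++ } @extra_fields);

print join("\t", @columns), "\n";
```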

fields - optional - Restrict the fields to export

Array ref that specifies the exact list of fields to export. Only the specified fields will be exported.

        export_csv({ filehandle=>$fh,
                fields => [qw(code ingredients_text_en additives_tags)] });

include_images_paths - optional - Export local file paths to images

If defined and not null, local file paths are exported for the selected front, ingredients and nutrition images in all languages.

This option is used in particular for exporting from the producers platform and importing to the public database.

include_obsolete_products - optional - Also export obsolete products

Obsolete products are in the products_obsolete collection.

Return value

Count of the exported documents.

Note: if the max_count parameter is passed
