GoodIT 2022: Fuzzy matching on big-data

Fuzzy matching on big-data: illustration with scanner and crowd-sourced nutritional datasets

GoodIT 22 is the ACM International Conference on Information Technology for Social Good.

My presentation is available at https://linogaliana.github.io/relevanc-goodIT22. The source code is available on GitHub.

Purpose

To make the most of automatically collected scanner data in consumption studies, we link the products recorded in these data with crowd-sourced nutritional databases using textual search techniques. This approach requires state-of-the-art text analysis methods, including word embeddings, as well as efficient search tools to scale up.
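Matching a terse scanner label against the free-text product names of a nutritional database is the core fuzzy-matching step described above. The following is a minimal, standard-library-only sketch of that idea, not the pipeline used in the paper: instead of word embeddings it compares character trigram vectors with cosine similarity, and instead of a dedicated search engine it uses a small in-memory inverted index so that each query is only scored against products sharing at least one trigram. The helper names (`normalize`, `trigrams`, `cosine`, `TrigramIndex`), the `min_score` threshold and the product labels are hypothetical, chosen only for illustration.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict

# Hypothetical sketch: character-trigram fuzzy matching, NOT the paper's pipeline
# (which relies on word embeddings and dedicated search tools).


def normalize(label: str) -> str:
    """Strip accents, lowercase and drop punctuation: 'Crème 30%' -> 'creme 30'."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def trigrams(label: str) -> Counter:
    """Count character 3-grams of each word, padded so word boundaries count too."""
    grams: Counter = Counter()
    for word in normalize(label).split():
        padded = f"  {word} "
        for i in range(len(padded) - 2):
            grams[padded[i:i + 3]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TrigramIndex:
    """In-memory inverted index: trigram -> ids of reference labels containing it."""

    def __init__(self, labels: list[str]):
        self.labels = list(labels)
        self.vectors = [trigrams(label) for label in self.labels]
        self.postings: defaultdict[str, set[int]] = defaultdict(set)
        for idx, vec in enumerate(self.vectors):
            for gram in vec:
                self.postings[gram].add(idx)

    def search(self, query: str, k: int = 3, min_score: float = 0.2) -> list[tuple[float, str]]:
        """Return up to k (score, label) pairs, best first, scoring only candidates
        that share at least one trigram with the query."""
        q = trigrams(query)
        candidates: set[int] = set()
        for gram in q:
            candidates |= self.postings.get(gram, set())
        scored = [(cosine(q, self.vectors[i]), self.labels[i]) for i in candidates]
        scored = [pair for pair in scored if pair[0] >= min_score]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return scored[:k]


# Made-up reference labels standing in for a crowd-sourced nutritional database.
index = TrigramIndex(["Crème fraîche épaisse", "Crème dessert chocolat", "Fromage frais nature"])
results = index.search("CREME FRAICHE EPAISSE 30% 50CL")  # made-up scanner label
for score, label in results:
    print(f"{score:.2f}  {label}")
```

Character trigrams tolerate missing accents, abbreviations and extra tokens such as package sizes, which often appear in scanner labels. The inverted index avoids comparing every scanner label with every database entry, the quadratic cost that makes naive fuzzy matching impractical on large datasets.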

Understanding the nature and the nutritional or environmental quality of the food products purchased in supermarkets will help foster sustainable and healthy consumption. The development of applications that provide information on products (nutritional characteristics, packaging, carbon footprint, etc.) opens up new perspectives for analyzing scanner data at population scale, once the two sources have been matched. It is therefore important to propose a method for linking these data sources that is reliable, flexible and efficient.

Proceedings

The associated paper is published in the ACM Proceedings. A PDF version is also available for download.

Lino Galiana
Data Scientist

I am a data scientist at Insee, the French national statistical institute. I study how emerging data sources and new computational methods can help renew the production of statistical knowledge.