GoodIT 2022: Fuzzy matching on big-data

GoodIT 22 is an ACM International Conference on Information Technology for Social Good.
My presentation is available at https://linogaliana.github.io/relevanc-goodIT22
The source code is on GitHub here.
Purpose
To make the most of automatically collected scanner data for consumption studies, we link the products recorded in these data with crowd-sourced nutritional databases using textual search techniques. This approach requires state-of-the-art textual analysis methods, including word embeddings, as well as efficient search tools to scale up.
Understanding the nature and the nutritional or environmental quality of the food products bought in supermarkets will help to develop sustainable and healthy consumption. The development of applications that provide information on products (nutritional characteristics, packaging, carbon footprint, etc.) opens up new perspectives for analysing scanner data at population scale, once these data have been matched with such sources. It is thus important to propose a method for associating these data sources that is reliable, flexible and efficient.
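To give a rough idea of what fuzzy matching between product labels can look like, here is a minimal sketch in Python. It is not the method used in the paper: it swaps in a plain character-trigram Jaccard similarity, relies only on the standard library, and ignores both word embeddings and scaling concerns. The function names and the product labels are hypothetical and chosen for illustration only.

```python
from __future__ import annotations


def trigrams(label: str) -> set[str]:
    """Return the set of character trigrams of a lower-cased, space-normalised label."""
    cleaned = " ".join(label.lower().split())
    padded = f"  {cleaned} "  # padding so that word boundaries produce trigrams too
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate whose trigram set is most similar to the query's."""
    query_grams = trigrams(query)
    scored = [(c, jaccard(query_grams, trigrams(c))) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical data: an abbreviated scanner label and a few reference product names.
scanner_label = "YAOURT NATURE 4X125G"
reference_names = ["Yaourt nature", "Jus d'orange", "Beurre doux"]

match, score = best_match(scanner_label, reference_names)
print(match, round(score, 2))
```

Comparing every scanner label against every reference name in this way does not scale to large datasets, which is why the approach described above relies on efficient search tools.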
Proceedings
The associated paper in ACM Proceedings is here. You can also get the PDF version below: