Imagine snapping a pic of your tasty restaurant entree or the magnificent lasagna in a foodie post, and up pops a recipe for said dish. Facebook AI has now transformed that gourmand’s fantasy into reality.
Facebook AI’s new “Inverse Cooking” system reverse-engineers recipes from food images, predicting both the ingredients in a dish and its preparation and cooking instructions. The technique improves on previous ingredient prediction baselines on the large-scale Recipe1M dataset, and Facebook says the generated recipes are more accurate than those produced by traditional retrieval-based approaches.
The image-to-recipe system recognizes individual ingredients and infers what happened to them on the way to the plate. It predicts the ingredients by extracting visual features from the input image and by exploiting ingredient co-occurrences. Researchers first pretrain an image encoder and an ingredients decoder to predict ingredients, then train the system to produce the dish title and the preparation and cooking instructions. Recipe generation is the final step, where the system feeds the predicted ingredients into an advanced sequence generation model to come up with a recipe.
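To make the two-stage flow concrete, here is a minimal Python sketch of the data flow only. Everything in it is an assumption made for illustration: the ingredient vocabulary, the weights, the threshold and the function names are hypothetical, the learned encoder and decoder are replaced by a hand-written linear scorer and a template, and ingredient co-occurrences are not modeled at all.

```python
import math

# Hypothetical toy vocabulary and weights; a real system would learn these from data.
INGREDIENTS = ["pasta", "tomato", "cheese", "egg"]

# One weight row per ingredient, applied to a 3-dimensional image feature vector.
WEIGHTS = [
    [2.0, 0.0, 0.0],  # pasta
    [0.0, 2.0, 0.0],  # tomato
    [1.0, 1.0, 0.0],  # cheese
    [0.0, 0.0, 2.0],  # egg
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_ingredients(image_features, threshold=0.7):
    """Stage 1: score each ingredient independently and keep the confident ones."""
    predicted = []
    for name, row in zip(INGREDIENTS, WEIGHTS):
        score = sum(w * f for w, f in zip(row, image_features))
        if sigmoid(score) >= threshold:
            predicted.append(name)
    return predicted


def generate_instructions(image_features, ingredients):
    """Stage 2: template stand-in for the sequence generation model.

    A real generator would condition on both arguments; this stand-in
    accepts image_features only to show the interface and does not use it.
    """
    steps = [f"Prepare the {name}." for name in ingredients]
    steps.append("Combine everything and cook until done.")
    return steps


def image_to_recipe(image_features):
    """Run ingredient prediction first, then generate instructions from the result."""
    ingredients = predict_ingredients(image_features)
    return {
        "ingredients": ingredients,
        "instructions": generate_instructions(image_features, ingredients),
    }


recipe = image_to_recipe([1.0, 1.0, -1.0])
print(recipe["ingredients"])
print(recipe["instructions"])
```

The point of the sketch is the ordering: the ingredient list is produced first and then passed, together with the image features, into the step that writes the instructions.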
Previous image-to-recipe models would simply attempt to retrieve recipes from a food image dataset based on similarity scores. This approach, however, requires huge datasets and naturally struggles when the target dish is not found in the dataset. Facebook AI’s novel approach is to reframe image-to-recipe as a conditional generation problem. Instead of obtaining a recipe directly from an image, an intermediate step of “predicting ingredients” is added to the recipe-generation pipeline. The dish preparation sequence can then be generated conditioned on both the original image and the predicted ingredients list. Researchers say this connection between image and ingredients provides the system with insights that improve recipe accuracy.
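The limitation of similarity-based retrieval can be illustrated with a small Python sketch. The recipe titles, the three-dimensional embeddings and the function names below are hypothetical and chosen only for illustration; real retrieval systems use learned embeddings over far larger collections.

```python
import math

# Hypothetical embeddings for a tiny recipe collection.
RECIPE_DB = {
    "lasagna": [0.9, 0.1, 0.0],
    "omelette": [0.0, 0.2, 0.9],
    "tomato soup": [0.1, 0.9, 0.1],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_recipe(image_embedding):
    """Return the stored recipe title most similar to the image, plus its score."""
    best = max(
        RECIPE_DB,
        key=lambda title: cosine_similarity(image_embedding, RECIPE_DB[title]),
    )
    return best, cosine_similarity(image_embedding, RECIPE_DB[best])


# An image close to a stored dish is matched with a high score.
known_title, known_score = retrieve_recipe([0.8, 0.2, 0.0])

# An image of a dish that is not in the collection still gets the nearest
# stored recipe, only with a lower similarity score.
unseen_title, unseen_score = retrieve_recipe([0.5, 0.5, 0.5])

print(known_title, round(known_score, 2))
print(unseen_title, round(unseen_score, 2))
```

Retrieval can only ever return a recipe that already exists in the collection, which is why a generation-based formulation is attractive for dishes the dataset has never seen.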
As with almost any new AI system, it did not take long before people started looking for vulnerabilities. One prankster who submitted a picture of the popular anime character Pikachu says the system output a recipe for yellow custard pudding.
Facebook believes the new technique can have applications beyond the kitchen and popular food culture: “This kind of training can be used for any problem that requires predicting long structured text from an image and predicted keywords.”
Author: Hongxi Li | Editor: Michael Sarazen