Cooking shows have moved beyond unglamorous narrations like “bring three litres of water to the boil” or even “dice the kiwi.” These days, cooking is performance — dynamic, dramatic, and designed to impact not only the palate but also the other senses. Award-winning chef Emeril Lagasse sums it up with his trademark catchphrase: “BAM!”
The art of cooking, along with everything else people do and say in kitchens, is the focus of the EPIC-KITCHENS dataset. Introduced in 2018, this collection of annotated first-person (egocentric) videos of people cooking and interacting with objects in their kitchens has enabled AI researchers to explore a range of challenges in video understanding.
In a new paper, researchers from the University of Bristol, the University of Toronto and the University of Catania explain how they created EPIC-KITCHENS and introduce new baselines that emphasize the multimodal nature of what is the largest egocentric video benchmark of its kind.

Unlike previous action classification benchmarks, whose videos tend to be short or recorded in scripted environments, EPIC-KITCHENS was created to capture the unscripted, natural interactions of everyday scenarios, whether one grills chicken with the gusto of a Lagasse or bakes cookies like a grandma.
The researchers note that the recordings also show the multitasking that home chefs naturally perform, like washing a few dishes during the cooking process. “Such parallel-goal interactions have not been captured in existing datasets, making this both a more realistic as well as a more challenging set of recordings.”


The researchers instructed 32 participants covering 10 nationalities and five languages to record their kitchen time for at least three consecutive days using a head-mounted GoPro camera.
The participants then watched their videos and recorded a live commentary of the actions they performed, generating “coarse annotation” speech data. The researchers note that recent attempts at image annotation using speech have produced speed-ups of up to 15x when annotating ImageNet. They also believe the participants can describe the actions better than independent observers could, simply because they were the ones performing the actions.
Some issues emerged, such as synonyms in the participants’ free-text annotations: different people said “put”, “place”, “put down”, “put back”, “leave”, or “return” when describing similar object-placing actions. The researchers grouped such annotations into classes to minimize semantic overlap and to accommodate common approaches to multiclass detection and recognition, where each example is assumed to belong to exactly one class.
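The verb-grouping step can be pictured with a minimal sketch. The class names and synonym lists below are illustrative only, not the paper’s actual taxonomy:

```python
# Illustrative (not the paper's) grouping of free-text verbs into classes,
# so that every annotation maps to exactly one canonical verb class.
VERB_CLASSES = {
    "put": ["put", "place", "put down", "put back", "leave", "return"],
    "take": ["take", "grab", "pick up"],
}

# Invert the mapping for fast lookup: free-text verb -> class label.
VERB_TO_CLASS = {
    verb: cls for cls, verbs in VERB_CLASSES.items() for verb in verbs
}

def canonical_verb(free_text_verb: str) -> str:
    """Map a free-text verb to its class; fall back to the verb itself."""
    return VERB_TO_CLASS.get(free_text_verb.strip().lower(), free_text_verb)

print(canonical_verb("put back"))  # -> put
print(canonical_verb("grab"))      # -> take
```

Collapsing synonyms this way keeps the label space small enough for standard multiclass classifiers, at the cost of discarding some nuance in the original narrations.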
The resulting dataset features 55 hours of video (11.5M frames), 39.6K action segments, and 454.2K labelled object bounding boxes.





The Epic-Kitchens researchers chose three challenges for testing — object detection, action recognition, and action anticipation — which they say form the base for a higher-level understanding of the participants’ actions and intentions.
The team evaluated several existing methods to demonstrate how challenging EPIC-KITCHENS is and to identify shortcomings in current state-of-the-art approaches. Object detection results with Faster R-CNN showed that objects in EPIC-KITCHENS are generally harder to detect than those in most other current datasets. The team also noted the importance of explicit temporal modelling in action recognition: models that built temporal modelling into their architecture showed improved accuracy, for example on verb classification.
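Detection difficulty is typically scored with an Intersection-over-Union (IoU) criterion, where a predicted box counts as correct only if its overlap with a ground-truth box exceeds a threshold (commonly 0.5). A minimal sketch of that overlap computation, independent of any particular detection framework:

```python
def iou(box_a, box_b):
    """Intersection-over-Union between two (x1, y1, x2, y2) boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two partially overlapping 10x10 boxes: intersection 25, union 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143, below the 0.5 bar
```

Small, frequently occluded kitchen objects tend to produce exactly these low-overlap predictions, which is one reason egocentric footage is hard for detectors trained on conventional datasets.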
The paper The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen
