From automated strike zones and advanced analytics to personalized training regimens, AI continues to transform the wide world of sports. Computer vision systems based on deep learning now play a significant role in sports analysis, where they are applied to tasks such as tracking players and sports equipment, human posture prediction, and detection of game-related actions.
For most sports analysis, it is vital that the model obtain real-time game performance information as smoothly and quickly as possible. The speed and accuracy of video analysis can generally be increased by replacing manual data collection with automated systems.
A team of researchers from Russian AI startup OSAI recently introduced the real-time neural network TTNet, designed for processing high-resolution table tennis videos with both temporal (event spotting) and spatial (ball detection and semantic segmentation) data. This method provides core information that can be used by various analytical and referee systems.
Table tennis is a fast-paced game that offers a variety of visual data for analysis. Challenges include the ball's high speed, the small number of pixels it occupies in full-view game video, and the difficulty of detecting it against similarly coloured backgrounds. A high video frame rate is also required to accurately capture ball trajectory and other behaviour.
TTNet is a lightweight multi-task architecture for extracting real-time data from table tennis videos. It operates on downscaled full-HD video and is capable of detecting game balls with pixel-level accuracy. It can spot in-game events and predict semantic segmentation masks in 120 fps video on a single consumer-grade GPU.
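Because TTNet is multi-task, a single forward pass yields several kinds of output at once. A minimal sketch of what such a per-clip result might look like is below; the container and field names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical container for the three per-clip outputs the article
# describes: event probabilities, a ball position, and a segmentation mask.
@dataclass
class TTNetOutput:
    event_probs: List[float]          # e.g. [P(bounce), P(net hit)]
    ball_xy: Tuple[int, int]          # predicted ball centre, in pixels
    seg_mask: List[List[int]]         # per-pixel class labels (e.g. table)

# A toy instance, just to show the shape of the result.
result = TTNetOutput(
    event_probs=[0.92, 0.03],
    ball_xy=(960, 540),
    seg_mask=[[0, 1], [1, 1]],
)
```

Bundling the heads' outputs in one structure like this keeps downstream analytics and referee systems decoupled from the network internals.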
For model evaluation, the researchers built (and open-sourced) the OpenTTGames dataset, focused on spotting quick game events. The dataset comprises annotated full-HD videos of table tennis games recorded at 120 fps with an industrial camera. Labels were added manually and using deep learning, and include frame numbers and corresponding targets for a particular frame; in-game events such as ball bounces, net hits, or empty event targets; and ball coordinates and segmentation masks.
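The article describes each label as a frame number plus its targets (an in-game event or an empty event, ball coordinates, and a mask). A small sketch of reading one such annotation record follows; the JSON schema here is an assumption for illustration, not the actual OpenTTGames release format.

```python
import json

# Illustrative OpenTTGames-style annotation record; the exact field
# names and layout are assumed, not taken from the dataset release.
record = json.loads("""
{
  "frame": 1234,
  "event": "bounce",
  "ball": {"x": 960, "y": 540}
}
""")

def describe(rec):
    """Render one annotation as a human-readable summary line.

    Frames with no in-game event are treated as the 'empty' target
    the article mentions.
    """
    event = rec.get("event", "empty")
    x, y = rec["ball"]["x"], rec["ball"]["y"]
    return f"frame {rec['frame']}: {event} at ({x}, {y})"
```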
The network consumes an input tensor built from the source video and outputs a semantic mask of the game table, the ball's position, the probabilities of in-game events, etc. Because the ball occupies so few pixels, its position at this stage is only a rough estimate. A second ball detection stage then works on frames from the original input, enabling the system to perform final ball detection at the spatial resolution of the original video.
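One common way to realize such a coarse-to-fine scheme is to cut a fixed-size window around the rough estimate out of the original-resolution frame and run the second stage on that crop. The helper below is a hedged sketch of that windowing step, not the paper's exact procedure; the crop size is an assumed parameter.

```python
def crop_around(frame_w, frame_h, cx, cy, size=128):
    """Compute a size x size crop window centred on a coarse ball
    estimate (cx, cy), clamped so the window stays inside the frame.

    Returns (left, top, right, bottom) in original-resolution pixels,
    ready to hand to a refinement detector.
    """
    left = min(max(cx - size // 2, 0), frame_w - size)
    top = min(max(cy - size // 2, 0), frame_h - size)
    return left, top, left + size, top + size
```

Clamping matters: when the coarse estimate lands near a frame edge, the window shifts inward instead of sampling pixels that do not exist.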
In tests, the TTNet model scored an impressive 97.0 percent on game event accuracy with a 2-pixel RMSE (root-mean-square error) in ball detection. The OSAI team is working with the Russian Table Tennis Championship, and hopes the research can contribute to the development of additional deep learning based approaches for sports analysis.
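The 2-pixel figure means that, averaged over test frames, the predicted ball centre sits about two pixels from the ground truth. A minimal sketch of computing that metric over a set of predicted and annotated ball centres:

```python
import math

def rmse(pred, true):
    """Root-mean-square error between predicted and ground-truth
    ball centres, each given as (x, y) pixel coordinates."""
    sq_dists = [
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, true)
    ]
    return math.sqrt(sum(sq_dists) / len(sq_dists))
```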
The paper TTNet: Real-time Temporal And Spatial Video Analysis Of Table Tennis is on arXiv.
Author: Xuehan Wang | Editor: Michael Sarazen