Is it possible to single-handedly curate a meta-study covering 300 published sign language recognition papers and over 400 experimental results? For industrious Microsoft Applied Scientist Oscar Koller, the answer is a definite “yes.” In a newly published paper, Koller composes a quantitative survey of the state of the art (SOTA) in sign language recognition (SLR), making all source data public in the hope of advancing future work and surfacing new open questions.
Sign language is an entirely visual language with a grammar that differs from those of spoken languages. It is the primary language most deaf people use to communicate with each other and with hearing people who understand it. There are hundreds of different sign languages, and different parts of the body are involved in signing. Although the hands are the essential component, facial expressions and body movements provide additional cues for better understanding. Researchers have mostly relied on two main approaches to SLR: extracting features either from direct measuring devices or from video sequences analysed with computer vision algorithms.
Koller sorts sign language parameters and extracted features into two types: manual parameters, which cover hand shape, movement, location and orientation; and non-manual parameters such as movements of the head, mouth and eyes, as well as eye blinks, eyebrows and gaze. He also considers features that provide a global view of the signer: full-frame RGB images covering the entire signer, and body joints.
Koller notes that the numbers of published studies and available datasets have both increased since the earliest known work in the field appeared in 1983, and his survey includes the 300 most relevant papers published between then and now. He identifies 2015 as a critical turning point, when the number of continuous SLR studies with large vocabularies increased sharply. Koller suggests the introduction that year of the RWTH-PHOENIX-Weather benchmark corpus may have contributed to the change. The 2018 release of the CSL (Chinese Sign Language) corpus, with a vocabulary of 178 signs, also boosted the number of published results on large-vocabulary continuous tasks. Koller further notes that RWTH-PHOENIX-Weather remains the world’s only resource for large-vocabulary continuous SLR benchmarking.
The year 2015 also marked the beginning of a trend in which hand shape attracted more research attention, and studies with larger modelled vocabularies gradually shifted their focus from manual to global features. Koller attributes this to two main factors:
- The availability of body joints and full depth image features following the release of the Kinect in 2010.
- The shift towards deep learning and the trend of feeding full frames into models instead of relying on manual feature engineering.
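The contrast above — hand-crafted manual features versus learned global features from full frames — can be illustrated with a toy sketch. This is not code from Koller’s survey; the function names, frame size, hand-box coordinates and feature dimensions are all hypothetical, and a random linear projection stands in for a trained deep network:

```python
import numpy as np

def manual_hand_feature(frame, hand_box):
    """Hand-crafted descriptor: mean intensity per colour channel of a cropped hand region."""
    x, y, w, h = hand_box
    crop = frame[y:y + h, x:x + w]
    return crop.reshape(-1, 3).mean(axis=0)  # 3-dimensional descriptor

def global_frame_feature(frame, rng):
    """Stand-in for a learned extractor: random linear projection of the flattened full frame."""
    flat = frame.reshape(-1).astype(np.float64)
    proj = rng.standard_normal((128, flat.size)) / np.sqrt(flat.size)
    return proj @ flat  # 128-dimensional "global" embedding

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # toy RGB frame

f_manual = manual_hand_feature(frame, (20, 20, 16, 16))  # uses only the hand crop
f_global = global_frame_feature(frame, rng)              # uses the entire frame
print(f_manual.shape, f_global.shape)
```

The point of the sketch is only the difference in inputs: the manual pipeline needs an engineered localisation step (the hand box) before any feature is computed, while the global pipeline consumes the whole frame and leaves localisation to the learned model.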
Through his analysis of over 25 studies that have leveraged the RWTH-PHOENIX-Weather dataset to evaluate SLR methods, Koller concludes that approaches using an iterative training style rank among the best performing. Koller is also first author of the 2015 paper Continuous Sign Language Recognition: Towards Large Vocabulary Statistical Recognition Systems Handling Multiple Signers, which introduced the RWTH-PHOENIX-Weather dataset.
The paper Quantitative Survey of the State of the Art in Sign Language Recognition is on arXiv.
Reporter: Fangyu Cai | Editor: Michael Sarazen