IBM today announced it will release the world’s largest facial attribute dataset to fight bias in artificial intelligence systems used to recognize human faces. The dataset was built by IBM research scientists and contains one million images, five times the image count of the current largest facial attribute dataset. It will be publicly available this fall.
Although AI has sparked many technological breakthroughs, public concern has developed regarding bias, particularly in tasks related to race. A study by MIT and Microsoft researchers released earlier this year found that while facial recognition technology from Microsoft, IBM and Megvii performs remarkably well at identifying light-skinned male subjects (99.6 percent average accuracy), it struggles to correctly recognize dark-skinned female subjects. IBM’s system achieved only 65.3 percent accuracy on that group.
Today’s effective AI systems train on large-scale annotated datasets, and a lack of race and skin color diversity in facial image datasets is believed to contribute to bias in AI applications and products.
IBM’s new dataset is designed to address this lack of diversity. The dataset can also match attributes (hair color, facial hair, etc.) to an individual’s identity, a cross-referencing capability unavailable in current datasets.
IBM will also release an evaluation dataset which includes 36,000 facial images equally distributed across all ethnicities, genders, and ages.
Other tech giants with world-class research institutes are also striving to reduce cross-demographic accuracy differences in their products. Yesterday, Microsoft announced an improvement to its facial recognition techniques which reduces error rates by up to 20 times for men and women with darker skin, and nine times for all women.
IBM will hold a facial recognition model competition this September using its new facial image dataset. Results will be announced at a technical workshop hosted by IBM and the University of Maryland at this year’s European Conference on Computer Vision (ECCV) on Sept. 14.
Journalist: Tony Peng | Editor: Michael Sarazen