Unless you know precisely how Shanghai’s 364 subway stations align with the metropolis above them, navigating the subterranean tangle of China’s largest city can leave you frustrated and hopelessly lost.
On December 5th Alibaba, Ant Financial, and Shanghai Shentong Metro Group jointly launched a new voice interaction system for purchasing subway tickets. Riders can now tell a machine their destination — for example, Zhongshan Park — and the system will use the AutoNavi (高德地图) cloud map to issue a ticket for the nearest station.
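The destination-to-station step can be pictured as a nearest-neighbor lookup over station coordinates. The following is a minimal sketch only, not the AutoNavi service or Alibaba's implementation: the `STATIONS` table, its approximate coordinates, and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical station table with approximate (latitude, longitude) values,
# for illustration only. A real system would query a map service instead.
STATIONS = {
    "Zhongshan Park": (31.2188, 121.4165),
    "Lujiazui": (31.2381, 121.5022),
    "People's Square": (31.2335, 121.4752),
}

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_station(destination, stations=STATIONS):
    """Return (station name, distance in meters) for the closest station."""
    name = min(stations, key=lambda s: haversine_m(destination, stations[s]))
    return name, haversine_m(destination, stations[name])
```

Calling `nearest_station((31.2397, 121.4998))` with a point in the Lujiazui area picks the Lujiazui entry from this toy table. Straight-line distance is a simplification; walking distance along streets would likely be a better measure in practice.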
Alibaba founder Jack Ma was among the first to try out the new system, which is expected to be available on all of Shanghai’s ticketing machines next year as part of China’s push to build IoT infrastructure for smart cities.
Current voice dialogue systems such as smart speakers and voice assistants require “trigger words.” The iPhone’s voice assistant, for example, activates when it hears “Hey Siri.” Alibaba’s new system instead uses multimodal interaction, combining voice with visual cues. Zhijie Yan, head of the Alibaba voice team, says the goal is to eliminate trigger words altogether. “You just need to approach the machine and it will interact with you naturally.”
Says Yan, “Real life environments are most likely noisy, and that remains to be the biggest technical challenge.” Voice recognition is difficult in open, noisy environments, which is exactly what the Shanghai subway is, especially during rush hour. Alibaba’s new ticketing system uses computer vision to identify the speaker’s lip movements and to measure the distance between the speaker and the machine before finalizing its voice input. These visual signals are combined with audio signals captured by a large microphone array, while a supporting software signal processor suppresses noise and interference.
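One way to picture how visual cues could replace a trigger word is a simple gate that accepts audio only while someone close to the machine is visibly speaking. The sketch below is an assumption-laden illustration, not Alibaba's method: the `Frame` fields, the thresholds, and both function names are hypothetical, and a production system would presumably fuse the signals with learned models rather than fixed rules.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One time slice of sensor input (all fields are hypothetical)."""
    audio_energy: float  # energy after noise suppression, arbitrary units
    lip_moving: bool     # vision model's guess that the nearest face is speaking
    distance_m: float    # estimated distance from speaker to machine, in meters


def is_addressing_machine(frame, max_distance_m=1.0, min_energy=0.1):
    """Accept a frame only if the visual and audio cues all agree."""
    return (
        frame.lip_moving
        and frame.distance_m <= max_distance_m
        and frame.audio_energy >= min_energy
    )


def select_speech_frames(frames, **thresholds):
    """Return indices of frames that should be passed on to speech recognition."""
    return [i for i, f in enumerate(frames) if is_addressing_machine(f, **thresholds)]
```

With this gate, loud speech from a passerby three meters away is dropped because the distance check fails, and background announcements are dropped because no nearby lips are moving.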
Voice Ticketing Machine: “Recommended stop is Lujiazui, 285 meters from your destination.”
Voice Ticketing Machine: “Order changed to one ticket.”
Last summer Yan led a five-person team on the subway project. The team identified stability and rapid learning ability as further challenges, since public service facilities like the subway need to function smoothly 24/7.
The Shanghai subway is also introducing Alibaba’s facial recognition and Alipay for digital payment at subway entrances.
This is just the first step: airports, train stations, event spaces, restaurants and shopping malls will soon be able to use multimodal technology to open new human-machine interaction possibilities in information inquiry, interactive advertising, and navigation applications.
Journalist: Yi Wang, Meghan Han | Editor: Michael Sarazen

