For many people, “AI” conjures up sci-fi visions of autonomous robots — sometimes friendly, often hostile. For scientists and engineers, meanwhile, AI is something more concrete: arrays of GPUs running complex multi-layer neural networks.
Both perspectives tend to overlook the fact that — just like with traditional manufacturing — today’s AI projects are being built to a large degree using old-fashioned manual labour.
Qin Jiao, 30, recently left her job in a call centre to join the growing ranks of workers employed by AI-supporting “data processing” companies. Jiao’s task is manually labelling pictures, videos, and voice data.
The labelled data will be used by technology companies to train AI algorithms and models for image recognition, speech recognition, and various other applications. Generally speaking, the larger the dataset of accurately labelled data, the better the performance and the product.
A skilled worker like Jiao can process up to 40 images per day, provided she needs only to frame objects, label categories, and provide some context. However, if the images include complex architectural details, of a building for example, a worker’s labelling output can fall to 10 pictures per day.
Figure 1: A simple example of labelling.
Jiao’s latest order involves labelling 60,000 pictures within a week. Normally, delivering an order of that volume would require more than 200 people working for seven days, but Jiao’s team numbers fewer than 100, some of whom have other projects to deal with.
“Faced with delivering what I’m asked to, I may as well die on the spot,” sighs Jiao.
Even a set of 60,000 images seems small compared to ImageNet, which has 15 million annotated images. Thanks to its size and open-source access, ImageNet quickly became the first choice for image recognition research; major researchers like Andrew Ng and Jeff Dean have used it.
But ImageNet has its own weaknesses — large label boxes, few labelling types, and a high error rate — which make it undesirable for training algorithm models.
AI companies require accurate data for specialized applications, and seemingly little things such as labelling and tagging demand precision. At present, only humans are up to the task.
For many start-ups recruiting engineers, “the ability to collect and label data” is now a prerequisite. The ability to obtain high-quality labelled data plays a key role in determining a company’s competitiveness.
Before professional data labelling companies appeared, crowdsourcing platforms were a low-cost solution to the AI market’s growing demand for data. One well-known crowdsourcing data platform claims to have over 5,000 data labelling specialists, capable of handling more than 2 million images per day, with service guaranteed to “provide good quality.” But is that reliable?
With the sheer number of people participating in crowdsourcing, capability naturally varies. This variance creates errors, and when there are too many errors, a dataset simply can’t be used.
Purchasing labelled datasets through data exchange platforms is another option. But the problem of customization remains, as different AI applications require different datasets and labelling methods.
AI companies can’t label everything themselves, and crowdsourcing platforms cannot pass quality control. This is what gave birth to specialized data labelling companies like Jiao’s, which target market demand for large customized datasets of assured quality.
“Ours is the only company in the region that specializes in refined data processing,” says Jiao with pride.
Employees are divided into different project groups. Some are responsible for labelling nodes on the human body in complex yoga-like postures, while others label motorcycles, bicycles, and pedestrians and identify vehicle travel directions to train models for street surveillance cameras. Others trace the contours of buildings or partially obscured objects, producing data used to train radar systems to detect real-world objects.
Other employees handle videos, extracting frames and labelling each object’s direction and changes across subsequent frames. This data may be used to train motion perception, or to predict how object locations will change.
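To picture the per-frame annotations described above, a single labelled video frame might be stored as a record like the following. This is a minimal, hypothetical sketch: the JSON-style field names (`bbox`, `category`, `direction`) are illustrative assumptions, not any particular company’s actual schema.

```python
import json

# A hypothetical annotation record for one video frame: each object gets a
# bounding box (x, y, width, height in pixels), a category label, and the
# direction it appears to be moving. Field names are illustrative only.
frame_annotation = {
    "frame_index": 42,
    "objects": [
        {"bbox": [120, 80, 60, 140], "category": "pedestrian", "direction": "north"},
        {"bbox": [300, 95, 180, 90], "category": "bicycle", "direction": "east"},
    ],
}

def count_labels(annotation):
    """Return how many objects are labelled in a single frame record."""
    return len(annotation["objects"])

print(json.dumps(frame_annotation, indent=2))
print(count_labels(frame_annotation))  # 2 objects labelled in this frame
```

A week-long order of 60,000 pictures, at this granularity, means producing tens of thousands of such records by hand.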
Figure 2. Labelling tasks cannot be automated, especially when it comes to finding relevance.
Such datasets are crucial in the AI industry’s race to launch new products, improve the accuracy of specific functions and the reliability of products built on those functions, and stimulate company growth and financing.
The new data processing companies are located in a quiet, dedicated “digital town,” in the mountains about 50 kilometres from the city of Guiyang.
Figure 3. The quiet digital town has buildings with coloured steeples and well-trimmed green belts.
Near the town is a small vocational college, which trains most of the staff at Jiao’s company. The original purpose of these vocational schools was to educate the underprivileged: most of the students come from impoverished backgrounds, and the schools provide them with subsidies and scholarships. Students can also work about six hours per day, with the job serving as a springboard to opportunities elsewhere.
The school corridors are filled with reminders of the students’ previous circumstances, including poverty, lost parents, and physical disabilities. These are paired with stories of how students were able to change their families’ fates.
“Our students are particularly hardworking and down to earth; some end up doing internships in Beijing,” says Jiao. “Employer feedback tells us that they outperform Beijing undergrads.”
The company’s COO Li Zheng also speaks positively about what these jobs mean to the local students. However, he is concerned about the industry’s future prospects. A graduate of Beijing University of Aeronautics and Astronautics, Zheng knows that the current data labelling industry is very labour-intensive, not so different from the manufacturing factories of southern China. The students are part of an assembly line, at the bottom of the value chain. He believes the company has to find a better business model to help students move forward in their careers.
With that in mind, Zheng has extended his company’s business to cover data collection. This entails taking multiple-angle portrait photos of models in various postures. These pictures will train computers to recognize and process different perspectives, which is useful in facial recognition applications.
Figure 4. Facial expressions can be a dataset’s specialized label requirement.
The company has already completed several client orders in its self-built photography studio. And, of course, students pose for the portraits. They line up outside the studio, take photos, then put on face masks, sunglasses, and hats for more pictures. One set comprises 10 images, and 100 sets are produced per day. For security companies used to taking their own photos, these packaged photo sets, drawing on at least a thousand students, are very competitive products on the market.
The company also does second-language translation work, which reduces the need for local interpreters when business travellers visit Southeast Asia for example.
“We are training a number of staff in the technology, so we can better deliver on our clients’ requirements,” says Zheng.
“Labelling is really hard work,” says Chief Scientist of UniDT Technology Yi Xiangzhi, who cannot help but purse his lips when asked about the task.
UniDT recently hosted a big data application competition where contestants were asked to build programs that could identify images of animals or help vehicle security cameras trace accidents.
One test asked contestants to “calculate the shelf occupancy rate of a product through pictures.” Participants took more than 1,600 real shelf pictures as raw data.
The purpose of the competition was to encourage the use of smaller datasets, preferably fewer than two thousand images, to complete an image recognition task.
Figure 5. Left: shelf pictures. Right: each shade of grey corresponds to a different commodity.
“The biggest problem for deep learning is data sample size. The challenge lies in teaching the machine to learn with smaller datasets,” says Xiangzhi. Enhanced image modelling can then be used to reduce errors.
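One common way to stretch a small labelled set, offered here as an illustration of the general idea rather than UniDT’s actual method, is data augmentation: generating extra training examples by transforming images that are already labelled. The toy sketch below mirrors tiny images (represented as nested lists of pixel values) horizontally, doubling the dataset with no new labelling work.

```python
# Toy data-augmentation sketch: each "image" is a grid (list of rows) of
# pixel values, paired with its label. A horizontal flip yields a new,
# equally valid training example at no extra labelling cost.
def flip_horizontal(image):
    """Mirror an image left-to-right."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Return the original examples plus their horizontal flips."""
    return dataset + [(flip_horizontal(img), label) for img, label in dataset]

tiny_dataset = [
    ([[0, 1], [1, 0]], "cat"),
    ([[1, 1], [0, 0]], "dog"),
]

augmented = augment(tiny_dataset)
print(len(augmented))  # 4: the dataset size has doubled
```

Real systems apply many such transformations (crops, rotations, colour shifts), but the principle is the same: each hand-labelled image is made to do the work of several.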
But creating accurate labels is no easy task, considering it takes 12 people half a month to process 1,000 pictures. The amount of incredibly monotonous work is enough to drive some labelling researchers mad. Even if there are enough people capable of labelling up to standards, is data labelling an industry with long-term sustainability?
Figure 6. Li Feifei, the founder of ImageNet, is also aware of the power of good labelling. She is currently leading a project called “Visual Genome,” which offers richer representation of objects, attributes, and pairwise relationships. Visual Genome currently contains over 108,000 images.
Regarding the value of data labelling, industry insiders seem to have diverging views. Some see the Internet as the largest dataset we have, even if that data cannot be used directly. Some say they need more student interns or staff to label data. Some companies have set up their own data-labelling departments, while others believe that outsourcing is no problem if the right channels are used.
In response to the question “Are there people in big companies responsible for labelling data?” on Zhihu.com (the Chinese version of Quora), employees at big companies described it as either “labour-intensive labelling warfare” or “work outsourced to cheaper places.” Employees from smaller companies said labelling tasks were handled by an “all-women’s department” or “done ourselves.” Overall, “outsourcing” was the most common response.
Job recruitment websites tell a similar story. A search for “data labelling” returns up to 400 directly related posts, many from outsourcing companies.
With countless startups swarming to develop their own AI applications, starting a data labelling company seems a sound business plan.
In 2009, director Zhang Tonghe filmed migrant assembly-line workers for her documentary “Working Girls”. Some had just turned 18; others were even younger. Many didn’t seem to understand what their job was, and had no idea that the “QC” they talked about stood for “quality control.” Living conditions were extremely difficult, and even heading out for late-night snacks was considered a luxury.
None of the girls planned to stay put, and many had clear career goals. To that end, they were willing to share rooms with 15 other people, and bathrooms with 50 others. Goals like getting a down payment for their own places, moving on to better positions, and taking weekend computer or English classes occupied their minds.
Apparently, these young women interpreted Marx’s theory of alienation in a new way: it wasn’t about deriving a personal sense of achievement at work. Instead, they focused on how life moves forward, how their hard-earned money was budgeted, the skills acquired along the way, and who they hoped they could eventually become.
“Labelled data is in high demand,” says Xiangzhi, “but the industry has only five years left, because everyone is looking for solutions such as open-sourcing or small-sample learning.”
Perhaps, before we worry about the industry’s future prospects, we could consider whether the value it is now bringing to the people doing the work is sufficient in itself.
Yes, AI is labour-intensive, just like other large-scale collective social efforts. And the people doing data-labelling are not unlike other hard-working manual labourers — they are the faces we find in any crowd.
Original article from Synced China: http://www.jiqizhixin.com/article/2594 | Author: Miaomiao Yu | Editors: Meghan Han, Michael Sarazen