Snow Lake AI: China’s Largest Photo App’s New FPGA Solution Provider
This summer Chinese beauty cam and photo editor specialist Meitu Inc. announced an FPGA-accelerated neural network computing project with young startup Snow Lake AI. The collaboration aims to boost computing speed on Meitu mobile phones by 30 times.
Meitu’s data center handles up to 210 million photo-processing requests each day, and the company has been seeking a computation solution provider that can guarantee millisecond-level latency. While GPUs are commonly used for such large-scale computing tasks, they are not a cost-effective solution. Meitu launched its Meitu Imaging Laboratory to conduct research on computing acceleration in response to the real-time demands of its one billion online users.
The biggest challenge is transplanting AI algorithms to cheaper chips. After four months of working on algorithm migration, Snow Lake AI ran their CNN algorithm on a ten-dollar ZYNQ 7020 chip. “The demo impressed the Meitu team very much, and they invested in us on the second day,” recalls Snow Lake CEO Qiang Zhang.
Snow Lake’s solution has begun replacing the Meitu Cloud Computing Center’s expensive and power-hungry GPU servers. Zhang told us, “The cost of NVIDIA’s data center GPU averages about US$10,000, and their computing power is about 35T. The 1U computing power of the servers we’re upgrading can reach 400T, and costs only US$3,000.”
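Zhang's comparison can be checked with simple arithmetic. The sketch below takes the four figures from his quote at face value and divides computing power by price. The article does not say what unit "T" stands for, so the code treats it as an opaque unit, and the function name is illustrative only.

```python
def compute_per_dollar(compute_t: float, cost_usd: float) -> float:
    """Return computing power per US dollar, in the article's unspecified "T" unit."""
    return compute_t / cost_usd


# Figures as quoted by Zhang; "T" is left as an opaque unit.
gpu_value = compute_per_dollar(35, 10_000)    # data center GPU
fpga_value = compute_per_dollar(400, 3_000)   # 1U FPGA-based server

# How many times more computing power per dollar the FPGA server offers.
cost_performance_ratio = fpga_value / gpu_value

print(f"GPU:   {gpu_value:.4f} T per dollar")
print(f"FPGA:  {fpga_value:.4f} T per dollar")
print(f"Ratio: {cost_performance_ratio:.1f}x")
```

On these quoted numbers the FPGA server delivers roughly 38 times the computing power per dollar, which is in the same range as the 30-fold cost-performance gain the company cites for its tooling.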
With the development of 5G, Snow Lake anticipates strong demand for cloud computing. To this end, setting up a cost-effective private cloud solution is very important. “We focus on the private and public cloud and the vehicle market. The module market priority has been lowered, because AI is still lacking application promotion in the C-end market,” says Zhang. In the cloud computing center market, Snow Lake focuses on two business tracks: replacing private cloud GPUs and using FPGAs to accelerate the public cloud.
CEO Zhang: Winning Over Investors with Faith in FPGAs
After receiving millions in angel-round funding from Meitu, Snow Lake’s team of 20 is now actively preparing for a pre-A financing round. The company is connected to upstream FPGA chip manufacturers and downstream application vendors.
“Moore’s Law will slow the development of CPU computing power and we will need GPU, FPGA, ASIC and other heterogeneous chips to fill the gap,” says Zhang.
Zhang tells us that investors weren’t keen on FPGAs when he first began. In early 2017, ASIC architecture was the industry mainstream for AI chips, and most FPGA startups struggled in their early evaluation stages.
“We will break this stiff idea with a heterogeneous system which is based on a non-von Neumann architecture,” explains Zhang. A firm believer in heterogeneous systems, Zhang worked as a developer on several global FPGA research projects after graduating from Shanghai Jiao Tong University. He also founded two startups in the field, both short-lived due to market constraints.
One notable FPGA deployment is the National Cancer Institute’s FPGA-based gene project, a world-first effort that involves calculation and comparison of some three billion base pairs. Zhang’s team ported the Smith-Waterman algorithm to Virtex-II 6000 FPGA chips, finishing the task at 1/300th of the previous method’s cost while reducing total computation time from six months to five days.
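For readers unfamiliar with Smith-Waterman, it is a dynamic-programming method that finds the best-scoring local alignment between two sequences. The following is a minimal software sketch of the scoring recurrence, not Zhang's FPGA implementation. The match, mismatch, and gap values are illustrative assumptions, and a simple linear gap penalty is used.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions; gaps use a linear penalty.
    """
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # shared "ACG" region
```

Each cell depends only on its upper, left, and upper-left neighbors, so all cells on the same anti-diagonal can be computed at once. That independence is what makes the algorithm a natural fit for parallel hardware such as FPGAs.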
Betting on New Utilization Rates and Workflows
Snow Lake provides FPGA-based heterogeneous computing solutions, concentrating on algorithm reconstruction and optimization as well as architecture optimization. For Zhang, hardware utilization rate is the most important area for improvement.
“We can raise chip utilization rate up to 98 percent. In other words, there are only two resting times in the 100 calculation cycles,” says Zhang. The 98 percent utilization rate means that complex algorithms which previously required high-end Xilinx chips costing US$10,000 can now be implemented on chips that sell for under US$20. “The core difference between the two chips is utilization rate.”
According to the company, Snow Lake’s framework tool Ptero triples AI algorithm development efficiency, raises chip utilization rate to 98 percent, and can deliver 30 times the cost performance of GPU cloud servers.
Says Zhang: “Ptero is the result of more than ten years of technical research. The tool is very easy to use, and based on our experience, a new grad will only need three months of training before becoming an FPGA algorithm engineer.” Moreover, Zhang says that engineers working in Snow Lake’s technical framework will no longer need to master all the processes involved in FPGA development, such as the laborious writing of algorithms and code.
Source: Synced China
Localization: Meghan Han | Editor: Michael Sarazen