It’s been a busy 24 hours for Facebook, which in a blog post this afternoon announced a trio of new AI-related application-specific hardware designs.
Zion is the company’s next-generation, large-memory unified training platform; Kings Canyon is an integrated circuit optimized for AI inference; and Mount Shasta is a specialized ASIC for video transcoding.
Facebook has added the new infrastructure solutions to the Open Compute Project, a collaborative community the company formed in 2010 devoted to redesigning hardware technology.
New AI Hardware Trio
Facebook is raising the stakes on AI hardware development. In January at CES 2019, the social media giant revealed it is partnering with Intel to develop an inference chip, the NNP-I. Intel has also provided Facebook with a large number of Xeon server chips to consolidate its infrastructure.
Today’s releases aim to strengthen the company’s open-source hardware contributions for the benefit of the wider AI community. Over the years, Facebook has shared custom hardware designs such as Big Sur and Big Basin. The new AI hardware trio announced today continues this commitment.
Zion is a large-memory unified training platform designed to handle a variety of neural networks, including CNN, LSTM, and SparseNN. The Zion system comprises an 8-socket server, an 8-accelerator platform, and an OCP accelerator module. To provide both high memory bandwidth and high capacity, Facebook employed a coherent high-speed fabric that connects all CPUs, and a second high-speed fabric that connects the accelerators. By using the top-of-rack (TOR) network switch, users can scale to multiple servers within a single rack.
The inference server solution Kings Canyon can be divided into four parts: Kings Canyon inference M.2 modules, a Twin Lakes single-socket server, a Glacier Point v2 carrier card, and a Yosemite v2 chassis.
Each Twin Lakes server contains M.2 Kings Canyon accelerators and a Glacier Point v2 carrier card. The sets are packed into a Yosemite v2 sled and connect to the TOR switch via a multi-host NIC. Additional PCIe connectivity has been added to the server for higher network bandwidth and better communication.
Facebook revealed that it is collaborating with multiple partners to develop inference ASICs, which are designed to run INT8 workloads for best performance while also supporting mixed-precision FP16.
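To illustrate the trade-off between INT8 and FP16, here is a minimal Python sketch of symmetric per-tensor INT8 quantization next to an FP16 round trip. It is an illustration only, not the scheme used by Facebook or its partners; the function names and sample weights are hypothetical.

```python
import struct

def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate float values from INT8 codes."""
    return [q * scale for q in quantized]

def to_fp16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Hypothetical weights, chosen only to show the rounding behavior.
weights = [0.5, -1.27, 0.003, 1.0]

q, scale = quantize_int8(weights)
int8_restored = dequantize_int8(q, scale)
fp16_restored = [to_fp16(w) for w in weights]

int8_errors = [abs(a - b) for a, b in zip(weights, int8_restored)]
fp16_errors = [abs(a - b) for a, b in zip(weights, fp16_restored)]

print("INT8 codes:", q)
print("max INT8 error:", max(int8_errors))
print("max FP16 error:", max(fp16_errors))
```

In this sketch the small weight 0.003 collapses to zero in INT8 but survives in FP16, which is one reason a chip might offer a higher-precision mode alongside its fastest integer path.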
Mount Shasta is an ASIC Facebook designed with partners Broadcom and Verisilicon that is optimized for transcoding workloads. Transcoding is the process of generating multiple output qualities and resolutions to optimize videos for a viewer’s available internet connection.
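As a rough illustration of why multiple output qualities are generated, the Python sketch below picks a rendition to match a viewer's connection. The rendition list, bitrates, and headroom factor are hypothetical and do not describe Facebook's actual pipeline.

```python
from typing import NamedTuple

class Rendition(NamedTuple):
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target bitrate of the transcoded output

# Hypothetical set of outputs a transcoder might produce from one upload.
LADDER = [
    Rendition("240p", 240, 400),
    Rendition("480p", 480, 1200),
    Rendition("720p", 720, 2800),
    Rendition("1080p", 1080, 5000),
]

def pick_rendition(bandwidth_kbps, ladder=LADDER, headroom=0.8):
    """Return the highest-bitrate rendition that fits within the viewer's bandwidth.

    `headroom` leaves part of the measured bandwidth unused to absorb fluctuations.
    If nothing fits, fall back to the lowest-bitrate rendition.
    """
    budget = bandwidth_kbps * headroom
    fitting = [r for r in ladder if r.bitrate_kbps <= budget]
    if not fitting:
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(fitting, key=lambda r: r.bitrate_kbps)

for bandwidth in (300, 4000, 10000):
    print(bandwidth, "kbps ->", pick_rendition(bandwidth).name)
```

Every rendition in such a list has to be encoded separately from the source video, which is why transcoding is a heavy workload at scale.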
The Mount Shasta ASICs are deployed within Facebook data centers and share a similar infrastructure with Kings Canyon. A key feature of the design is that each of the software algorithms has been replaced with dedicated silicon within the chip. Facebook expects the video accelerators to be many times more efficient than its current servers.
AI Chip Race Heats Up
With the exponentially growing demand for data computation, tech giants are moving to reduce their dependence on third-party chip makers. Over the past year, major players including Google, Amazon, Microsoft, Huawei, Baidu, and Alibaba have all committed to designing their own AI chips to empower their cloud services and applications.
Google has rolled out the third generation of its Tensor Processing Unit (TPU), a custom application-specific integrated circuit (ASIC) that powers neural network computations for Google services such as Search, Street View, Google Photos, and Google Translate. TPUs also ran inference for AlphaGo, the Google AI that beat human champions at the ancient Chinese board game Go. A TPU 3.0 pod can deliver more than 100 petaflops of performance.
To complement the Cloud TPUs deployed in data centers, Google last year also released a cut-down ASIC named Edge TPU, which will be embedded into gateways that bridge the Google Cloud Platform and devices such as sensors.
Amazon last year introduced Graviton, an Arm-based CPU that provides compute for Amazon Web Services (AWS). The processor makes extensive use of custom-built silicon. The company also announced “Inferentia,” a machine learning inference chip designed to deliver high performance at low cost. Inferentia will be available to customers late this year.
Microsoft has not released any plans for home-grown hardware. The Redmond-based tech giant has, however, teamed up with Xilinx, whose FPGA chips account for half of the co-processors currently used on Microsoft Azure servers to accelerate machine learning processing. Microsoft is also seeking chip experts dedicated to AI chip development for Azure.
Huawei last year unveiled Ascend 910 and Ascend 310, two 7nm-based AI chip IPs that run on the cloud for training and inferencing. Both chips are built on Huawei’s self-developed Da Vinci architecture, which features scalable memory, compute, and on-chip interconnection.
Baidu released Kunlun (昆仑), China’s first edge-to-cloud AI chip, which includes the training chip “818-300” and the inference chip “818-100” and can be applied to both cloud and edge scenarios, including data centers, public clouds, and autonomous vehicles. Composed of thousands of small cores, built on Samsung’s 14nm process, and offering 512 GB/second of memory bandwidth, Kunlun delivers 260 TOPS while consuming 100 watts of power.
Alibaba announced last year that it is developing a new neural network chip called Ali-NPU for AI inferencing in fields such as image processing and machine learning. While Alibaba has not released any detailed parameters or performance figures, the company claimed the chip’s performance will be 10 times better than that of mainstream CPU- and GPU-architecture AI chips currently on the market, at only half the manufacturing cost and power consumption.
Journalist: Tony Peng, Fangyu Cai | Editor: Michael Sarazen