High-resolution simulations can provide the high visual quality demanded by today’s advanced computer graphics applications. However, as simulations scale up, they require increasingly large amounts of memory to store physical states. This can be problematic, especially on GPUs with hard memory capacity limits.
Previous work on scaling up such simulations has mostly focused on improving computational performance, while approaches for improving memory efficiency have remained largely unexplored. In a recent paper, a research team from Taichi Graphics, MIT CSAIL, Zhejiang University, Tsinghua University and Kuaishou Technology introduces a programming language and compiler for quantized simulation that achieves both high performance and significantly reduced memory costs by enabling flexible and aggressive quantization.
The team summarizes their contributions as follows:
- A simple programming interface for quantized simulation that gives programmers bit-level control over numerical data types. Because the data type interface is orthogonal to the actual computation, programmers can rapidly experiment with different quantization schemes.
- A compilation system that automatically generates efficient code for encoding/decoding quantized data types. The proposed system supports x64, ARM64, CUDA, and Apple Metal backends.
- A suite of domain-specific compiler optimizations that further improve the memory performance of compiled quantized computation. These optimizations yield a 4.10× performance improvement on microbenchmarks and up to 1.58× on large-scale GPU simulators.
- Systematic evaluations of the proposed system. The team demonstrates that their system pushes physical simulations to unprecedented resolutions. Under proper quantization, they achieve 8× higher memory efficiency for each Game of Life cell, 1.57× for each Eulerian fluid simulation voxel, and 1.7× for each material point method [Stomakhin et al. 2013] particle. To the best of their knowledge, this is the first time simulations at these resolutions have run on a single GPU. The proposed system achieves resolution, performance, accuracy, and visual quality simultaneously.
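The idea that the data type interface is orthogonal to the computation can be illustrated with a plain-Python sketch. This is not QuanTaichi or Taichi syntax: `Field`, `FixedPoint`, `Float64`, and `damp` are hypothetical names. The kernel reads and writes real numbers, and only the dtype handed to `Field` decides how those numbers are stored.

```python
from dataclasses import dataclass


class Float64:
    """Full-precision baseline: values are stored unchanged."""
    bits = 64

    def encode(self, x: float) -> float:
        return float(x)

    def decode(self, q: float) -> float:
        return q


@dataclass(frozen=True)
class FixedPoint:
    """Hypothetical quantized type: `bits`-bit fixed point covering [lo, hi]."""
    bits: int
    lo: float
    hi: float

    def encode(self, x: float) -> int:
        # Clamp into range, then map linearly onto integer codes 0 .. 2**bits - 1.
        levels = (1 << self.bits) - 1
        x = min(max(x, self.lo), self.hi)
        return round((x - self.lo) / (self.hi - self.lo) * levels)

    def decode(self, q: int) -> float:
        levels = (1 << self.bits) - 1
        return self.lo + q / levels * (self.hi - self.lo)


class Field:
    """A 1D array that stores every element through its dtype's encode/decode pair."""

    def __init__(self, n: int, dtype) -> None:
        self.dtype = dtype
        self.data = [dtype.encode(0.0) for _ in range(n)]

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, i: int) -> float:
        return self.dtype.decode(self.data[i])

    def __setitem__(self, i: int, x: float) -> None:
        self.data[i] = self.dtype.encode(x)


def damp(field: Field, factor: float) -> None:
    # The "kernel" only sees real numbers; it never mentions bits or encodings.
    for i in range(len(field)):
        field[i] = field[i] * factor


# Trying a different quantization scheme only changes the dtype argument.
for dtype in (Float64(), FixedPoint(16, -1.0, 1.0), FixedPoint(8, -1.0, 1.0)):
    f = Field(4, dtype)
    f[0] = 0.75
    damp(f, 0.5)
    print(dtype.bits, f[0])
```

Each quantized result differs from 0.375 by at most a couple of quantization steps, and comparing such errors across bit widths is exactly the kind of trial and error that a swappable dtype makes cheap.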
While most existing work on high-resolution simulation has been based on manual low-level performance engineering using C++ and CUDA, the proposed system is built on top of Taichi [Hu et al. 2019], a data-oriented programming language designed for simulation applications. Although the researchers focus on consumer-level computers with a single GPU for simplicity, their proposed techniques can also be applied to multi-GPU and multi-node settings.
The overarching goal of the new study is to push the limit of simulation resolution by alleviating the associated memory constraints. To this end, and based on the observation that many simulations do not need standard full-precision IEEE 754 data types, the team uses low-precision data types in simulations to save both memory space and bandwidth.
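A back-of-the-envelope calculation shows why bit width matters. The grid size (512³ voxels) and channel count (four scalars per voxel) below are hypothetical, chosen for round numbers; they are not figures from the paper.

```python
GiB = 1 << 30


def grid_bytes(resolution: int, channels: int, bits_per_channel: int) -> int:
    """Bytes needed for a dense resolution**3 grid with `channels` values per voxel."""
    voxels = resolution ** 3
    total_bits = voxels * channels * bits_per_channel
    return (total_bits + 7) // 8  # round up to whole bytes


res, ch = 512, 4                  # hypothetical: 512^3 voxels, 4 scalar channels each
full = grid_bytes(res, ch, 32)    # IEEE 754 single precision
quant = grid_bytes(res, ch, 16)   # e.g. 16-bit quantized channels
print(f"float32: {full / GiB:.1f} GiB, 16-bit: {quant / GiB:.1f} GiB, ratio {full / quant:.1f}x")
# prints: float32: 2.0 GiB, 16-bit: 1.0 GiB, ratio 2.0x
```

Because a simulation step typically streams the whole grid through memory, the bytes moved per step shrink by roughly the same ratio, which is why quantization can save bandwidth as well as capacity.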
Unfortunately, manually writing programs that operate on low-precision and quantized data types is extremely laborious and requires repeated trial and error to find the right quantization scheme. The ability to switch flexibly between quantization schemes is thus vital when developing efficient quantized simulators.
To address this issue, the proposed programming interface gives programmers bit-level control over numerical data types. Programmers can specify customized, quantized data types for storing physical states, which greatly simplifies the development of quantized simulators while also reducing memory bandwidth consumption.
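Bit-level control over storage can be pictured as packing several narrow fields into one machine word. The sketch below uses an invented layout (two 14-bit velocity codes and a 4-bit material ID in 32 bits) and hypothetical `pack`/`unpack` helpers; it shows the general idea, not QuanTaichi's interface. The fields hold raw unsigned integer codes, so mapping real numbers onto them is a separate encoding step.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, bit width), packed from the least significant bit up.
LAYOUT: List[Tuple[str, int]] = [("vx", 14), ("vy", 14), ("material", 4)]


def offsets(layout: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each field to (shift, width) and check that the layout fits in 32 bits."""
    out: Dict[str, Tuple[int, int]] = {}
    shift = 0
    for name, width in layout:
        out[name] = (shift, width)
        shift += width
    if shift > 32:
        raise ValueError(f"layout needs {shift} bits, more than one 32-bit word")
    return out


OFFSETS = offsets(LAYOUT)


def pack(values: Dict[str, int]) -> int:
    """Combine unsigned integer codes into a single 32-bit word."""
    word = 0
    for name, (shift, width) in OFFSETS.items():
        code = values[name]
        if not 0 <= code < (1 << width):
            raise ValueError(f"{name}={code} does not fit in {width} bits")
        word |= code << shift
    return word


def unpack(word: int) -> Dict[str, int]:
    """Recover each field by shifting it down and masking off its width."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, (shift, width) in OFFSETS.items()}


word = pack({"vx": 1000, "vy": 16383, "material": 5})
print(hex(word), unpack(word))
```

Changing the layout list is all it takes to try a different split of the 32 bits, which mirrors the kind of experimentation the interface is meant to support.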
The team also introduces a compilation system that automatically generates efficient code for encoding and decoding quantized data types, along with a novel hierarchical data structure composition system, and designs several schemes for efficiently encoding and decoding real numbers.
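Real numbers can be squeezed into a few bits in more than one way. Below is a sketch of two schemes, written under our own assumptions: signed fixed point over a fixed range, and a vector format in which all components share one exponent while each keeps its own sign and mantissa. Function names and bit widths are illustrative; the paper's actual schemes and bit layouts may differ.

```python
import math
from typing import List, Tuple


def fixed_encode(x: float, bits: int, max_abs: float) -> int:
    """Signed fixed point: map [-max_abs, max_abs] onto integers in [-L, L], L = 2**(bits-1) - 1."""
    limit = (1 << (bits - 1)) - 1
    x = min(max(x, -max_abs), max_abs)
    return round(x / max_abs * limit)


def fixed_decode(q: int, bits: int, max_abs: float) -> float:
    limit = (1 << (bits - 1)) - 1
    return q / limit * max_abs


def shared_exp_encode(v: List[float], mant_bits: int) -> Tuple[int, List[int]]:
    """All components share one exponent; each keeps a sign and `mant_bits` magnitude bits."""
    nonzero = [abs(c) for c in v if c != 0.0]
    if not nonzero:
        return 0, [0] * len(v)
    # frexp gives |c| = f * 2**e with 0.5 <= f < 1, so every |c| < 2**e_shared.
    e_shared = max(math.frexp(c)[1] for c in nonzero)
    top = (1 << mant_bits) - 1
    mants = []
    for c in v:
        q = round(abs(c) * 2.0 ** (mant_bits - e_shared))
        q = min(q, top)  # rounding up can overflow the mantissa by one code
        mants.append(-q if c < 0 else q)
    return e_shared, mants


def shared_exp_decode(e_shared: int, mants: List[int], mant_bits: int) -> List[float]:
    return [math.ldexp(q, e_shared - mant_bits) for q in mants]


e, m = shared_exp_encode([3.0, -0.01, 0.5], 10)
print(e, m, shared_exp_decode(e, m, 10))
```

In this example the shared exponent is set by the largest component, 3.0, so -0.01 comes back as -3/256 ≈ -0.0117. Sharing an exponent saves bits per component at the cost of relative precision in the small components, a trade-off that suits vectors whose components have similar magnitudes.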
Finally, the team provides a suite of domain-specific compiler optimizations that further improve the memory performance of compiled quantized computation. There are three novel optimizations: 1) Bit struct store fusion, 2) Thread safety inference, and 3) Bit array vectorization. The first two are designed to improve performance across a broad range of simulation workloads, while the third can significantly speed up computations on 1-bit data types.
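The benefit of store fusion can be illustrated by counting memory traffic. In this sketch (our own construction: `Memory`, `FIELDS`, and both store functions are hypothetical), three fields share one 32-bit word. Writing them one at a time costs a read-modify-write per field. Fusing the writes builds one mask and one value, and when the fields cover the whole word the load can be skipped entirely.

```python
from typing import Dict, Tuple

# Hypothetical bit struct: field name -> (shift, width) inside one 32-bit word.
FIELDS: Dict[str, Tuple[int, int]] = {"a": (0, 10), "b": (10, 10), "c": (20, 12)}
FULL = 0xFFFFFFFF


class Memory:
    """A list of 32-bit words that counts every load and store."""

    def __init__(self, n: int) -> None:
        self.words = [0] * n
        self.reads = 0
        self.writes = 0

    def load(self, i: int) -> int:
        self.reads += 1
        return self.words[i]

    def store(self, i: int, w: int) -> None:
        self.writes += 1
        self.words[i] = w & FULL


def store_unfused(mem: Memory, i: int, values: Dict[str, int]) -> None:
    # One read-modify-write per field, as a naive translation of separate assignments would do.
    for name, v in values.items():
        shift, width = FIELDS[name]
        mask = ((1 << width) - 1) << shift
        w = mem.load(i)
        mem.store(i, (w & ~mask) | ((v << shift) & mask))


def store_fused(mem: Memory, i: int, values: Dict[str, int]) -> None:
    # Merge all field updates into one mask and one value, then touch memory once.
    mask = bits = 0
    for name, v in values.items():
        shift, width = FIELDS[name]
        m = ((1 << width) - 1) << shift
        mask |= m
        bits |= (v << shift) & m
    if mask == FULL:
        mem.store(i, bits)  # every bit is overwritten, so the old word is not needed
    else:
        mem.store(i, (mem.load(i) & ~mask) | bits)


m1, m2 = Memory(1), Memory(1)
store_unfused(m1, 0, {"a": 5, "b": 7, "c": 9})
store_fused(m2, 0, {"a": 5, "b": 7, "c": 9})
print(m1.words == m2.words, (m1.reads, m1.writes), (m2.reads, m2.writes))
```

On a GPU, threads updating different fields of the same word would need atomic read-modify-writes. Deciding when such atomics can be avoided is presumably where thread safety inference comes in, though this single-threaded sketch does not model concurrency.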
To evaluate the performance and accuracy of the proposed system under memory space constraints, the researchers conducted experiments on three applications: Game of Life, Eulerian fluid simulation, and the Moving Least Squares Material Point Method (MLS-MPM).
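The 1-bit case is where packing pays off most: with one bit per cell, a single integer operation updates a whole row of cells at once. The sketch below is our own illustration of that bit-parallel idea for Game of Life (32-cell rows, dead cells beyond the border, a bit-sliced neighbor counter); it is not the code that QuanTaichi's bit array vectorization generates. A straightforward per-cell version is included as a reference.

```python
from typing import List

W = 32              # assumed row width: 32 cells, i.e. one 32-bit word per row
MASK = (1 << W) - 1


def pack_grid(grid: List[List[int]]) -> List[int]:
    """Pack each row of 0/1 cells into one integer; column c becomes bit c."""
    return [sum(cell << c for c, cell in enumerate(row)) for row in grid]


def unpack_grid(rows: List[int]) -> List[List[int]]:
    return [[(word >> c) & 1 for c in range(W)] for word in rows]


def step_packed(rows: List[int]) -> List[int]:
    """One Game of Life step on bit-packed rows; cells outside the grid count as dead."""
    h = len(rows)
    out = []
    for r in range(h):
        up = rows[r - 1] if r > 0 else 0
        cur = rows[r]
        down = rows[r + 1] if r + 1 < h else 0
        # Eight neighbor planes: each shift lines up one neighbor with every cell at once.
        neighbors = [
            (up << 1) & MASK, up, up >> 1,
            (cur << 1) & MASK, cur >> 1,
            (down << 1) & MASK, down, down >> 1,
        ]
        # Bit-sliced counter: bit c of s0, s1, s2 holds bits 0, 1, 2 of cell c's count.
        # A count of 8 wraps to 0, which is harmless because both mean "dead next step".
        s0 = s1 = s2 = 0
        for n in neighbors:
            carry0 = s0 & n
            s0 ^= n
            carry1 = s1 & carry0
            s1 ^= carry0
            s2 ^= carry1
        # ~s2 & s1 selects counts 2 and 3; (s0 | cur) keeps count 3 or an already-live cell.
        out.append(~s2 & s1 & (s0 | cur) & MASK)
    return out


def step_naive(grid: List[List[int]]) -> List[List[int]]:
    """Reference implementation that visits one cell at a time."""
    h, w = len(grid), len(grid[0])
    nxt = []
    for r in range(h):
        row = []
        for c in range(w):
            n = sum(grid[rr][cc]
                    for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)
                    if (rr, cc) != (r, c) and 0 <= rr < h and 0 <= cc < w)
            row.append(1 if n == 3 or (n == 2 and grid[r][c]) else 0)
        nxt.append(row)
    return nxt
```

Each row costs a fixed handful of word-wide operations regardless of how many cells it holds, whereas the reference loop does work per cell; on real hardware the same idea applies with 32- or 64-bit words.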
In the experiments, the results of the 3D quantized simulations were indistinguishable from the full-precision results. The study also shows that by modifying no more than three percent of the simulator code, a developer can quantize an MLS-MPM or Eulerian fluid simulator so that it runs at a speed comparable to the full-precision version, demonstrating the proposed system’s effectiveness and ease of use.
The code is available on the project GitHub. The paper QuanTaichi: A Compiler for Quantized Simulations is a SIGGRAPH 2021 conference paper and is available on yuanming.taichi.graphics.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang