Application-specific hardware accelerators have become a new efficiency-boosting paradigm in both the machine learning research community and industry. Designing and optimizing these accelerators, however, requires considerable manual effort as well as time- and energy-consuming simulations. Moreover, such simulation-driven approaches must be re-run from scratch whenever the target applications or design constraints change.
To reduce these costs, a team from Google Research and UC Berkeley has proposed PRIME, an offline data-driven method that uses logged simulation data to automatically architect hardware accelerators without running new simulations. Compared to state-of-the-art simulation-driven methods, PRIME achieves performance improvements of 1.54× when optimizing for a single application and 1.20× when optimizing for multiple applications, while reducing the required total simulation time by 93 percent and 99 percent, respectively.
The team summarises the benefits of their data-driven approach to hardware accelerator design as:
- It significantly reduces the recurring cost of running large-scale simulation sweeps.
- It alleviates the need to explicitly bake in domain knowledge or prune the search space.
- It enables data reuse, allowing designers to optimize accelerators for new, unseen applications by virtue of effective generalization.
PRIME is a data-driven approach that automatically architects high-performing application-specific accelerators using only previously collected offline data. It does so by learning a robust surrogate model of the task objective function from an existing offline dataset. PRIME can also be used for multi-model and zero-shot optimization, capabilities that existing approaches lack.
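The general idea behind offline, data-driven design (fit a model to logged results, then optimize that model instead of calling a simulator) can be illustrated with a toy sketch. Everything in it is hypothetical: the two-parameter design space, the logged scores, the linear least-squares surrogate and the distance-based penalty are stand-ins chosen for brevity. The penalty is only a crude proxy for the robustness PRIME aims for; PRIME's actual surrogate model and training objective are described in the paper and are not reproduced here.

```python
import itertools
import math

# Hypothetical two-parameter design space (illustration only, not PRIME's):
# (number of processing elements, on-chip buffer size in KB).
PE_CHOICES = [16, 32, 64, 128, 256]
BUFFER_CHOICES = [64, 128, 256, 512, 1024]

# Hypothetical logged results (higher is better) standing in for past simulations.
LOGGED = [((16, 64), 7.0), ((32, 128), 8.5), ((64, 256), 10.0),
          ((32, 64), 8.0), ((64, 128), 9.5)]


def features(design):
    """Map a design to [log2(PEs), log2(buffer KB), 1.0] (the 1.0 is a bias term)."""
    pes, buffer_kb = design
    return [math.log2(pes), math.log2(buffer_kb), 1.0]


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_surrogate(logged):
    """Least-squares fit of score ~ w . features(design), using logged data only."""
    xs = [features(d) for d, _ in logged]
    ys = [score for _, score in logged]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    w = solve(xtx, xty)
    return lambda design: sum(wi * fi for wi, fi in zip(w, features(design)))


def distance_to_data(design, logged):
    """Euclidean distance in log2 space from a design to the nearest logged design."""
    f = features(design)[:2]
    return min(math.dist(f, features(d)[:2]) for d, _ in logged)


def optimize_offline(logged, penalty):
    """Pick the candidate maximizing surrogate(d) - penalty * distance_to_data(d).

    Only the fitted surrogate is queried; no simulator is ever called.
    """
    surrogate = fit_linear_surrogate(logged)
    candidates = itertools.product(PE_CHOICES, BUFFER_CHOICES)
    return max(candidates,
               key=lambda d: surrogate(d) - penalty * distance_to_data(d, logged))


if __name__ == "__main__":
    print(optimize_offline(LOGGED, penalty=0.0))  # (256, 1024)
    print(optimize_offline(LOGGED, penalty=2.0))  # (64, 256)
```

With `penalty=0.0` the optimizer trusts the linear model everywhere and jumps to the extreme corner (256, 1024), far from any logged design; with `penalty=2.0` it settles on the best-supported logged design (64, 256). In miniature, this is why an offline method needs a surrogate that stays reliable outside the data it was trained on.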
The researchers compared PRIME with three state-of-the-art simulation-driven methods: 1) evolutionary search with the firefly optimizer; 2) Bayesian optimization implemented via the Google Vizier framework; and 3) a state-of-the-art online model-based optimization (MBO) method for designing biological sequences.
In empirical evaluations, PRIME outperformed the best designs observed in the logged data by 2.46× and improved on the best simulation-driven approach by about 1.54×. In the more challenging zero-shot setting, PRIME outperformed simulation-driven methods by 1.2× while reducing the total simulation time by 99 percent.
Overall, PRIME’s performance demonstrates the promise of using logged offline data and strong offline methods in hardware accelerator design pipelines. The researchers suggest future studies could apply PRIME to other problems in architecture and systems, such as software-hardware co-optimization.
The paper Data-Driven Offline Optimization for Architecting Hardware Accelerators is on arXiv.
Author: Hecate He | Editor: Michael Sarazen