The ever-increasing complexity of industrial-scale machine learning models has stimulated research into automatic hyperparameter tuning methods to boost the efficiency and quality of machine learning applications. Although automatic hyperparameter tuning is now an important component of many data systems, the limited scalability of state-of-the-art methods has become a bottleneck.
To address this issue, a research team from Peking University, ETH Zürich and Kuaishou Technology has proposed Hyper-Tune, an efficient and robust distributed hyperparameter tuning framework that features system optimizations such as automatic resource allocation, asynchronous scheduling, and a multi-fidelity optimizer plug-in. In empirical evaluations, Hyper-Tune achieves state-of-the-art performance across a wide range of tuning tasks.
The team summarizes their main contributions as:
- We propose Hyper-Tune, an efficient distributed automatic hyperparameter tuning framework.
- We conduct extensive empirical evaluations on both publicly available benchmark datasets and a large-scale real-world dataset in production.
The proposed Hyper-Tune framework contains three core components: a resource allocator, an evaluation scheduler, and a generic optimizer.
To automatically determine the appropriate level of resource allocation and balance the “precision vs. cost” trade-off in partial evaluations, the researchers used a simple yet novel resource allocation method that searches for a good allocation via trial-and-error.
The evaluation scheduler, meanwhile, is designed to leverage parallel resources via D-ASHA, a novel variant of ASHA (the Asynchronous Successive Halving Algorithm for hyperparameter optimization, introduced by Li et al. in 2020), to simultaneously satisfy synchronization efficiency and sample efficiency.
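For readers unfamiliar with the baseline that D-ASHA builds on, the following is a minimal sketch of plain ASHA's promotion rule, not the paper's D-ASHA variant. The class name `ASHAScheduler`, its method names, the single `lr` hyperparameter and the default resource levels are all hypothetical choices made for illustration. The idea is that a free worker never waits: it promotes a configuration that sits in the top 1/eta of some rung if one is available, and otherwise starts a new configuration at the lowest rung.

```python
import random
from collections import defaultdict


class ASHAScheduler:
    """Bookkeeping for plain ASHA (illustrative; not the paper's D-ASHA)."""

    def __init__(self, min_resource=1, max_resource=9, eta=3, seed=0):
        self.eta = eta
        self.rng = random.Random(seed)
        # Rung k evaluates a config with min_resource * eta**k units of resource.
        self.resources = []
        r = min_resource
        while r <= max_resource:
            self.resources.append(r)
            r *= eta
        self.results = defaultdict(dict)  # rung -> {config_id: loss}
        self.promoted = defaultdict(set)  # rung -> ids already promoted out of it
        self.configs = {}                 # config_id -> hyperparameters
        self.next_id = 0

    def sample_config(self):
        # Hypothetical search space: a single log-uniform learning rate.
        return {"lr": 10 ** self.rng.uniform(-4, -1)}

    def get_job(self):
        """Return (config_id, rung) for a free worker without blocking."""
        # Scan from the highest promotable rung down to rung 0.
        for rung in range(len(self.resources) - 2, -1, -1):
            done = self.results[rung]
            k = len(done) // self.eta  # size of the top 1/eta fraction
            for cid in sorted(done, key=done.get)[:k]:
                if cid not in self.promoted[rung]:
                    self.promoted[rung].add(cid)
                    return cid, rung + 1
        # Nothing can be promoted: grow the bottom rung with a new config.
        cid = self.next_id
        self.next_id += 1
        self.configs[cid] = self.sample_config()
        return cid, 0

    def report(self, cid, rung, loss):
        """Record the loss a worker measured for a config at a rung."""
        self.results[rung][cid] = loss
```

In use, each worker would loop over `get_job()`, train the returned configuration with `resources[rung]` units of budget, and call `report()` with the measured loss. Because promotions are decided on whatever results have arrived so far, an early lucky configuration can be promoted prematurely, which is the kind of sample-efficiency concern the article attributes to D-ASHA's motivation.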
To create a flexible and convenient system architecture that supports the drop-in replacement of different optimizers under both asynchronous and synchronous parallel settings, the team employed a modular design that enables plugging in different hyperparameter tuning optimizers. They also adopted an algorithm-agnostic sampling framework to enable easy adaptation of each optimizer algorithm to asynchronous parallel scenarios.
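A rough picture of what such a plug-in design can look like is sketched below. This is not Hyper-Tune's actual API: the `Optimizer` protocol, its `suggest`/`observe` methods, the two toy optimizers and the `tune` loop are all hypothetical names chosen for illustration, and the loop is sequential rather than parallel to keep the example short. The point is only that the tuning loop depends on a small interface, so optimizers can be swapped without touching it.

```python
import random
from typing import Callable, Dict, Optional, Protocol, Tuple

Config = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


class Optimizer(Protocol):
    """Hypothetical plug-in interface: propose a config, then learn from its loss."""

    def suggest(self) -> Config: ...

    def observe(self, config: Config, loss: float) -> None: ...


class RandomSearch:
    """Samples every hyperparameter uniformly within its bounds."""

    def __init__(self, bounds: Bounds, seed: int = 0):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.history = []

    def suggest(self) -> Config:
        return {n: self.rng.uniform(lo, hi) for n, (lo, hi) in self.bounds.items()}

    def observe(self, config: Config, loss: float) -> None:
        self.history.append((config, loss))


class LocalSearch:
    """Perturbs the best config seen so far; samples uniformly until one exists."""

    def __init__(self, bounds: Bounds, step: float = 0.1, seed: int = 0):
        self.bounds, self.step = bounds, step
        self.rng = random.Random(seed)
        self.best: Optional[Tuple[Config, float]] = None

    def suggest(self) -> Config:
        if self.best is None:
            return {n: self.rng.uniform(lo, hi) for n, (lo, hi) in self.bounds.items()}
        config = {}
        for n, (lo, hi) in self.bounds.items():
            value = self.best[0][n] + self.rng.gauss(0, self.step * (hi - lo))
            config[n] = min(hi, max(lo, value))  # clip back into the bounds
        return config

    def observe(self, config: Config, loss: float) -> None:
        if self.best is None or loss < self.best[1]:
            self.best = (config, loss)


def tune(optimizer: Optimizer, objective: Callable[[Config], float], n_trials: int):
    """Generic loop: works with any object that satisfies the Optimizer protocol."""
    best = None
    for _ in range(n_trials):
        config = optimizer.suggest()
        loss = objective(config)
        optimizer.observe(config, loss)
        if best is None or loss < best[1]:
            best = (config, loss)
    return best
```

Calling `tune(RandomSearch(bounds), objective, 50)` and `tune(LocalSearch(bounds), objective, 50)` exercises the same loop with two different optimizers. A parallel version would additionally need a policy for configurations that are still running when `suggest` is called, which is the role the article assigns to the algorithm-agnostic sampling framework.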
In evaluation experiments on publicly available benchmark datasets and a large-scale real-world dataset, the proposed Hyper-Tune framework achieved strong anytime and converged performance and surpassed state-of-the-art methods on hyperparameter tuning scenarios that included XGBoost with nine hyperparameters, ResNet with six hyperparameters, LSTM with nine hyperparameters, and neural architectures with six hyperparameters. Hyper-Tune also achieved up to 11.2x and 5.1x speedups compared to the state-of-the-art methods BOHB and A-BOHB, respectively.
The paper Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale is on arXiv.
Author: Hecate He | Editor: Michael Sarazen