Microsoft & OpenAI’s µTransfer Zero-Shot Hyperparameter Transfer Method Tunes GPT-3’s Hyperparameters on a Single GPU
In the new paper Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer, Microsoft and OpenAI researchers propose µTransfer, a method that leverages Maximal Update Parametrization (µP) to transfer hyperparameters zero-shot from small models to large ones, obtaining near-optimal hyperparameters for the large models without tuning them directly.
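To make the workflow concrete, here is a minimal Python sketch of the tune-small, reuse-large idea. It is an illustration under stated assumptions, not the paper's implementation: `proxy_validation_loss` is a hypothetical stand-in for training a small µP-parametrized proxy model (here replaced by a toy bowl-shaped function), and the hyperparameter names, search values, and the target width of 8192 are made up for the example.

```python
from itertools import product


def proxy_validation_loss(learning_rate, init_scale):
    """Hypothetical stand-in for training a small proxy model under µP.

    A real run would train the proxy and return its validation loss.
    This toy function only has a known minimum so the sketch is runnable.
    """
    return (learning_rate - 0.01) ** 2 * 1e4 + (init_scale - 1.0) ** 2


def tune_on_proxy(search_space):
    """Grid-search the hyperparameters on the cheap proxy model only."""
    names = list(search_space)
    best_hps, best_loss = None, float("inf")
    for values in product(*(search_space[name] for name in names)):
        hps = dict(zip(names, values))
        loss = proxy_validation_loss(**hps)
        if loss < best_loss:
            best_hps, best_loss = hps, loss
    return best_hps


# Assumed search space; the values are illustrative only.
search_space = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03],
    "init_scale": [0.5, 1.0, 2.0],
}

# Step 1: tune on the small proxy.
best_hps = tune_on_proxy(search_space)

# Step 2: copy the winning hyperparameters unchanged into the large
# model's configuration. No tuning run is performed on the large model.
target_config = {"width": 8192, **best_hps}
print(target_config)
```

The key design point is that the expensive model never appears inside the search loop; the sketch assumes, as the method does, that both models are parametrized in µP so that the proxy's best hyperparameters remain near-optimal at the larger size.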