Google & Stanford U’s DoReMi Significantly Speeds Up Language Model Pretraining
In the new paper DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining, a research team from Google and Stanford University introduces Domain Reweighting with Minimax Optimization (DoReMi), a domain weight optimization strategy that leverages distributionally robust optimization (DRO) to substantially speed up effective language model pretraining.