Deep visual recognition models have seen substantial performance improvements in recent years but are still typically trained for only one specific task, such as segmentation or classification. Although these models often share the same core architectural backbone, there is no existing method for easily combining multiple task-specific models into a single model that can handle all of their tasks.
In the new paper ZipIt! Merging Models from Different Tasks Without Training, a Georgia Tech research team proposes ZipIt!, a general method that exploits redundant features to combine two or more models with the same architecture but trained on different tasks into one multi-task model without additional training.
The researchers focus on classification tasks, specifically either disjoint category splits of the same dataset or classification on entirely different datasets. They target a difficult setting: merging models that have different initializations and were trained on different tasks. Their layer-wise approach combines each layer of one model with the corresponding layer of a second model and modifies both, rather than permuting only one of the models as existing permutation-based merging methods do.
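For contrast, the permutation-based baseline mentioned above can be sketched in a few lines. This is a minimal illustration, not any specific published implementation: it assumes a two-layer MLP, matches hidden units by activation correlation, reorders only model B to line up with model A, and then averages the aligned weights. The exhaustive search over permutations is a stand-in chosen for brevity (practical methods solve a linear assignment problem instead), and `best_permutation` and `permute_and_average` are hypothetical names.

```python
import itertools

import numpy as np


def best_permutation(acts_a, acts_b):
    """Return perm such that hidden unit perm[i] of model B best matches unit i of A.

    acts_a, acts_b: arrays of shape (num_samples, num_units). Exhaustive search
    is exact but only feasible for tiny layers (hypothetical helper).
    """
    n = acts_a.shape[1]
    # Correlation of every unit of A (rows) with every unit of B (columns).
    corr = np.corrcoef(acts_a.T, acts_b.T)[:n, n:]
    best = max(
        itertools.permutations(range(n)),
        key=lambda p: sum(corr[i, p[i]] for i in range(n)),
    )
    return list(best)


def permute_and_average(w1_a, w2_a, w1_b, w2_b, perm):
    """Reorder B's hidden units into A's order, then average the weights.

    w1_*: (hidden, in) first-layer weights; w2_*: (out, hidden) second-layer weights.
    Only model B is modified; model A is left untouched.
    """
    w1_b_aligned = w1_b[perm, :]  # reorder B's hidden units (rows of layer 1)
    w2_b_aligned = w2_b[:, perm]  # reorder the matching inputs of layer 2
    return (w1_a + w1_b_aligned) / 2, (w2_a + w2_b_aligned) / 2
```

If model B is just a hidden-unit shuffle of model A, this recovers A's weights exactly. For models trained on different tasks, a clean one-to-one correspondence between hidden units may not exist, and only one of the two models is ever adjusted.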
The team’s approach starts with two disjoint layers whose weights come from models trained on different tasks. It matches redundant features to obtain a merge matrix, which combines the activations of each matched pair into a shared feature space, and a corresponding unmerge matrix that can undo this operation. Each layer’s unmerge matrix is propagated forward to align the input space of the next layer, so every layer receives the unmerge matrix of the layer before it. Given a merge matrix for its output and an unmerge matrix for its input, the pair of layers is then “zipped” together into a single layer with a shared input and output space. The process is repeated to merge subsequent layers.
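The merge/unmerge bookkeeping described above can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the authors’ code: features are paired greedily by activation correlation, each matched pair is merged by plain averaging, layers are treated as linear maps (biases and nonlinearities are ignored), and `greedy_match`, `merge_unmerge`, and `zip_layer` are hypothetical names. The zipped weight is computed as M_out · blockdiag(W_A, W_B) · U_in, i.e. unmerge the shared input, apply each model’s own layer, then merge the outputs.

```python
import numpy as np


def greedy_match(corr):
    """Greedily pair features in order of decreasing correlation.

    corr: (m, m) correlation matrix over the concatenated features of models
    A and B (indices < m // 2 belong to A, the rest to B). Any two features may
    be paired, including two from the same model; returns m // 2 disjoint pairs.
    """
    if corr.shape[0] % 2:
        raise ValueError("expected an even number of features")
    corr = corr.copy()
    np.fill_diagonal(corr, -np.inf)  # a feature cannot be paired with itself
    unmatched = list(range(corr.shape[0]))
    pairs = []
    while unmatched:
        # Pick the most correlated pair among the features still unmatched.
        sub = corr[np.ix_(unmatched, unmatched)]
        r, c = np.unravel_index(np.argmax(sub), sub.shape)
        i, j = unmatched[r], unmatched[c]
        pairs.append((i, j))
        unmatched = [u for u in unmatched if u not in (i, j)]
    return pairs


def merge_unmerge(pairs, m):
    """Build merge matrix M (k x m) and unmerge matrix U (m x k) for k pairs."""
    k = len(pairs)
    M = np.zeros((k, m))
    U = np.zeros((m, k))
    for out, (i, j) in enumerate(pairs):
        M[out, [i, j]] = 0.5  # merged feature = average of the matched pair
        U[[i, j], out] = 1.0  # unmerging copies it back to both original slots
    return M, U


def zip_layer(w_a, w_b, M_out, U_in):
    """Zip two linear layers: W = M_out @ blockdiag(w_a, w_b) @ U_in."""
    (n_a, d_a), (n_b, d_b) = w_a.shape, w_b.shape
    block = np.zeros((n_a + n_b, d_a + d_b))
    block[:n_a, :d_a] = w_a
    block[n_a:, d_a:] = w_b
    return M_out @ block @ U_in
```

For a first layer, both models see the same input, so U_in can be two stacked identity matrices (`np.vstack([np.eye(d), np.eye(d)])`); for a later layer, U_in would be the unmerge matrix built for the previous layer’s outputs, which is how the alignment is carried forward. The correlation matrix could be estimated with `np.corrcoef` over activations collected on a batch of data.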
Because some features may be unique to one model, the approach also allows features to be merged within each model via the “zip” operation, rather than forcing every feature to be paired with one from the other model. It further allows models to be zipped only partially, up to a specified layer, producing a multi-head model.
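Partial zipping can be illustrated in the same hedged way. The sketch below assumes two-layer MLPs with ReLU and no biases; a hypothetical `partial_zip_forward` zips only the first layer into a shared trunk, applies the unmerge matrix to recover each model’s hidden features, and then runs each model’s own second layer as a separate task head. The `pairs` argument is assumed to come from a feature-matching step; because pairs are taken over the concatenated hidden units of both models, a pair may join two features from the same model.

```python
import numpy as np


def merge_unmerge(pairs, m):
    """Merge matrix M (k x m) averaging each pair; unmerge U (m x k) copying back."""
    k = len(pairs)
    M = np.zeros((k, m))
    U = np.zeros((m, k))
    for out, (i, j) in enumerate(pairs):
        M[out, [i, j]] = 0.5
        U[[i, j], out] = 1.0
    return M, U


def relu(z):
    return np.maximum(z, 0.0)


def partial_zip_forward(x, w1_a, w1_b, w2_a, w2_b, pairs):
    """Zip layer 1 of two 2-layer MLPs and keep layer 2 as separate heads.

    x: (batch, d) shared input; w1_*: (n, d); w2_a: (out_a, n); w2_b: (out_b, n).
    pairs: disjoint index pairs over the 2n concatenated hidden units
    (indices < n belong to A, >= n to B), e.g. from a matching step.
    """
    n = w1_a.shape[0]
    M, U = merge_unmerge(pairs, 2 * n)
    # Both models see x, so blockdiag(w1_a, w1_b) @ [I; I] is just the row stack.
    w1_zip = M @ np.vstack([w1_a, w1_b])
    h_merged = relu(x @ w1_zip.T)  # (batch, len(pairs)) shared trunk features
    h_split = h_merged @ U.T  # (batch, 2n) unmerged back into A | B slots
    out_a = h_split[:, :n] @ w2_a.T  # head for task A
    out_b = h_split[:, n:] @ w2_b.T  # head for task B
    return out_a, out_b
```

When the two features in every pair have identical pre-activations, averaging before or after the ReLU gives the same result, so each head reproduces its original model exactly; with imperfect matches, the shared trunk is only an approximation of the two original first layers.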
In their empirical study, the team evaluated ZipIt!’s performance by merging models trained on completely disjoint splits of the same dataset or on different datasets. In these experiments, ZipIt! achieved 20-60 percent improvements over the baselines.
The team hopes that ZipIt!’s demonstrated ability to merge models trained on disjoint tasks will encourage further research into practical applications of the method.
Author: Hecate He | Editor: Michael Sarazen