
MixTraining: Leveraging Asynchronous Computation in the Pretrain-Finetune Paradigm

By Yinglun Zhu

Abstract: Pretrain-finetune has emerged as a powerful learning paradigm, achieving remarkable accuracy gains across various domains. However, its substantial computational requirements limit its applicability to broader areas. To address this challenge, we develop MixTraining, a novel training framework that, for the first time, incorporates asynchronous computation into the standard pretrain-finetune paradigm. At a high level, MixTraining merges several pretraining and finetuning epochs into a new mixtraining phase, featuring a smooth transition between the two objectives. Extensive evaluations show that MixTraining delivers substantial computational savings, e.g., a 1.6x speedup for ViT-T training, as well as non-trivial accuracy improvements, e.g., an 8.9% accuracy gain on the TinyImageNet dataset. Additionally, MixTraining can be easily adapted to finetune pretrained language models, achieving accuracy gains in 7 out of 8 experiments on standard language benchmarks.
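To make the high-level idea of the mixtraining phase concrete, the sketch below illustrates one plausible way to blend a pretraining objective and a finetuning objective with a smooth transition between them. The linear schedule, the function names (`mixing_weight`, `mixtraining_loss`), and the parameter names are illustrative assumptions for this example only; the abstract does not specify the paper's actual schedule or implementation.

```python
# A minimal sketch of the mixtraining idea described in the abstract:
# pretraining and finetuning epochs are merged into a single phase whose
# objective shifts smoothly from the pretraining loss to the finetuning loss.
# The linear ramp and all names below are assumptions, not the paper's exact method.

def mixing_weight(epoch, num_mix_epochs):
    """Fraction of the objective assigned to finetuning at a given epoch.

    Assumes a linear ramp from pure pretraining (0.0) to pure finetuning (1.0)
    over the mixtraining phase; the paper's actual transition may differ.
    """
    return min(1.0, max(0.0, epoch / max(1, num_mix_epochs - 1)))


def mixtraining_loss(pretrain_loss, finetune_loss, epoch, num_mix_epochs):
    """Blend the two objectives according to the current mixing weight."""
    lam = mixing_weight(epoch, num_mix_epochs)
    return (1.0 - lam) * pretrain_loss + lam * finetune_loss


if __name__ == "__main__":
    # Toy usage: show how the blended objective evolves over a hypothetical
    # 5-epoch mixtraining phase with constant per-objective loss values.
    for epoch in range(5):
        loss = mixtraining_loss(pretrain_loss=2.0, finetune_loss=1.0,
                                epoch=epoch, num_mix_epochs=5)
        print(f"epoch {epoch}: lambda={mixing_weight(epoch, 5):.2f}, loss={loss:.2f}")
```

In this sketch, early epochs are dominated by the pretraining objective and later epochs by the finetuning objective, so the two stages overlap rather than running strictly one after the other; this is the intuition behind the reported computational savings, though the actual mechanism is detailed in the paper itself.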

