Abstract: The pretrain-finetune paradigm has emerged as a powerful learning approach, achieving remarkable accuracy gains across a variety of domains. However, its substantial computational cost limits its adoption in broader settings. To address this challenge, we develop MixTraining, a novel training framework that, for the first time, incorporates asynchronous computation into the standard pretrain-finetune pipeline. At a high level, our MixTraining...