Abstract: Machine translation (MT) is an important NLP task for which many widely used models were originally designed. The fast-growing size of MT training corpora and the increasing number of model parameters have drawn growing attention to the efficiency of training MT models. In this talk, we focus on two types of efficiency problems: data efficiency and computational efficiency. Data efficiency refers to the ability to use a limited parallel corpus to improve the translation accuracy of an MT model. Computational efficiency refers to requiring little computation and storage when training an MT model on a given amount of data with a given training method.
For data efficiency, this talk adopts the active learning framework, in which an acquisition function is designed to boost the performance of MT models under a limited annotation budget. The active MT approach is further combined with various data augmentation methods to enhance the models' data efficiency. A brief illustrative sketch of this selection loop is given below.
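As an illustration only, a minimal uncertainty-based acquisition function for active MT might score each unlabeled source sentence by the model's average token-level uncertainty and select the highest-scoring sentences for annotation. The scoring rule, function names, and parameters below are assumptions for exposition, not the specific method presented in the talk.

```python
# Sketch of pool-based active learning for MT with an uncertainty acquisition
# function (assumed scoring rule: average token negative log-probability).
import heapq
from typing import Callable, List, Sequence

def acquisition_score(src: str,
                      token_log_probs: Callable[[str], Sequence[float]]) -> float:
    """Score a source sentence by the current model's average token uncertainty."""
    log_probs = token_log_probs(src)          # log-probs of the model's own translation tokens
    if not log_probs:
        return 0.0
    return -sum(log_probs) / len(log_probs)   # higher score = less confident model

def select_for_annotation(pool: List[str],
                          token_log_probs: Callable[[str], Sequence[float]],
                          budget: int) -> List[str]:
    """Pick the `budget` most uncertain sentences from the unlabeled pool."""
    return heapq.nlargest(budget, pool,
                          key=lambda s: acquisition_score(s, token_log_probs))
```

The selected sentences would then be translated by annotators and added to the parallel corpus, optionally alongside augmented data, before retraining the MT model.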
For computational efficiency, this talk focuses on reducing storage usage when training neural machine translation models by proposing several reversible architectures. These reversible architectures are further extended to specialized forms that alleviate the storage problem in differentiable architecture search; a small sketch of the reversibility idea follows.
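For intuition, the sketch below shows an additive-coupling reversible block in the style of RevNet-like reversible layers: intermediate activations need not be stored during training, because the block's inputs can be reconstructed from its outputs in the backward pass. The class, the use of NumPy, and the toy sub-layers are illustrative assumptions, not the exact architectures proposed in the talk.

```python
# Minimal additive-coupling reversible block: y1 = x1 + F(x2), y2 = x2 + G(y1).
# Inputs are exactly recoverable from outputs, so activations need not be stored.
import numpy as np

class ReversibleBlock:
    def __init__(self, f, g):
        self.f = f  # e.g. an attention sub-layer (here a toy nonlinearity)
        self.g = g  # e.g. a feed-forward sub-layer (here a toy nonlinearity)

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Reconstruct the inputs from the outputs instead of caching them.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

# Round-trip check: inverting the forward pass recovers the inputs.
rng = np.random.default_rng(0)
block = ReversibleBlock(f=np.tanh, g=np.tanh)
x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
assert np.allclose((x1, x2), block.inverse(*block.forward(x1, x2)))
```

Because each block is invertible, the memory cost of storing activations grows much more slowly with network depth, which is what makes such architectures attractive both for training large NMT models and for memory-hungry procedures such as differentiable architecture search.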