Facebook researchers have introduced two methods for pretraining cross-lingual language models (XLMs): an unsupervised method that relies only on monolingual data, and a supervised method that leverages parallel data with a new cross-lingual language modeling objective. The goal is to build an efficient encoder that maps sentences from different languages into a shared embedding space, an approach that benefits tasks such as machine translation.
The models perform strongly on a range of cross-lingual understanding tasks and achieve state-of-the-art results on cross-lingual classification as well as unsupervised and supervised machine translation.
The Facebook XLM project contains code for:
- Language model pretraining:
  - Causal Language Model (CLM) – monolingual
  - Masked Language Model (MLM) – monolingual
  - Translation Language Model (TLM) – cross-lingual
- Supervised / unsupervised MT training:
  - Denoising auto-encoder
  - Parallel data training
  - Online back-translation
- XNLI fine-tuning
- GLUE fine-tuning
XLM also supports multi-GPU and multi-node training.
Generating cross-lingual sentence representations
The project provides sample code for quickly obtaining cross-lingual sentence representations from pretrained models. These representations are useful for machine translation, computing sentence similarity, or building cross-lingual classifiers. The examples are written in Python 3 and depend on the NumPy, PyTorch, fastBPE, and Moses libraries.
To generate cross-lingual sentence representations, the first step is to import the required modules and load the pretrained model:
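A minimal sketch of this step, following the layout of the XLM repository's embedding-generation example; the checkpoint filename is one of the pretrained models the project distributes, and the `src.*` imports assume the script is run from the repository root:

```python
import torch

# These modules ship with the XLM repository; run this from the repo root.
from src.utils import AttrDict
from src.data.dictionary import Dictionary, BOS_WORD, EOS_WORD, PAD_WORD, UNK_WORD, MASK_WORD
from src.model.transformer import TransformerModel

# Path to a pretrained checkpoint (here, the MLM+TLM model for the 15 XNLI languages).
model_path = 'mlm_tlm_xnli15_1024.pth'
reloaded = torch.load(model_path)
params = AttrDict(reloaded['params'])

print("Supported languages: %s" % ", ".join(params.lang2id.keys()))
```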
Next, build the dictionary, update the parameters, and reload the model:
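A sketch of the dictionary and model construction, assuming the `reloaded` checkpoint and `params` from the previous step; `Dictionary` and `TransformerModel` are the classes shipped in the repository's `src` package:

```python
# Rebuild the dictionary from the vocabulary stored in the checkpoint.
dico = Dictionary(reloaded['dico_id2word'], reloaded['dico_word2id'], reloaded['dico_counts'])

# Update parameters with the vocabulary size and special-token indices the model expects.
params.n_words = len(dico)
params.bos_index = dico.index(BOS_WORD)
params.eos_index = dico.index(EOS_WORD)
params.pad_index = dico.index(PAD_WORD)
params.unk_index = dico.index(UNK_WORD)
params.mask_index = dico.index(MASK_WORD)

# Build the transformer encoder and reload the pretrained weights.
model = TransformerModel(params, dico, True, True)
model.eval()
model.load_state_dict(reloaded['model'])
```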
Next come example sentences in BPE format (produced with the fastBPE library), from which sentence representations are extracted using the pretrained model:
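In the project's example, each entry pairs a BPE-tokenized sentence (with `@@` marking subword units that continue into the next token) with its language code; the sentences below are illustrative stand-ins:

```python
# (sentence, language) pairs; sentences are already BPE-tokenized,
# with '@@' marking subword units that continue into the next token.
sentences = [
    ('the following secon@@ dary charac@@ ters also appear in the nov@@ el .', 'en'),
    ('les zon@@ es rurales offr@@ ent de petites routes .', 'fr'),
    ('luego del cri@@ quet , el futbol es el deporte mas popular .', 'es'),
]
```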
The last step is to create a batch and run a forward pass to produce the final sentence embeddings:
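A sketch of the batching and forward pass under the same assumptions; the repository's `TransformerModel` expects inputs of shape (sequence length, batch size), and its `'fwd'` mode returns one hidden vector per token:

```python
# Add </s> delimiters at both ends of each sentence and split into tokens.
sentences = [(('</s> %s </s>' % sent.strip()).split(), lang) for sent, lang in sentences]

bs = len(sentences)                             # batch size
slen = max(len(sent) for sent, _ in sentences)  # longest sentence in the batch

# Word indices, padded to the batch maximum length; shape (slen, bs).
word_ids = torch.LongTensor(slen, bs).fill_(params.pad_index)
for i in range(bs):
    sent = torch.LongTensor([dico.index(w) for w in sentences[i][0]])
    word_ids[:len(sent), i] = sent

# Actual sentence lengths and per-token language IDs.
lengths = torch.LongTensor([len(sent) for sent, _ in sentences])
langs = torch.LongTensor([params.lang2id[lang] for _, lang in sentences]).unsqueeze(0).expand(slen, bs)

# Forward pass: one hidden vector per token, shape (slen, bs, hidden_dim).
tensor = model('fwd', x=word_ids, lengths=lengths, langs=langs, causal=False).contiguous()

# The hidden state of the first position (the leading </s>) serves as the sentence embedding.
embeddings = tensor[0]  # shape (bs, hidden_dim)
```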