Anyone who regularly uses machine translation systems will have noticed huge performance improvements over the last few years, attributable to neural network-based models that have largely replaced the previous generation of phrase-based systems.
Introduced in 2018, Sockeye is an open-source framework for neural machine translation (NMT), originally built on MXNet, that has been powering Amazon Translate and other NMT applications. Sockeye 2 was released in 2020.
In the new paper Sockeye 3: Fast Neural Machine Translation with PyTorch, an Amazon team presents the latest version of the Sockeye toolkit, rebuilt on PyTorch for efficient training of stronger and faster models. Sockeye 3 runs up to 126 percent faster than other PyTorch implementations on GPUs and up to 292 percent faster on CPUs.
Sockeye 3 uses distributed mixed-precision training, which speeds up computation and allows larger batches to fit into memory. Moreover, it can scale to any number of GPUs and any amount of training data by launching separate training processes that synchronize updates via PyTorch’s distributed data parallelism.
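The training pattern described above can be sketched with standard PyTorch APIs. This is a minimal illustration, not Sockeye's actual training code; the model, optimizer, and loss here are stand-ins.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, scaler, src, tgt, loss_fn):
    """One mixed-precision training step. If `model` is wrapped in
    torch.nn.parallel.DistributedDataParallel, the backward pass
    all-reduces gradients across the separate training processes."""
    amp = torch.cuda.is_available()  # fall back to full precision on CPU
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda" if amp else "cpu", enabled=amp):
        logits = model(src)          # forward pass runs in reduced precision
        loss = loss_fn(logits, tgt)
    scaler.scale(loss).backward()    # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)           # unscale gradients, then apply the update
    scaler.update()
    return loss.item()
```

In a real multi-GPU run, one such process is launched per device (e.g. via `torchrun`), and `DistributedDataParallel` handles the gradient synchronization transparently.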
For inference, Sockeye 3 uses static computation graphs to minimize the impact of dynamic shapes and data-dependent control flow, enabling various model components to be traced via PyTorch’s JIT compiler.
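Tracing works by recording the operations a module performs on an example input. The sketch below shows the general `torch.jit.trace` pattern on a toy encoder; it is an illustrative stand-in, not one of Sockeye's actual modules.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy encoder with no data-dependent control flow, so a trace
    captures its behavior faithfully for any input of the same rank."""
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):
        return self.proj(self.emb(tokens))

encoder = TinyEncoder()
example = torch.randint(0, 100, (1, 8))        # example input drives the trace
traced = torch.jit.trace(encoder, example)      # static graph, JIT-optimized
out = traced(torch.randint(0, 100, (1, 8)))
```

Because the traced graph is static, components with branching logic (such as beam search bookkeeping) must be kept outside the traced region or rewritten to avoid data-dependent control flow.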
The developers also maintain backward compatibility with Sockeye 2 MXNet models: any model trained with Sockeye 2 can be converted to run on Sockeye 3 with PyTorch.
Sockeye 3 also introduces several advanced features: the decoder’s self-attention layers can be replaced with Simpler Simple Recurrent Units (SSRUs), fine-tuning supports parameter freezing, and users can specify arbitrary prefixes (sequences of tokens) on both the source and target sides for any input.
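The target-prefix feature amounts to forcing the first output tokens before decoding continues freely. The following is a minimal greedy-decoding sketch of that idea, not Sockeye's implementation; `step_scores` is a hypothetical stand-in for a real model's next-token scoring function.

```python
def decode_with_prefix(step_scores, target_prefix, max_len, eos=0):
    """Greedy decoding with a forced target prefix.

    step_scores(history) -> list of scores over the vocabulary.
    The prefix tokens are emitted verbatim (no search over them);
    decoding then proceeds greedily until EOS or max_len.
    """
    output = list(target_prefix)              # forced tokens, taken as-is
    while len(output) < max_len:
        scores = step_scores(output)
        next_tok = max(range(len(scores)), key=scores.__getitem__)
        output.append(next_tok)
        if next_tok == eos:
            break
    return output
```

In beam search the same idea applies per hypothesis: during the prefix steps, every beam is constrained to the prescribed token instead of the top-scoring candidates.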
In their empirical studies, the team compared Sockeye with benchmark NMT toolkits that included Fairseq (Ott et al., 2019) and OpenNMT (Klein et al., 2017).
In the evaluations, Sockeye 3 achieved comparable or better performance on GPUs and CPUs, running 15 percent faster for batched GPU inference, 126 percent faster for non-batched GPU inference, and 292 percent faster for CPU inference.
Overall, Sockeye 3 provides much faster model implementations and more advanced features for NMT. As with previous versions, it has been open-sourced under an Apache 2.0 license, and the Amazon team welcomes pull requests from community members.
The code is available on the project’s GitHub. The paper Sockeye 3: Fast Neural Machine Translation with PyTorch is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

