ACL 2021 Best Paper: Finding the Optimal Vocabulary for Machine Translation via an Optimal Transport Approach

A research team from ByteDance AI Lab, University of Wisconsin–Madison and Nanjing University wins the ACL 2021 best paper award. Their proposed Vocabulary Learning via Optimal Transport (VOLT) approach leverages optimal transport to automatically find an optimal vocabulary without trial training.

by Synced

2021-07-07

The ACL 2021 Paper Awards were announced this week, with the best paper honours going to a team from ByteDance AI Lab, University of Wisconsin–Madison and Nanjing University. Their paper treats vocabulary construction for machine translation, aka vocabularization, as an optimal transport (OT) problem, and proposes VOLT (Vocabulary Learning via Optimal Transport), a simple and efficient approach that works without trial training.

The performance of neural machine translation (NMT) systems is highly dependent on the choice of token vocabularies, and so it is crucial to identify a good vocabulary and find the optimal tokens — a process that typically involves intensive and laborious trial training.

In this paper, the researchers leverage optimal transport and propose VOLT as a novel way to automatically find the optimal vocabulary without trial training. The method outperforms widely used vocabularies in diverse scenarios, including WMT-14 English-German and TED multilingual translation.

Most traditional NMT methods are built on word-level vocabularies, and although these models have achieved promising results, they fail to handle rare words under limited vocabulary sizes. More advanced vocabularization approaches, such as byte-level and character-level approaches, solve the rare-word problem, and they also decrease token sparsity and increase the features shared between similar words. Popular sub-word approaches achieve good results, but they consider only token frequency and neglect vocabulary size, so choosing a size can still incur high computation costs.

To address these issues and take both entropy and vocabulary size into consideration, the team borrowed the economics concept of marginal utility and proposed the marginal utility of vocabularization (MUV) as the optimization objective. MUV measures the benefit (entropy) a corpus gains from an increase in cost (vocabulary size), and the goal is to maximize MUV in tractable time.
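As a rough illustration of the idea, the sketch below computes a length-normalized corpus entropy and a finite-difference MUV between two vocabularies. It assumes that entropy is divided by the average token length in characters and that MUV is the negative change in that entropy per added vocabulary entry. The function names and the toy counts are hypothetical and are not taken from the VOLT code.

```python
import math


def normalized_entropy(token_counts):
    """Token-distribution entropy divided by the average token length.

    Assumption: the average length is taken over vocabulary entries,
    measured in characters. Tokens with zero count add nothing to entropy.
    """
    total = sum(token_counts.values())
    entropy = -sum(
        (c / total) * math.log(c / total)
        for c in token_counts.values()
        if c > 0
    )
    avg_len = sum(len(tok) for tok in token_counts) / len(token_counts)
    return entropy / avg_len


def muv(counts_small, counts_large):
    """Negative entropy change per vocabulary entry added (assumed definition)."""
    added = len(counts_large) - len(counts_small)
    if added <= 0:
        raise ValueError("counts_large must contain more entries than counts_small")
    delta = normalized_entropy(counts_large) - normalized_entropy(counts_small)
    return -delta / added


# Hypothetical toy corpus "abab abab c", segmented two ways.
char_vocab = {"a": 4, "b": 4, "c": 1}
merged_vocab = {"a": 0, "b": 0, "c": 1, "ab": 4}  # "ab" absorbs every a and b

print(muv(char_vocab, merged_vocab))
```

In this toy case the value is positive, meaning the added token lowered the normalized entropy. A search over vocabulary sizes would compare such values across candidate sizes.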

The team formulates vocabulary construction as a discrete optimization problem that aims to find the vocabulary with the highest MUV. Intuitively, vocabulary construction can be regarded as a process that transports chars (characters) into token candidates. Each transport matrix represents a vocabulary and specifies how many chars are transported to each token candidate. Different transport plans incur different costs, so the goal is to find the transport matrix that minimizes the transport cost.
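To make the transport view concrete, here is a minimal sketch of the Sinkhorn iteration, a standard solver for entropy-regularized optimal transport. It is a generic toy rather than VOLT's formulation: the cost matrix, the marginals, and the `sinkhorn` helper are hypothetical, and the paper's actual costs and constraints are not reproduced.

```python
import math


def sinkhorn(cost, row_marginals, col_marginals, reg=0.1, n_iters=200):
    """Approximate an entropy-regularized optimal transport plan.

    cost[i][j] is the (assumed) cost of moving mass from source i to target j.
    Returns a matrix whose row and column sums approach the given marginals.
    """
    n, m = len(row_marginals), len(col_marginals)
    # Gibbs kernel: cheap moves get weights near 1, expensive moves near 0.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [
            row_marginals[i] / sum(kernel[i][j] * v[j] for j in range(m))
            for i in range(n)
        ]
        v = [
            col_marginals[j] / sum(kernel[i][j] * u[i] for i in range(n))
            for j in range(m)
        ]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy problem: two char "sources" and two token "targets".
toy_cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(toy_cost, [0.5, 0.5], [0.5, 0.5])
for row in plan:
    print([round(x, 4) for x in row])
```

Each row of the returned plan says how much of one source is sent to each target. In this toy case almost all mass stays on the zero-cost diagonal.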

The team conducted experiments on three datasets (WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation) and identified the following main results:

Vocabularies searched by VOLT are better than widely-used vocabularies on bilingual MT settings.
Vocabularies searched by VOLT are on par with heuristically-searched vocabularies on low-resource datasets.
VOLT works well on multilingual MT settings.
VOLT is a green vocabularization solution.
A simple baseline with a VOLT-generated vocabulary achieves SOTA results.
VOLT beats SentencePiece and WordPiece.
VOLT works on various architectures.

Overall, the experiments validate VOLT’s ability to effectively find well-performing vocabularies across diverse settings.

The associated code is available on the project GitHub. The paper Vocabulary Learning via Optimal Transport for Neural Machine Translation is on arXiv.

Author: Hecate He | Editor: Michael Sarazen, Chain Zhang

23 comments on “ACL 2021 Best Paper: Finding the Optimal Vocabulary for Machine Translation via an Optimal Transport Approach”

Alex Maddyson

2021-07-14

First of all, I congratulate these universities, they behaved very dignifiedly, their work aroused great interest in me into neural translators. I myself, as an engineer who works with many projects at Engre.co, where I am engaged in machine learning and software development, and software testing. I can say that they are great fellows and I just want to congratulate them. I also want to write that their idea is very exceptional, I think that in the future it will be possible to implement it in many directions.

- Tim
  
  2021-07-14
  
  Alex, I fully agree with your opinion. Also, many thanks for the link, I will take a look at it.
  