Researchers from Petuum and Carnegie Mellon University (CMU) have introduced Texar, an open-source, general-purpose text generation toolkit that can help boost R&D through fast model prototyping and experimentation.
What is Text Generation?
Text generation is a set of natural language processing (NLP) tasks in which a machine produces humanlike, comprehensible and grammatically correct text from a dataset or machine representations. Common text generation tasks include machine translation, dialog systems, text summarization, article writing, text paraphrasing and manipulation, and image captioning.
A wide range of text generation techniques are currently used in the real world, either on their own or in combination. These include neural encoder-decoders, attention mechanisms, memory networks, adversarial methods, reinforcement learning and structured supervision, as well as optimization, data pre-processing, result post-processing and evaluation procedures.
The Petuum and CMU researchers propose that a unified toolkit such as Texar can greatly simplify integration of various techniques when building text generation applications. Texar is open-source, modular, versatile and extensible.
Users can easily construct and edit their own models using Texar’s building blocks. Switching between maximum likelihood learning and reinforcement learning, for example, can be done with just a few lines of code.
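To illustrate what switching the learning objective means, here is a minimal sketch in plain Python. It does not use Texar, and every name in it (`compute_loss`, `mle_loss`, `reinforce_loss`, the toy reward) is a hypothetical chosen for illustration. It assumes the decoder has already produced per-step logits and shows that only the loss computed on top of them changes between the two modes.

```python
import math
import random

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def mle_loss(step_logits, target_ids):
    """Maximum likelihood: negative log-probability of the reference tokens."""
    return -sum(log_softmax(l)[t] for l, t in zip(step_logits, target_ids))

def reinforce_loss(step_logits, sampled_ids, reward):
    """REINFORCE surrogate: -reward * log-probability of the sampled tokens."""
    return -reward * sum(log_softmax(l)[t] for l, t in zip(step_logits, sampled_ids))

def sample_ids(step_logits, rng):
    """Sample one token id per decoding step from the softmax distribution."""
    ids = []
    for l in step_logits:
        probs = [math.exp(x) for x in log_softmax(l)]
        ids.append(rng.choices(range(len(l)), weights=probs, k=1)[0])
    return ids

def compute_loss(step_logits, target_ids, mode="mle", rng=None):
    """Hypothetical switch: the decoder output is unchanged, only the objective differs."""
    if mode == "mle":
        return mle_loss(step_logits, target_ids)
    if mode == "rl":
        rng = rng or random.Random(0)
        sampled = sample_ids(step_logits, rng)
        # Toy reward (an assumption): fraction of sampled tokens matching the reference.
        reward = sum(s == t for s, t in zip(sampled, target_ids)) / len(target_ids)
        return reinforce_loss(step_logits, sampled, reward)
    raise ValueError(f"unknown mode: {mode}")

logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]  # two decoding steps, vocabulary of 3
targets = [0, 1]
print(compute_loss(logits, targets, mode="mle"))
print(compute_loss(logits, targets, mode="rl", rng=random.Random(0)))
```

In this sketch the change from one training regime to the other is a single argument. A real toolkit would also need gradients and an optimizer, which are omitted here.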
Texar includes a wide range of common modules and functionalities that can be used to build learning algorithms and models for areas such as adversarial learning and probabilistic modelling.
Texar supports users’ own customized learning algorithms and models. The platform is also fully compatible with the TensorFlow open source community.
Users can customize their models not only through Python/YAML configuration files, but also through Texar’s Python library APIs.
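The configuration-driven style can be pictured with a small, library-independent sketch. It assumes a module exposes a nested dictionary of default hyperparameters and that user settings, whether parsed from a YAML file or written inline in Python, are overlaid on those defaults. The option names and the `merge_hparams` helper are hypothetical and are not taken from Texar.

```python
import copy

# Hypothetical defaults for an illustrative decoder; not Texar's real option names.
DEFAULT_HPARAMS = {
    "rnn_cell": {"type": "LSTM", "num_units": 256},
    "max_decoding_length": 50,
}

def merge_hparams(defaults, overrides):
    """Recursively overlay user overrides on defaults, rejecting unknown keys."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown hyperparameter: {key}")
        if isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_hparams(merged[key], value)
        else:
            merged[key] = value
    return merged

# The same overrides could come from a parsed YAML file or be written inline in Python.
user_overrides = {"rnn_cell": {"num_units": 512}}
hparams = merge_hparams(DEFAULT_HPARAMS, user_overrides)
print(hparams)
```

Rejecting unknown keys is a design choice in this sketch: it catches typos in a configuration file early rather than silently ignoring them.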
Texar is already being used in a number of Petuum research and engineering text generation projects. Opening Texar to a greater number of researchers and practitioners will further enrich the toolkit.
To learn more about Texar:
Tech report: https://arxiv.org/pdf/1809.00794.pdf
Author: Robert Tian | Editor: Michael Sarazen