Meta AI’s Nougat Enables Conversion of Mathematic Expressions from PDF Files to Machine Readable Texts

A Meta AI research team presents Neural Optical Understanding for Academic Documents (Nougat), a Visual Transformer model that can effectively convert scientific documents stored in PDF format to a lightweight markup language, even when complex mathematical equations are involved.

Most scientific knowledge is stored in Portable Document Format (PDF) files, and PDF is also the second most prominent data format on the internet. However, extracting information from this format or transforming it into machine-readable text is challenging, especially when mathematical expressions are involved.

To address this issue, previous studies have applied Optical Character Recognition (OCR), an effective technology for detecting and classifying individual characters and words in an image, to scientific documents by treating them as images. However, because these approaches process text line by line, they fail to capture the relationships between sentences.

In the new paper Nougat: Neural Optical Understanding for Academic Documents, a Meta AI research team presents Nougat, a Visual Transformer model that can effectively convert scientific documents stored in PDF format to a lightweight markup language, even when complex mathematical equations are involved.

The team summarizes their primary contributions as follows:

  1. Release of a pre-trained model capable of converting a PDF to a lightweight markup language. We release the code and the model on GitHub.
  2. We introduce a pipeline to create datasets for pairing PDFs to source code.
  3. Our method is only dependent on the image of a page, allowing access to scanned papers and books.

The proposed Nougat model is built upon the Donut architecture. A Swin Transformer encoder takes a document image as input and outputs a sequence of latent embeddings. Next, a transformer decoder with cross-attention decodes the encoded image into a sequence of tokens in an autoregressive manner. Finally, the decoder output is projected to the size of the vocabulary.
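
The decoding flow described above can be illustrated with a toy sketch in plain Python. This is not Nougat's implementation: the random vectors stand in for the Swin encoder output, a simple average of the prefix embeddings stands in for decoder self-attention, and every name (`cross_attention`, `greedy_decode`, `embed`, `out_proj`) is hypothetical. It only shows the loop structure: attend over the encoded image, project to vocabulary size, pick a token, and feed it back in.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(query, memory):
    """Single-head scaled dot-product attention of one query over encoder outputs."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, m) / scale for m in memory])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(len(query))]


def greedy_decode(memory, embed, out_proj, bos_id, eos_id, max_len):
    """Autoregressive greedy decoding: each new token depends on all previous ones."""
    tokens = [bos_id]
    dim = len(embed[0])
    for _ in range(max_len):
        # Assumption: averaging the prefix embeddings replaces real self-attention.
        query = [sum(embed[t][i] for t in tokens) / len(tokens) for i in range(dim)]
        context = cross_attention(query, memory)          # look at the encoded page
        hidden = [q + c for q, c in zip(query, context)]  # residual-style combination
        logits = [dot(hidden, row) for row in out_proj]   # projection to vocabulary size
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


random.seed(0)
DIM, VOCAB = 4, 6
# Stand-in for the encoder's latent embeddings (3 image patches).
memory = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
embed = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
out_proj = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]

tokens = greedy_decode(memory, embed, out_proj, bos_id=0, eos_id=1, max_len=8)
print(tokens)
```

In a real model the weights are learned and the decoder stacks several attention layers, but the control flow of generating one token at a time until an end-of-sequence token appears is the same.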

Notably, the researchers apply recent advances in visual document understanding to a novel OCR task. Contrary to previous approaches, Nougat does not rely on OCR or embedded text representations; only the rasterized document pages are needed.

In their empirical study, the team compared Nougat with the baseline model GROBID. Nougat achieves the best performance on all metrics, including edit distance, BLEU, METEOR and F-measure.
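
To make two of these metrics concrete, here is a minimal sketch of a normalized edit distance and a token-level F-measure using common textbook definitions. The article does not state the exact normalization or tokenization used in the paper, so the choices below (dividing by the longer string's length, splitting on whitespace) and the function names are assumptions. BLEU and METEOR are omitted because they need considerably more machinery.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance scaled to [0, 1]; lower is better. Normalization is an assumption."""
    longest = max(len(pred), len(ref))
    return edit_distance(pred, ref) / longest if longest else 0.0


def token_f_measure(pred, ref):
    """Harmonic mean of token precision and recall over whitespace-split tokens."""
    pred_tokens, ref_tokens = pred.split(), ref.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model output versus ground-truth markup for one formula.
pred = r"\frac{a}{b} + c"
ref = r"\frac{a}{d} + c"
print(normalized_edit_distance(pred, ref), token_f_measure(pred, ref))
```

Note that the two scores point in opposite directions: a lower edit distance is better, while a higher F-measure is better.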

Overall, this work demonstrates that Nougat not only has great potential for extracting text from digital-born PDFs, but can also handle scanned papers and textbooks. The team hopes their work can serve as a starting point for future research in related fields.

The code is available on the project’s GitHub. The paper Nougat: Neural Optical Understanding for Academic Documents is on arXiv.


Author: Hecate He | Editor: Chain Zhang

