Ancient texts inscribed on durable materials such as stone, pottery and metal are valuable historical assets that reflect and preserve the thought, language, society and history of civilizations from thousands of years ago. Unfortunately, many surviving inscriptions have been damaged over the centuries, and their restoration and interpretation can involve highly complex, time-consuming and specialized workflows.
In the new paper Restoring and Attributing Ancient Texts Using Deep Neural Networks, published in the prestigious science journal Nature, a research team from DeepMind, Ca’ Foscari University of Venice, University of Oxford and Athens University of Economics and Business introduces Ithaca, a deep neural network (DNN) specifically designed for the textual restoration and geographical and chronological attribution of ancient Greek inscriptions. In evaluations, Ithaca achieves 62 percent accuracy when restoring damaged texts on its own, and historians who use it improve their restoration accuracy from 25 to 72 percent.
Ithaca was trained on Greek-language inscriptions from across the Mediterranean dating between the seventh century BC and the fifth century AD. The researchers processed the raw Packard Humanities Institute (PHI) dataset to build their 78,608-inscription I.PHI, which they believe is the largest multitask dataset of machine-actionable epigraphical text.
Ithaca addresses three main tasks in epigraphy: textual restoration, geographical attribution and chronological attribution. It can handle long-term context information and generate interpretable outputs to assist scholars on these tasks.
The three steps in the Ithaca workflow can be summarized as follows:
- The input text is represented jointly as characters and words, with damaged, missing or unknown words replaced by the special symbol ‘[unk]’.
- To enable large-scale processing, Ithaca’s torso leverages a model comprising stacked transformer blocks, where each block outputs a sequence of processed representations of input characters, and the output of each block becomes the input of the next block.
- Final outputs are fed into three task heads specifically trained to handle restoration, geographical attribution and chronological attribution. The restoration head predicts missing characters, the geographical attribution head regionally classifies the inscription, and the chronological attribution head dates the inscription.
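As a rough illustration of the three steps above, the following Python sketch builds aligned character and word sequences, then passes a sequence through stacked blocks and a set of task heads. It is not the authors' implementation: the "-" marker for a damaged character, the function names, and the idea of passing blocks and heads as plain callables are assumptions made for illustration. Only the "[unk]" symbol comes from the description above.

```python
MISSING = "-"   # assumed marker for a damaged or missing character
UNK = "[unk]"   # symbol the article names for damaged, missing or unknown words


def joint_representation(text):
    """Return (chars, words): two equal-length lists, one entry per character.

    Each character position is paired with the word it belongs to. Any word
    that contains a missing character is replaced by UNK.
    """
    chars = list(text)
    words = []
    for word in text.split(" "):
        label = UNK if MISSING in word else word
        words.extend([label] * len(word))
        words.append(" ")  # position of the separating space
    words.pop()  # remove the extra space appended after the last word
    return chars, words


def run_torso(sequence, blocks):
    """Stacked blocks: the output of each block is the input of the next."""
    for block in blocks:
        sequence = block(sequence)
    return sequence


def run_heads(torso_output, heads):
    """Apply every task head to the same torso output."""
    return {name: head(torso_output) for name, head in heads.items()}
```

Keeping the two sequences the same length means each character position carries both its own identity and the word around it, which is one simple way to feed both views into a single model.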
Ithaca was designed to maximize the collaborative potential between historians and deep learning by providing insightful outputs. For the restoration task, Ithaca outputs a set of the top 20 decoded predictions ranked by probability. For geographical attribution, it classifies the input text and presents the possible regions as a ranked, visual list of predictions. For chronological attribution, it predicts a categorical distribution over dates between 800 BC and 800 AD.
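To make these two output styles concrete, here is a minimal Python sketch of ranking candidates by probability and of summarizing a categorical distribution over date bins. The 10-year bin width, the use of negative numbers for BC years, and all function names are assumptions for illustration, not details taken from the article.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(candidates, probs, k=20):
    """Return the k most probable (candidate, probability) pairs, best first."""
    ranked = sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def date_bin_centers(start=-800, end=800, width=10):
    """Centers of equal-width date bins; negative values stand for years BC.

    The bin width is an assumption made for this sketch.
    """
    return [low + width / 2 for low in range(start, end, width)]


def expected_date(probs, centers):
    """Probability-weighted mean date of a categorical distribution."""
    return sum(p * c for p, c in zip(probs, centers))
```

Returning a full distribution rather than a single year lets a historian see whether the model is confident in one period or split between several.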
The team evaluated Ithaca on all three tasks. On restoration, Ithaca consistently outperformed baseline methods, achieving a 26.3 percent character error rate (CER; lower is better) and 61.8 percent top-1 accuracy. On regional attribution, Ithaca obtained 70.8 percent top-1 and 82.1 percent top-3 predictive accuracy. Ithaca also bettered human baseline predictions on chronological attribution.
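For readers unfamiliar with these metrics, the sketch below shows one common way to compute them in Python. It assumes CER is the character-level edit distance divided by the length of the reference text, and that top-k accuracy counts a prediction as correct when the true label appears among the k highest-ranked guesses. The paper's exact evaluation protocol may differ, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(predicted, reference):
    """Edit distance normalized by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(predicted, reference) / len(reference)


def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of examples whose true label is among the first k predictions."""
    hits = sum(1 for ranked, truth in zip(ranked_predictions, truths)
               if truth in ranked[:k])
    return hits / len(truths)
```

Under these definitions, a lower CER and a higher top-k accuracy both indicate better predictions, which matches how the figures above are reported.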
In recent years, AI systems have also helped historians decipher ancient Persian scripts on cuneiform tablets and classify 3000-year-old Chinese oracle bone rubbings. The team believes Ithaca’s scope makes it the first epigraphic restoration and attribution model of its kind and hopes it will encourage further cooperation between AI and historians to provide new insights on and understanding of important periods in human history.
Author: Hecate He | Editor: Michael Sarazen