Language matters in media, where even seemingly minor lexical choices can significantly affect how a message is received. In the 1800s in the US, the movement to end slavery found a voice in a number of abolitionist newspapers. Those at the vanguard helped frame a new narrative that influenced other media and shaped public opinion on the issue.
In the research paper Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers, a team from Google, Georgia Tech and Emory University leverages machine learning to investigate these papers’ lexical semantic innovations. The study identifies papers that took the lead in “new usages of specific words” and papers that were followers in this regard, providing an overall picture of the network of semantic influence among the publications.
The researchers summarize their methodological contributions as:
- Propose a text modelling approach to identify semantic leadership in a corpus of timestamped documents from a set of sources, such as newspapers. The proposed approach includes (a) a model to identify semantic changes in language using diachronic word embeddings, and (b) a statistical measure to quantify the extent to which each source led others in the adoption of each change.
- Apply the proposed method to a corpus of nineteenth-century newspapers digitized by Accessible Archives, and quantify the lead-lag relationships between the newspapers for each of these changes.
- Aggregate the dyadic relationships between newspapers into a weighted semantic leadership network, retaining links between newspapers for each semantic change only if the relationship is so strong as to be highly unlikely to have arisen by chance.
The team first developed a model of semantic change that builds on word embeddings, enhanced with metadata about word occurrences in the different newspapers studied. To make embeddings from different parts of the corpus comparable, the researchers estimated embeddings on multiple corpora from different time periods and then aligned the embedding vectors. They estimated the embeddings by optimizing a classical skip-gram objective, in which the embedding of each word token is conditioned on its neighbouring tokens. To model how embeddings change over time, they computed the differences between neighbours for all pairs of occurrences of every word and used the intervals with the maximum differences as the measure of semantic change for that word. Ranking the words by this measure yields an ordered list of changes, each a tuple of a word and its timestamps.
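The neighbour-comparison idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors, the function names, the choice of k and the use of one minus the Jaccard overlap of nearest-neighbour sets are all assumptions made for illustration, and "pairs" is read here as pairs of time intervals.

```python
import numpy as np

def nearest_neighbours(vectors, vocab, word, k):
    """Return the set of the k words most cosine-similar to `word`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(word)]
    order = [i for i in np.argsort(-sims) if vocab[i] != word]
    return {vocab[i] for i in order[:k]}

def semantic_change(embeddings_by_interval, vocab, word, k=2):
    """Return (score, interval_pair) for the largest neighbour-set difference.

    The score is 1 - Jaccard overlap of the neighbour sets (an assumed
    stand-in for the paper's difference measure).
    """
    intervals = sorted(embeddings_by_interval)
    best = (0.0, None)
    for idx, a in enumerate(intervals):
        for b in intervals[idx + 1:]:
            na = nearest_neighbours(embeddings_by_interval[a], vocab, word, k)
            nb = nearest_neighbours(embeddings_by_interval[b], vocab, word, k)
            score = 1.0 - len(na & nb) / len(na | nb)
            if score > best[0]:
                best = (score, (a, b))
    return best

# Hypothetical toy data: one matrix of word vectors per time interval.
vocab = ["freedom", "liberty", "law", "slave", "trade"]
stable = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]])
shifted = stable.copy()
shifted[0] = [0.05, 1.0]  # "freedom" moves towards "slave" and "trade"
embeddings = {1830: stable, 1840: stable, 1850: shifted}

score, pair = semantic_change(embeddings, vocab, "freedom")
print(score, pair)
```

Running the same function over every word in the vocabulary and sorting by score would give a ranked list of (word, interval pair) tuples in the spirit of the one described above.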
The researchers identify three main advantages of their approach to modelling semantic changes:
- As a joint model of words and time, it does not require the computationally expensive post-alignment of the word embeddings.
- Temporal embeddings can be learned even for words that emerge or disappear before the start or end of the time period respectively.
- The model is easily extended to incorporate other metadata about the text if available.
In the next step, the researchers endeavoured to identify semantic leaders. Any semantic change involved papers that were leaders and others that were followers, as well as opponents who resisted change. The team identified who led and followed each semantic change by augmenting a diachronic embedding model to include an additional term for the source of each token. This novel formula enabled them to calculate the lead score of each newspaper over the others.
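The article does not reproduce the lead score formula, so the following is only a hypothetical illustration of a lead-lag measure, not the one in the paper. It assumes each newspaper has its own embedding of a word at an early and a late time, and scores a leader by how closely its early usage matches the follower's later usage, minus the reverse. The names `lead_score`, `paper_a` and `paper_b` and the toy vectors are invented for the example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def lead_score(emb, word, leader, follower, early, late):
    """Hypothetical asymmetric score: positive if `leader` led `follower`.

    forward:  leader's early usage vs. follower's late usage
    backward: follower's early usage vs. leader's late usage
    """
    forward = cosine(emb[(word, leader, early)], emb[(word, follower, late)])
    backward = cosine(emb[(word, follower, early)], emb[(word, leader, late)])
    return forward - backward

# Toy source-specific embeddings keyed by (word, source, time).
emb = {
    ("freedom", "paper_a", 1840): np.array([0.0, 1.0]),  # already uses the new sense
    ("freedom", "paper_a", 1850): np.array([0.0, 1.0]),
    ("freedom", "paper_b", 1840): np.array([1.0, 0.0]),  # still uses the old sense
    ("freedom", "paper_b", 1850): np.array([0.0, 1.0]),  # adopts the new sense later
}

a_leads_b = lead_score(emb, "freedom", "paper_a", "paper_b", 1840, 1850)
b_leads_a = lead_score(emb, "freedom", "paper_b", "paper_a", 1840, 1850)
print(a_leads_b, b_leads_a)
```

A difference is used rather than a ratio only to avoid division by zero in the toy case; any comparable asymmetric measure would illustrate the same idea.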
Finally, they aggregated individual leadership events into a semantic leadership network. Specifically, they constructed an edge-weighted network, where the set of newspapers forms the nodes of the network, and every weighted edge denotes the number of words for which a given leader newspaper leads another, follower paper.
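The aggregation step can be sketched as follows, assuming the leadership events have already been identified and filtered. The event tuples and paper names below are hypothetical placeholders, not results from the study.

```python
from collections import Counter

def build_network(events):
    """Aggregate (leader, follower, word) events into weighted directed edges.

    The weight of edge (leader, follower) is the number of distinct words
    on which the leader leads the follower.
    """
    edges = Counter()
    seen = set()
    for leader, follower, word in events:
        key = (leader, follower, word)
        if leader != follower and key not in seen:
            seen.add(key)
            edges[(leader, follower)] += 1
    return edges

def out_strength(edges):
    """Total outgoing weight per newspaper: how many leads it holds overall."""
    totals = Counter()
    for (leader, _follower), weight in edges.items():
        totals[leader] += weight
    return totals

# Hypothetical events: (leader, follower, word).
events = [
    ("paper_a", "paper_b", "freedom"),
    ("paper_a", "paper_b", "justice"),
    ("paper_a", "paper_c", "freedom"),
    ("paper_c", "paper_b", "equality"),
    ("paper_a", "paper_b", "freedom"),  # duplicate event, counted once
]

edges = build_network(events)
strength = out_strength(edges)
print(dict(edges))
print(strength.most_common())
```

Summing outgoing edge weights is one simple way to compare how broadly each paper led; other network measures could be applied to the same edge list.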
To reveal the specific roles played by each newspaper in the evolving discourse of abolitionism, the team conducted experiments on a subset of ten nineteenth-century newspapers digitized and hand-keyed by Accessible Archives. They also identified specific terms, such as equality, freedom and justice, that saw significant semantic changes in these papers.
The results for leader-follower newspaper pairs indicate that The Liberator held a broad influence over many of the other newspapers, while the National Anti-Slavery Standard, for example, tended to be a follower.
The sub-network results indicate that the dominant newspapers with Black editors were the Douglass newspapers, the Provincial Freeman and the Colored American. The team notes that the Colored American, with mostly Black readers, led on the terms “immediate” and “fight,” which “suggest a tone of urgency that might surround an argument for liberation” and support the claim that the paper “deserves more of the credit for accelerating the fight against slavery.”
The researchers also note that two newspapers edited by women — The Provincial Freeman and The Lily — led a large number of semantic changes in the corpus, “lending additional credence to the argument that a multiracial coalition of women led the abolitionist movement in terms of both thought and action.”
The paper Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers is on arXiv.
Author: Hecate He | Editor: Michael Sarazen