A new research paper demonstrating the first automatic grammatical error correction system to reach human-level performance has been published on arXiv by Tao Ge, Furu Wei and Ming Zhou from the Natural Language Computing Group at Microsoft Research Asia. Reaching Human-Level Performance in Automatic Grammatical Error Correction: An Empirical Study can be found here: https://arxiv.org/abs/1807.01270.
The paper’s authors have indicated substantial text overlap with Fluency Boost Learning and Inference for Neural Grammatical Error Correction, accepted by ACL 2018.
In recent years, neural sequence-to-sequence (seq2seq) models have become a proven approach to grammatical error correction (GEC). But most seq2seq models for GEC have two flaws: they are trained on only a limited number of error-corrected sentence pairs, and single-round seq2seq inference cannot fully correct sentences containing multiple grammatical errors, especially when some errors obscure the context and confuse the model's subsequent corrections.
To address these limitations, the paper proposes a novel fluency boost learning and inference mechanism based on the seq2seq framework.
For fluency boost learning, beyond the original error-corrected sentence pairs, the mechanism generates less fluent sentences (e.g., from the seq2seq model's n-best outputs) and pairs them with the correct sentences to create additional training instances during subsequent training epochs. These extra pairs give the error correction model more training sentences and accordingly help improve its generalization ability.
The error-corrected sentence pairs generated by pairing less fluent sentences with their correct counterparts during training, as Figure 2(a) shows, are named fluency boost sentence pairs in the paper.
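The pairing step can be sketched in a few lines of Python. The helper names below are illustrative assumptions, not code from the paper; the fluency score f(x) = 1/(1 + H(x)), with H(x) a language model's cross entropy over the sentence, follows the paper's definition of fluency.

```python
# Sketch of building "fluency boost sentence pairs" (hypothetical helpers).
def fluency(sentence, lm_cross_entropy):
    """Fluency as defined in the paper: f(x) = 1 / (1 + H(x)),
    where H(x) is a language model's cross entropy for sentence x."""
    return 1.0 / (1.0 + lm_cross_entropy(sentence))

def fluency_boost_pairs(nbest_outputs, correct, lm_cross_entropy):
    """Pair each n-best output that is LESS fluent than the gold
    correction with that correction, yielding extra training pairs."""
    gold_f = fluency(correct, lm_cross_entropy)
    return [(cand, correct)
            for cand in nbest_outputs
            if cand != correct and fluency(cand, lm_cross_entropy) < gold_f]
```

In a real system, `lm_cross_entropy` would come from a trained language model and `nbest_outputs` from the seq2seq model's beam search; here they are stand-ins to show the selection logic.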
For model inference, a fluency boost inference mechanism is proposed to correct a sentence incrementally through multi-round inference, continuing as long as the proposed edits boost the sentence's fluency, as Figure 2(b) shows.
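The multi-round inference loop can be sketched as follows. This is a minimal illustration, assuming a `correct_once` function that wraps a single seq2seq decoding pass and a `fluency` scorer; neither name comes from the paper.

```python
def fluency_boost_inference(sentence, correct_once, fluency, max_rounds=5):
    """Multi-round inference sketch: keep applying the seq2seq
    correction step while each proposed edit strictly boosts fluency."""
    for _ in range(max_rounds):
        candidate = correct_once(sentence)
        # Stop when the model proposes no change or no fluency gain.
        if candidate == sentence or fluency(candidate) <= fluency(sentence):
            break
        sentence = candidate
    return sentence
```

The stopping condition is what lets the model fix errors it missed in earlier rounds (once surrounding errors are corrected, the context becomes clearer) without oscillating between edits.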
The paper also proposes a “round-way correction” approach that extends the basic fluency boost inference idea with two seq2seq models whose decoding orders are left-to-right and right-to-left respectively, feeding the output of one model to the other for further correction. This round-way correction results in a significant improvement in recall.
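One plausible sketch of chaining the two decoding orders is shown below. The exact combination strategy in the paper may differ, and all names here (`l2r_correct`, `r2l_correct`) are hypothetical wrappers around the two models; the intuition is that errors missed by one decoding order can be caught by the other.

```python
def round_way_correct(sentence, l2r_correct, r2l_correct, fluency):
    """Round-way correction sketch: chain the left-to-right and
    right-to-left models in both orders, then keep the most fluent
    result. A simplified reading of the paper's approach."""
    candidates = [
        r2l_correct(l2r_correct(sentence)),  # L2R first, then R2L
        l2r_correct(r2l_correct(sentence)),  # R2L first, then L2R
    ]
    return max(candidates, key=fluency)
```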
In experiments, the combination of fluency boost learning and inference with convolutional seq2seq models achieved an F0.5 score of 75.72 on the CoNLL-2014 10-annotation dataset and a GLEU score of 62.42 on the JFLEG test set. As Table 1 shows, these results make the GEC system the first to reach human-level performance on both GEC benchmarks.
System outputs for the CoNLL-2014 and JFLEG test sets are available at https://github.com/getao/human-performance-gec.
Author: Chenhui Zhang | Editor: Michael Sarazen