A research team from Huawei Noah’s Ark Lab and Tsinghua University proposes Extract Then Distill (ETD), a generic and flexible strategy that reuses teacher model parameters for efficient and effective task-agnostic distillation and can be applied to student models of any size.
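The paper details its own extraction criteria; as a rough illustration of the general idea only, the sketch below (PyTorch) initializes a narrower student layer from a subset of a teacher layer’s weights and then distills with a temperature-scaled soft-label loss. The importance heuristic, function names, and single-layer scope here are assumptions for illustration, not the authors’ method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def extract_linear(teacher_fc: nn.Linear, out_keep: int, in_keep: int) -> nn.Linear:
    """Initialize a smaller Linear layer from salient teacher weights.

    Illustrative "extract" step: rank neurons by L1 weight norm (one simple
    importance proxy; the ETD paper's criterion may differ) and copy the
    corresponding rows/columns into the student.
    """
    out_idx = teacher_fc.weight.abs().sum(dim=1).topk(out_keep).indices
    in_idx = teacher_fc.weight.abs().sum(dim=0).topk(in_keep).indices
    student_fc = nn.Linear(in_keep, out_keep)
    with torch.no_grad():
        student_fc.weight.copy_(teacher_fc.weight[out_idx][:, in_idx])
        student_fc.bias.copy_(teacher_fc.bias[out_idx])
    return student_fc

def distill_loss(student_logits, teacher_logits, T: float = 2.0):
    """Standard "distill" step: KL between temperature-softened distributions."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

# Usage: shrink a 768-wide teacher projection to a 384-wide student layer.
teacher = nn.Linear(768, 768)
student = extract_linear(teacher, out_keep=384, in_keep=768)
```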
A research team from Google Research proposes small, fast, on-device disfluency detection models based on the BERT architecture. The smallest model is only 1.3 MiB, achieving a two-orders-of-magnitude size reduction and an eightfold inference latency reduction compared to state-of-the-art BERT-based models.
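Disfluency detection is commonly framed as token-level sequence labeling (marking filler and repair tokens as disfluent). The toy model below shows what a BERT-style tagger at this scale can look like; the dimensions, vocabulary size, and label scheme are guesses for illustration, not the paper’s actual architecture or training recipe.

```python
import torch
import torch.nn as nn

class TinyDisfluencyTagger(nn.Module):
    """Illustrative small BERT-style token tagger for disfluency detection."""

    def __init__(self, vocab_size=8000, dim=64, layers=2, heads=2, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        # Per-token head: label each token as fluent (0) or disfluent (1).
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        return self.classifier(self.encoder(self.embed(token_ids)))

model = TinyDisfluencyTagger()
n_params = sum(p.numel() for p in model.parameters())
# At one byte per parameter (int8 quantization), this lands well under 1 MiB.
print(f"{n_params:,} params ~= {n_params / 2**20:.2f} MiB at int8")
```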
UmlsBERT is a deep Transformer network architecture that incorporates clinical domain knowledge from the UMLS Metathesaurus to build ‘semantically enriched’ contextual representations that benefit from both contextual learning and domain knowledge.
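One way to read “semantically enriched” inputs: tokens that map to a Metathesaurus concept contribute a learned semantic-group embedding alongside the usual token embedding, echoing BERT’s additive token/segment/position scheme. The sketch below illustrates that idea; the group inventory, lookup, and fusion details are assumptions, not the UmlsBERT authors’ exact implementation.

```python
import torch
import torch.nn as nn

class SemanticallyEnrichedEmbeddings(nn.Module):
    """Token embeddings augmented with UMLS-style semantic-group embeddings."""

    def __init__(self, vocab_size=30522, num_semantic_groups=16, dim=768):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, dim)
        # Index 0 is reserved for tokens with no Metathesaurus mapping,
        # so unmapped tokens contribute a zero vector.
        self.sem_embed = nn.Embedding(num_semantic_groups + 1, dim, padding_idx=0)

    def forward(self, token_ids, semantic_group_ids):
        # Additive fusion, mirroring how BERT sums its input embeddings.
        return self.word_embed(token_ids) + self.sem_embed(semantic_group_ids)

# Usage: a token tagged with semantic group 3 gets its group embedding added.
emb = SemanticallyEnrichedEmbeddings()
out = emb(torch.tensor([[101, 2057, 102]]), torch.tensor([[0, 3, 0]]))
```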
In what the company calls “the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search,” Google today announced that it has leveraged its pretrained language model BERT to dramatically improve its understanding of search queries.
Since Google Research introduced BERT (Bidirectional Encoder Representations from Transformers) in 2018, the model has gained unprecedented popularity among researchers. Now, a group of researchers from National Cheng Kung University in Tainan, Taiwan, is challenging BERT’s efficacy.