New Multitask Benchmark Suggests Even the Best Language Models Don’t Have a Clue What They’re Doing
Researchers introduce a test covering topics such as elementary mathematics, designed to measure language models’ multitask accuracy.
AI Technology & Industry Review