
Reproducibility Challenges in Machine Learning for Health

Last year the United States Food and Drug Administration (FDA) cleared a total of 12 AI tools that use machine learning for health (ML4H) algorithms to inform medical diagnosis and treatment for patients. The tools are now allowed to be marketed, with millions of potential users in the US alone. Because ML4H tools directly affect human health, their development from experiments in labs to deployment in hospitals progresses under heavy scrutiny. A critical component of this process is reproducibility.

A team of researchers from MIT, University of Toronto, New York University, and Evidation Health has proposed a number of “recommendations to data providers, academic publishers, and the ML4H research community in order to promote reproducible research moving forward” in their new paper Reproducibility in Machine Learning for Health.

Reproducibility Crisis in Machine Learning

Just as boxers show their strength in the ring by getting up again after being knocked to the canvas, researchers test their strength in the arena of science by ensuring their work’s reproducibility. If other researchers cannot replicate the research findings, the original study will draw doubters and critics. Although reproducibility is an essential part of science, many sub-fields such as machine learning are now experiencing a reproducibility crisis.

According to a survey of 1,576 researchers conducted by the journal Nature in 2016, more than 70 percent of researchers failed in their attempts to reproduce others’ experiments, and more than half were unable to reproduce even their own experimental results. In the critical field of medicine, 41 percent of respondents reported taking concrete steps to attempt to improve their research reproducibility.

This April, organizers of one of the world’s largest AI gatherings, the Neural Information Processing Systems Conference (NeurIPS), updated their paper submission policy to include “a mandatory Reproducibility Checklist for all submissions.”

NeurIPS Reproducibility Checklist

But how to improve reproducibility? Traditionally, researchers either repeated their experiments themselves or appointed someone within their lab to test reproducibility. Another approach has been to improve the documentation and standardization of experiment methods.

The paper's authors argue that merely replicating experimental results is not enough, and propose examining a machine learning study from three different perspectives. If other researchers can replicate the exact technical results of a paper under identical conditions, the study has achieved Technical Replicability. The authors then add Statistical Replicability and Conceptual Replicability as further criteria for determining whether a study is fully reproducible.
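The difference between the first two perspectives can be illustrated with a toy sketch. This is not code from the paper: the `run_experiment` function, the simulated 0.8 accuracy, and the seed values are all hypothetical, and the sketch assumes that Statistical Replicability means a result holds up when the experiment is rerun under resampled conditions, such as different random seeds. Conceptual Replicability is not covered here.

```python
import random
import statistics


def run_experiment(seed, n=200):
    """Toy stand-in for an ML experiment (hypothetical).

    Simulates n predictions that are each correct with probability 0.8
    and returns the observed accuracy.
    """
    rng = random.Random(seed)
    correct = sum(rng.random() < 0.8 for _ in range(n))
    return correct / n


# Technical replicability: identical code, data, and seed
# should give exactly the same number.
first = run_experiment(seed=0)
second = run_experiment(seed=0)
print("identical rerun matches:", first == second)

# Statistical replicability (assumed reading): the result should be
# stable when the random conditions are resampled, so report the
# mean and spread over several seeds instead of a single run.
scores = [run_experiment(seed=s) for s in range(20)]
mean = statistics.mean(scores)
spread = statistics.stdev(scores)
print(f"accuracy over {len(scores)} seeds: {mean:.3f} +/- {spread:.3f}")
```

Reporting a spread over several seeds rather than one number is a simple way to show whether a claimed improvement is larger than run-to-run noise.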

Unique Challenges for ML4H

Scientists across various disciplines have deployed machine learning approaches to speed up research data analysis. Isaac Kohane, Chair of the Department of Biomedical Informatics in the Blavatnik Institute at Harvard Medical School, explains: “A machine-learning model can be trained on tens of millions of electronic medical records with hundreds of billions of data points without lapses.”

ML4H, however, faces unique challenges in Technical Replicability, Statistical Replicability, and Conceptual Replicability. The researchers combined qualitative arguments with a quantitative literature review of over 300 papers from different institutions covering ML4H, natural language processing (NLP), computer vision (CV), and general machine learning, concluding that ML4H “lags behind other subfields of machine learning on various reproducibility metrics.”

The researchers propose that putting these three types of replicability at the heart of future ML4H studies will provide a clearer picture for stakeholders, and that multi-institute datasets should be made more accessible for studies, as the increasing use of multi-source data will improve conceptual reproducibility. They also call on the ML community and researchers to focus on “expanding our trajectory of statistical rigor.”

The paper Reproducibility in Machine Learning for Health is available on arXiv.


Journalist: Fangyu Cai | Editor: Michael Sarazen
