AI Health Research

Stanford Study Says Higher ImageNet Performance Does Not Improve Medical Image Interpretation

In a new paper, Stanford University researchers dispel the assumption that boosting deep learning model performance on ImageNet translates to better performance on medical imaging tasks.

In the paper CheXtransfer: Performance and Parameter Efficiency of ImageNet Models for Chest X-Ray Interpretation, Stanford University researchers address the assumption that boosting deep learning model performance on ImageNet translates to better performance on medical imaging tasks. AI guru Andrew Ng participated in the study, which, surprisingly, concludes that this is not the case.

In 2017, Ng and his Stanford team introduced the groundbreaking deep learning (DL) model CheXNet, which detects pneumonia from chest X-rays with accuracy exceeding that of practicing radiologists. Four years later, most DL approaches for chest X-ray interpretation rely on models pretrained on the popular large-scale image database ImageNet. These de facto transfer learning approaches have assumed that ImageNet-pretrained weights lead to better model performance, and that architectures which perform better on ImageNet will also perform better on chest X-ray interpretation.

Architecture improvements on ImageNet, however, may not translate into gains on medical imaging tasks. Why? The researchers found that older, established architectures outperform newer architectures generated through neural architecture search on ImageNet when evaluated on the large CheXpert radiograph dataset, noting that “this finding suggested that search may have overfit to ImageNet to the detriment of medical task performance, and ImageNet may not be an appropriate benchmark for selecting architecture for medical imaging tasks.”

The researchers compared the transfer performance and parameter efficiency of 16 popular convolutional architectures on five CheXpert tasks; the dataset comprises 224,316 chest X-rays of 65,240 patients. The architectures — DenseNet (121, 169, 201), ResNet (18, 34, 50, 101), Inception (V3, V4), MNASNet, EfficientNet (B0, B1, B2, B3), and MobileNet (V2, V3) — were evaluated using the area under the receiver operating characteristic curve (AUROC).
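The AUROC metric used in the comparison can be computed per task and then averaged, as sketched below with scikit-learn. The labels and scores here are synthetic stand-ins, and the five task names are the CheXpert competition pathologies; the exact evaluation pipeline in the paper may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy per-task AUROC evaluation: one binary label column per pathology.
rng = np.random.default_rng(0)
tasks = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]

n = 200
labels = rng.integers(0, 2, size=(n, len(tasks)))  # synthetic ground truth (0/1)
# Synthetic model scores that loosely track the labels.
scores = np.clip(labels + rng.normal(0.0, 0.8, size=(n, len(tasks))), 0.0, 1.0)

# AUROC is computed independently for each task, then averaged.
per_task = {t: roc_auc_score(labels[:, i], scores[:, i]) for i, t in enumerate(tasks)}
mean_auc = float(np.mean(list(per_task.values())))
print(f"mean AUROC across {len(tasks)} tasks: {mean_auc:.3f}")
```

Averaging per-task AUROC in this way gives a single summary number that is insensitive to class imbalance in any individual pathology.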

Paper co-author and Stanford University PhD student Pranav Rajpurkar listed four key findings:

  1. Architecture improvements on ImageNet do not lead to improvements on chest X-ray interpretation
  2. Surprisingly, model size matters less than model family when models aren’t pretrained
  3. ImageNet pretraining helps, especially for smaller models
  4. Many layers can be discarded to reduce the size of a model without a performance drop

The researchers explain that differences in task and data attributes could explain why ImageNet performance shows no correlation with CheXpert performance. Chest X-ray interpretation differs from natural image classification in that it depends on abnormalities confined to a small number of pixels and involves far fewer classes. The data attributes also differ: X-rays are grayscale and tend to share similar spatial structures across images.

The Stanford study re-examines standard assumptions and improves understanding of the transfer performance and parameter efficiency of ImageNet DL models for chest X-ray interpretation. The researchers believe this is the first study of its kind, and hope it will encourage further exploration of the relationship between ImageNet architectures and downstream medical task performance.

The paper CheXtransfer: Performance and Parameter Efficiency of ImageNet Models for Chest X-Ray Interpretation is available on arXiv.


Reporter: Fangyu Cai | Editor: Michael Sarazen

