Transfer learning is a popular topic in machine learning, especially when training data is scarce. This paper systematically studies how a convolutional neural network trained on ImageNet for image classification performs on medical images, more precisely on ultrasound images, for the kidney detection problem.
In this work, the authors use CaffeNet, shown in Figure 1, as a feature extractor to study the process of transfer learning. Only the 4096-dimensional features from the "fc7" layer in Figure 1 are used for the detection task. The three types of extracted features are as follows:
CaffeNet_FA (Full Network Adaptation): All network weights are updated and fine-tuned on kidney image samples.
CaffeNet_PA (Partial Network Adaptation): The weights of "conv1" and "conv2" are kept unchanged, while the weights of the remaining layers are updated on kidney image samples.
CaffeNet_NA (Zero Network Adaptation): All weights of the entire model are kept unchanged.
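The three regimes differ only in which layers are allowed to update during fine-tuning. A minimal sketch of that bookkeeping, using the standard CaffeNet layer names (the function and variable names here are illustrative, not from the paper):

```python
# Standard CaffeNet/AlexNet layer names, in forward order.
CAFFENET_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5",
                   "fc6", "fc7", "fc8"]

def trainable_layers(regime):
    """Return the layers whose weights are updated under each regime."""
    if regime == "FA":   # full adaptation: every layer is fine-tuned
        return list(CAFFENET_LAYERS)
    if regime == "PA":   # partial adaptation: conv1 and conv2 stay frozen
        return [l for l in CAFFENET_LAYERS if l not in ("conv1", "conv2")]
    if regime == "NA":   # zero adaptation: the pre-trained weights are kept
        return []
    raise ValueError(f"unknown regime: {regime}")
```

In a framework such as PyTorch or Caffe, freezing a layer amounts to excluding its parameters from the optimizer (or setting its learning-rate multiplier to zero).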
To compare with conventional methods for medical image detection, they also extract Haar features of these kidney image samples, with about 2000 dimensions, as inputs for classification.
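Haar-like features are differences of pixel sums over adjacent rectangles, computed in constant time from an integral image. A self-contained sketch of one two-rectangle feature (the paper's exact feature set is not specified, so the feature chosen here is illustrative):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[r, c] = img[:r, :c].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) using the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_edge_x(ii, r0, c0, h, w):
    """Two-rectangle horizontal edge feature: left half minus right half."""
    mid = c0 + w // 2
    return (rect_sum(ii, r0, c0, r0 + h, mid)
            - rect_sum(ii, r0, mid, r0 + h, c0 + w))
```

Sliding such rectangles over an image patch at several positions and scales yields the kind of ~2000-dimensional feature vector used as the conventional baseline.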
3. Experiments and Results
The authors use two metrics for validation: 1) the number of localization failures, i.e. the number of images with a Dice index below 0.80 between the detected kidney ROI and the ground-truth image patch; 2) detection accuracy. To compare the different kinds of features, the authors train a separate binary classifier, a Gradient Boosting Machine (GBM), on each feature set.
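The failure criterion follows directly from the definition of the Dice index; a minimal sketch (function names are illustrative):

```python
import numpy as np

def dice_index(mask_a, mask_b):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2.0 * inter / total if total else 1.0

def is_localization_failure(pred_roi, gt_roi, threshold=0.80):
    """The paper's failure criterion: Dice index below 0.80."""
    return dice_index(pred_roi, gt_roi) < threshold
```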
The full quantitative results are shown in Table 1:
Clearly, the features extracted from a pre-trained CNN contribute more to the subsequent classification task. Even without fine-tuning, CaffeNet_NA achieves higher accuracy than the Haar features with the same number of failures. Fusing the two kinds of features not only yields the highest accuracy but, surprisingly, also reduces the number of failures to 3, whereas every other feature set produces more than 10 failures.
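The fusion reported in Table 1 can be realized as a simple concatenation of the two feature vectors before classification. A sketch with random stand-in data and shrunken dimensions (the real vectors are 4096-dimensional CNN features and roughly 2000-dimensional Haar features; whether the paper fuses by plain concatenation is an assumption here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 80
cnn_feats = rng.normal(size=(n, 16))    # stand-in for fc7 features
haar_feats = rng.normal(size=(n, 8))    # stand-in for Haar features
labels = rng.integers(0, 2, size=n)     # kidney vs. non-kidney patch

fused = np.hstack([cnn_feats, haar_feats])  # concatenation fusion
clf = GradientBoostingClassifier(n_estimators=20, random_state=0)
clf.fit(fused, labels)
preds = clf.predict(fused)
```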
The following figure shows visual classification results obtained with the different types of features as input.
The authors compare some of the response images generated by layers 1 and 2 of the trained convolutional neural network with the outputs of conventional image-processing methods, e.g. Phase Congruency and the Frangi Vesselness Filter, as shown below:
They find that the CNN learns features equivalent to some of these widely used non-linear feature extractors. For example, response images (g) and (i) are similar to (b) and (c). Comparing (g) with (d), they observe a reduction of speckle noise.
Table 2 shows the number of filters in each layer whose weights changed by more than 40%, using the L2-norm as the metric.
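One plausible reading of this metric is the relative L2-norm change of each filter's weight vector between the pre-trained and fine-tuned models; a sketch under that assumption:

```python
import numpy as np

def relative_l2_change(w_before, w_after):
    """Relative L2-norm change of one filter's weights."""
    return np.linalg.norm(w_after - w_before) / np.linalg.norm(w_before)

def count_changed_filters(before, after, threshold=0.40):
    """Count filters in a layer that changed by more than 40%."""
    return sum(relative_l2_change(b, a) > threshold
               for b, a in zip(before, after))
```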
The filters in layers 1 and 2 do not change much. The authors attribute this to low-level features being roughly the same for natural and ultrasound images.
4. Thoughts from the Reviewer
This paper investigates some of the details of transfer learning on medical images. It shows that a convolutional neural network trained on natural images can serve as an effective feature extractor for medical images, and that fine-tuning a pre-trained network further improves the quality of the extracted features. A pre-trained convolutional neural network can extract the same or similar features as hand-crafted, mathematically derived non-linear feature extractors.
The validation set used in this experiment is too small, including only 45 images, so the "number of failures" result is not entirely convincing.
Two directions for future work:
- Compare additional pre-trained networks as feature extractors to study whether the network architecture influences transfer learning.
- Use other types of medical images (only kidney images were used here) to compare the performance of the feature extractors.
Transfer learning is an active research area in machine learning. First, in many situations (such as medical imaging) only a small number of training samples is available, and we need models that still perform well under this constraint. Second, building and training a whole new network from scratch is difficult and time-consuming: there are many hyper-parameters to initialize and tune, and in many cases they are sensitive and tricky to get right. Transfer learning therefore offers a simpler route to training on a given data set. Furthermore, by studying the internal mechanisms of transfer learning, we can learn more about the common structures and relations shared by different training tasks.
Author: Yiwen Liao | Editor: Joni Chung