Recently, there has been a surge of interest in making deep learning models simpler to train and more accurate, because this is a vital requirement for researchers who want to turn research results into industrial applications such as robots.
In this paper, the authors present an innovative way of learning lightweight models that achieve an accuracy greater than 90% while running an order of magnitude faster and using an order of magnitude fewer parameters. The method has three main steps:
1). Adapt a pre-trained model to the task at hand.
2). Use model compression techniques on the adapted model to learn a lightweight deep convolutional neural network (DCNN) that has far fewer parameters.
3). Combine K lightweight models into a mixture model to enhance the performance of the lightweight models.
Finally, the method was applied to agricultural robots and achieved good performance.
The use of agricultural robots is steadily rising. Examples include the AgBot II in Fig. 1, which detects and classifies weeds, and Harvey, which detects and segments crops. The current approach to weed segmentation combines shape and pixel statistic features and uses a random forest classifier for the classification. DCNNs are hard to deploy here for two reasons: robotic platforms are resource limited, and it is hard to train a DCNN with limited data. State-of-the-art networks require a level of computational power that most common robots cannot afford. This paper makes a trade-off between complexity and accuracy, and addresses the problem in three steps. First, the authors adapted a pre-trained model, Inception-v3, to the task. Then, they used model compression and “distillation” techniques to obtain a model with orders of magnitude fewer parameters. Finally, building on prior work, they combined a set of K lightweight models into one mixture model to enhance the performance.
This method achieved an impressive result for weed segmentation. The Adapted-IV3 model improved accuracy from the previous 85.9% to 93.9%. With K = 4 lightweight DCNNs, the mixture achieves an accuracy of 90.3% while using far fewer parameters and running at a higher frame rate.
The authors provided a strong literature review that covers the history and the approaches leading to this idea. They also analyzed the pros and cons of current trends in feature learning and model compression. This is not the key point of the paper, so I won’t go into the details, but it is worth reading if you are interested.
The proposed method is a three-step process that trades off speed and memory size against accuracy. It is used to efficiently solve the weed segmentation problem for robotic platforms like the AgBot II. A sliding window of 81 × 81 × 3 is moved across the color image, and the class of the central pixel (either weed or crop) is declared. In this paper, a sparse problem means that weed segmentation only requires classification for pixels containing vegetation. Because most robotic vision problems are sparse in this sense, the authors did not trial a fully convolutional network (FCN), which is designed and trained for problems that require dense (per-pixel) decisions. The authors use Fig. 2 to illustrate what a sparse problem is.
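To make the sparse sliding-window idea concrete, here is a minimal sketch in plain Python. It visits only the pixels marked as vegetation and cuts an 81 × 81 window around each one. The function names, the boolean `mask` input and the choice to skip pixels whose window would cross the image border are all assumptions made for illustration; the paper may handle borders and vegetation detection differently.

```python
WINDOW = 81          # window side length from the text (81 x 81 x 3)
HALF = WINDOW // 2   # 40 pixels on each side of the central pixel


def vegetation_windows(mask):
    """Yield (row, col, top, left) for every vegetation pixel whose
    81 x 81 window fits entirely inside the image.

    `mask` is a 2-D list of booleans: True where a pixel contains vegetation.
    Skipping border pixels is an assumption of this sketch.
    """
    height, width = len(mask), len(mask[0])
    for r in range(HALF, height - HALF):
        for c in range(HALF, width - HALF):
            if mask[r][c]:
                yield r, c, r - HALF, c - HALF


def crop(image, top, left):
    """Cut the WINDOW x WINDOW patch whose top-left corner is (top, left)."""
    return [row[left:left + WINDOW] for row in image[top:top + WINDOW]]


# Toy 100 x 100 RGB image with two vegetation pixels.
height = width = 100
image = [[(0, 0, 0)] * width for _ in range(height)]
mask = [[False] * width for _ in range(height)]
mask[50][50] = True   # window fits inside the image
mask[5][5] = True     # window would leave the image, so it is skipped

centres = list(vegetation_windows(mask))
patch = crop(image, centres[0][2], centres[0][3])
```

Each `patch` would then be passed to the classifier, which labels only its central pixel. Pixels without vegetation never produce a window, which is where the savings over a dense, per-pixel approach come from.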
Here I will briefly describe the three main processes:
A. Transfer Learning: Adapting Complicated Pre-Trained Networks:
If you have limited data to train a network, as in this case, it is better to take a pre-trained network and adapt it to your task. Frequently used pre-trained networks include VGGNet and GoogLeNet. For this paper, the authors used Inception-v3, the latest version of GoogLeNet, which is much smaller than other models. The authors then upsampled the original images to match the input size required by Inception-v3.
B. Model Compression: Training Lightweight DCNNs:
Even though Inception-v3 has about 25M parameters (far fewer than other models), model compression is still needed. The trained model from process A is treated as the teacher network, and a lightweight student DCNN is learned from it. Two losses are used to train the student: the classification loss (between the student's output and the ground truth) and the L2 loss (between the teacher's logit output and the student's logit output). The authors considered two candidate structures. The first consists of 8 convolutional layers and 1 fully connected layer, which is similar to AlexNet; it is shown in Fig. 3 with detailed explanations. The second consists of 4 convolutional layers followed by 2 inception-style modules and 1 fully connected layer (similar to GoogLeNet); it is shown in Fig. 3 (bottom). Fig. 4 illustrates the 4 sub-modules; see its description for details.
The parameter details are shown in Table I.
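As a rough illustration of the upsampling step, the sketch below resizes an 81 × 81 window to 299 × 299, which is the standard input size of Inception-v3. Nearest-neighbour interpolation and the function name `upsample_nearest` are assumptions made for this example; the review does not say which interpolation the authors used.

```python
INCEPTION_SIZE = 299  # standard Inception-v3 input side length


def upsample_nearest(patch, size=INCEPTION_SIZE):
    """Resize a square patch (a list of rows) to size x size.

    Uses nearest-neighbour sampling, which is an assumption of this sketch.
    """
    src = len(patch)
    # Map each output index back to the source index it falls on.
    index = [min(src - 1, (i * src) // size) for i in range(size)]
    return [[patch[r][c] for c in index] for r in index]


# Toy 81 x 81 patch whose pixel values encode their own coordinates.
patch = [[(r, c, 0) for c in range(81)] for r in range(81)]
resized = upsample_nearest(patch)
```

In practice an image library would do this resizing, usually with bilinear or bicubic interpolation; the point is only that every small window has to be enlarged before the adapted network can consume it.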
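The two-part training objective for the student network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sum-of-squares form of the L2 term, the `weight` that balances the two terms, and all function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(student_logits, label):
    """Cross-entropy between the student's prediction and the ground truth."""
    return -math.log(softmax(student_logits)[label])


def logit_l2_loss(teacher_logits, student_logits):
    """Squared L2 distance between teacher and student logits."""
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits))


def student_loss(student_logits, teacher_logits, label, weight=1.0):
    """Total loss; `weight` is a hypothetical balancing factor."""
    return (classification_loss(student_logits, label)
            + weight * logit_l2_loss(teacher_logits, student_logits))


# Hypothetical logits for one window with two classes.
teacher = [2.0, -1.0]
student = [1.0, 0.0]
loss = student_loss(student, teacher, label=0)
```

The L2 term pulls the student's raw logits toward the teacher's, so the student learns from the teacher's relative confidence in each class and not only from the hard label.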
C. Mixtures and Ensembles of DCNNs:
This step enhances performance by combining the lightweight models. Simply averaging the models' decisions already improves performance at low cost, and the mixture goes one step further by weighting each model. First, an occupation probability is calculated for each model based on its maximum logit value.
Then the final classification is obtained by weighting each model's output by its respective occupation probability (formula 2 in the paper).
The authors used this approach to solve the weed segmentation problem on robotic platforms. The models were implemented in TensorFlow and trained with the AdamOptimizer using the following settings:
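The weighting can be sketched in a few lines of Python. The sketch assumes the form used in the cited mixture-of-DCNNs work, where the occupation probability of a model is a softmax over the models' maximum logits; this exact form, the function names and the toy logits are assumptions, since the formulas themselves are not reproduced in this review. A plain ensemble average is included for comparison.

```python
import math


def occupation_probabilities(all_logits):
    """One weight per model: softmax over each model's maximum logit."""
    best = [max(logits) for logits in all_logits]
    m = max(best)
    exps = [math.exp(b - m) for b in best]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_logits(all_logits):
    """Per-class logits weighted by each model's occupation probability."""
    alphas = occupation_probabilities(all_logits)
    n_classes = len(all_logits[0])
    return [sum(a * logits[n] for a, logits in zip(alphas, all_logits))
            for n in range(n_classes)]


def ensemble_logits(all_logits):
    """Plain average of the models' logits, for comparison."""
    k = len(all_logits)
    n_classes = len(all_logits[0])
    return [sum(logits[n] for logits in all_logits) / k
            for n in range(n_classes)]


# Two hypothetical models, two classes (index 0 = crop, 1 = weed).
logits = [[3.0, 0.0],   # a confident model
          [0.0, 1.0]]   # a less confident model
alphas = occupation_probabilities(logits)
mixed = mixture_logits(logits)
averaged = ensemble_logits(logits)
```

In this toy case the confident model receives most of the weight, so the mixture follows it more closely than the plain average does.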
- learning rate of γ = 1e−4, ε = 0.1
- batch size of b = 60
- dropout rate of 50%.
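To show what the unusually large ε = 0.1 does, here is a sketch of a single Adam update for one scalar parameter. It uses the textbook form of Adam, which may differ in detail from TensorFlow's implementation, and β1 = 0.9 and β2 = 0.999 are assumed defaults that the review does not state.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=0.1):
    """One textbook Adam update for a single scalar parameter.

    lr and eps follow the settings listed above; beta1 and beta2 are
    assumed defaults. Returns the updated (param, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Same gradient, two values of eps, first step (t = 1).
small_eps, _, _ = adam_step(1.0, grad=0.01, m=0.0, v=0.0, t=1, eps=1e-8)
paper_eps, _, _ = adam_step(1.0, grad=0.01, m=0.0, v=0.0, t=1)
```

With a tiny ε the first step is close to the full learning rate regardless of gradient size, while ε = 0.1 shrinks the step for small gradients, making the update behave more like damped gradient descent.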
1. Accuracy of Weed Segmentation:
The authors used the public Crop/Weed Field Image Dataset (CW-FID), which consists of 20 training images and 40 testing images. They compared their results with those of Haug and Ostermann, who trained a random forest classifier on hand-crafted features. The detailed results are given in Table II, which shows that all of the presented deep learning solutions considerably outperform the previously proposed method. The Adapted-IV3 model achieved the highest accuracy of about 93.9%, while Haug and Ostermann achieved an accuracy of 85.9%.
2. Mixtures of Lightweight DCNNs:
Here, the authors combined multiple lightweight models into Mix-AgNet and Mix-MiniInception. The MixDCNN approach provided an average improvement of 0.15% compared to using an Ensemble. For one GPU, there is no added model complexity. Table II shows that as K (the number of models) increases, the overall performance increases while the relative improvement decreases. AgNet DCNNs with K = 4 offer a good trade-off between model complexity and accuracy. Fig. 5 visualizes the results of three DCNN models: AgNet, Mix-AgNet (K = 4), and Adapted-IV3.
3. Speed and Model Complexity:
Table III presents the number of regions that can be processed per second. The less complicated a model is, the more samples it can process, and increasing the complexity by adding more models decreases the number of regions that can be processed per second. To strike a balance, we can select the Mix-AgNet K = 4 model and the Mix-MiniInception K = 2 model. Considering both speed and complexity, the AgNet, Mix-AgNet and Mix-MiniInception approaches have great potential to be deployed on robotic platforms.
This paper provides a novel way to train DCNNs that can be used on robotic platforms. It has a strong literature review and clear figures and tables, which allow readers to easily understand the background and the details of how the approach is realized. Although the authors didn’t explain why they chose this specific approach, the paper offers a novel way to think about and employ deep neural networks. As the authors suggest at the end of the paper, more approaches can be tried to improve the performance, but the value of the paper lies in providing a result that is good enough to be accepted and applied in everyday applications.
 Bawden, Owen John. Design of a lightweight, modular robotic vehicle for the sustainable intensification of broadacre agriculture. Diss. Queensland University of Technology, 2015.
 Lehnert, Christopher, et al. “Autonomous sweet pepper harvesting for protected cropping systems.” IEEE Robotics and Automation Letters 2.2 (2017): 872-879.
 Szegedy, Christian, et al. “Rethinking the inception architecture for computer vision.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
 Ge, ZongYuan, et al. “Fine-grained classification via mixture of deep convolutional neural networks.” Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on. IEEE, 2016.
 Haug, Sebastian, and Jörn Ostermann. “A crop/weed field image dataset for the evaluation of computer vision based precision agriculture tasks.” European Conference on Computer Vision. Springer International Publishing, 2014.
Author: Shixin Gu | Localized by Synced Global Team: Joni Chung