Machine Learning Algorithms and Applications. Various authors
Overall, egg classification and counting achieve an accuracy greater than 97%. Figure 2.7 shows the result generated using the proposed method, where green dots mark hatched eggs and red dots mark unhatched eggs. Some areas of the image are zoomed and shown separately in Figure 2.7, since the input image is too large to fit on the page and the eggs are too small for any features to be visible at full scale.
Figure 2.7 Result of egg classification generated by the proposed method.
2.4 Dataset Generation
Conventional methods of extracting egg count information from digital images require hardly any training data. In contrast, the proposed CNN-based method requires large datasets to learn the features automatically and produce the required results. The amount of training data needed, along with the test and validation datasets, grows as the number of hidden layers increases.
Many free datasets can be downloaded to train CNN models for tasks such as classifying handwritten digits or identifying objects. However, no public dataset exists for the sericulture field, in particular for silkworm egg counting or classification. In our work, therefore, training datasets were generated by cropping class images from the silkworm egg sheet and attaching class labels and other features needed for CNN training, such as the egg center location. A set of over 400K images was generated for egg location and the FB class, and a set of over 100K images for the individual classes (HC and UHC). Data augmentation was also applied to enlarge the datasets.
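The cropping and augmentation steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the patch size, the zero padding at the sheet border, and the choice of flips and 90-degree rotations as augmentations are all assumptions, since the chapter does not specify them. The helper names (`crop_patch`, `augment`, `build_samples`) are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def crop_patch(image: Image, center: Tuple[int, int], size: int) -> Image:
    """Crop a size x size patch centred on (row, col); pad with zeros off-sheet."""
    half = size // 2
    r0, c0 = center[0] - half, center[1] - half
    height, width = len(image), len(image[0])
    patch = []
    for r in range(r0, r0 + size):
        row = []
        for c in range(c0, c0 + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else 0)
        patch.append(row)
    return patch


def flip_horizontal(patch: Image) -> Image:
    """Mirror the patch left to right."""
    return [row[::-1] for row in patch]


def rotate_90(patch: Image) -> Image:
    """Rotate the patch clockwise by 90 degrees."""
    return [list(row) for row in zip(*patch[::-1])]


def augment(patch: Image) -> List[Image]:
    """Return the four rotations of the patch and of its mirror image (8 variants)."""
    variants = []
    for base in (patch, flip_horizontal(patch)):
        current = base
        for _ in range(4):
            variants.append(current)
            current = rotate_90(current)
    return variants


def build_samples(
    sheet: Image, annotations: List[Tuple[Tuple[int, int], str]], size: int
) -> List[Tuple[Image, str]]:
    """Turn (egg center, class label) annotations into labelled, augmented patches."""
    samples = []
    for center, label in annotations:
        for variant in augment(crop_patch(sheet, center, size)):
            samples.append((variant, label))
    return samples
```

Each annotated egg center yields eight labelled patches under these assumptions, which is one simple way a modest number of hand-labelled eggs can grow into a much larger training set.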
2.5 Results
The trained CNN models were tested on new silkworm egg sheets, scanned with a Canon® paper scanner at 600 dpi, to classify and count the eggs. These digital datasets were kept completely isolated from the training step, so the trained CNN models had to predict the results rather than reproduce learned ones. Table 2.4 presents the performance of the overall CNN model trained using our datasets; only a few test samples are shown due to space restrictions. The accuracy shown in Table 2.4 reflects both the accuracy of the egg count and the accuracy of classifying the eggs. CNN models trained with two hidden layers consistently outperform the conventional computer vision/image processing techniques for silkworm egg counting and classification, with accuracy over 97% on new data of the same breed. The inference times shown in Table 2.4 were measured on an Nvidia GPU (GTX 1060).
The model's performance drops on new egg data that differ completely in color and texture from the eggs in the training dataset. This happens because eggs of different breeds are spatially different from the data on which the model was trained. Collecting data for a different breed of silkworm eggs and training a deep learning model on it will resolve this issue; this work is in progress.
Table 2.4 Performance of the CNN model results on test datasets.
Test sample | True count | Count prediction | Time (sec) | Class score (HC) | Class score (UHC) | Accuracy (%) |
MSR1_001.jpg | 588 | 586 | 11.83 | 437 | 149 | 99.65 |
MSR1_002.jpg | 534 | 526 | 8.99 | 473 | 53 | 98.68 |
MSR1_003.jpg | 554 | 556 | 10.42 | 491 | 65 | 99.28 |
MSR1_004.jpg | 539 | 528 | 9.81 | 501 | 27 | 97.95 |
MSR1_005.jpg | 597 | 588 | 11.14 | 562 | 26 | 98.32 |
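As a quick sanity check on Table 2.4, the rows can be re-examined in a few lines. The sketch below assumes a simple relative counting accuracy, 100 × (1 − |predicted − true| / true). The chapter does not state the formula behind the Accuracy column, which also accounts for classification, so these values are not expected to reproduce that column exactly. The function names are hypothetical.

```python
# Rows copied from Table 2.4: (sample, true count, predicted count, HC, UHC)
TABLE_2_4 = [
    ("MSR1_001.jpg", 588, 586, 437, 149),
    ("MSR1_002.jpg", 534, 526, 473, 53),
    ("MSR1_003.jpg", 554, 556, 491, 65),
    ("MSR1_004.jpg", 539, 528, 501, 27),
    ("MSR1_005.jpg", 597, 588, 562, 26),
]


def count_accuracy(true_count: int, predicted_count: int) -> float:
    """Relative counting accuracy in percent (assumed formula, not from the chapter)."""
    return 100.0 * (1.0 - abs(predicted_count - true_count) / true_count)


def hc_fraction(hc: int, uhc: int) -> float:
    """Share of counted eggs that were assigned to the HC class."""
    return hc / (hc + uhc)


for sample, true_count, predicted, hc, uhc in TABLE_2_4:
    # The two class scores should add up to the predicted egg count.
    classes_consistent = (hc + uhc == predicted)
    print(
        f"{sample}: count accuracy {count_accuracy(true_count, predicted):.2f}%, "
        f"HC share {hc_fraction(hc, uhc):.3f}, class sum ok: {classes_consistent}"
    )
```

In every row the HC and UHC scores sum to the predicted count, which suggests the class scores are per-class egg counts rather than probabilities.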
2.6 Conclusion
In this paper, a CNN-based silkworm egg counting and classification model that overcomes many issues found with conventional image processing techniques is explained. The contribution of this paper is fourfold. First, a method is presented to generalize the capture of silkworm egg sheet data in a digital format using ordinary paper scanners rather than purpose-built hardware. This eliminates the need for additional light sources to provide uniform illumination while recording data and maintains high repeatability.
Second, the scanned digital data can be transformed to a standard size by using key markers stamped onto the egg sheets before scanning. This allows the user to resize the digital data and later use it in an image processing algorithm or a CNN without introducing dimensionality errors.
Third, a dataset has been put together containing over 400K images representing different features of silkworm eggs. CNNs and other models that need large amounts of training, testing, and validation data can use this dataset to skip the data generation phase.
Fourth, a CNN model designed to predict the egg class and count the number of eggs per egg sheet has been trained using the dataset. With over 97% accuracy, the model outperforms many conventional approaches with only 4 hidden layers and a fully connected layer.
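The marker-based size normalization from the second contribution could look roughly like the sketch below. It assumes two key markers whose distance on a standard-size sheet is known in pixels, and a uniform scale factor. The chapter does not describe the number of markers or the transformation used, so the function names and the two-marker setup are illustrative only.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def scale_from_markers(marker_a: Point, marker_b: Point,
                       reference_distance_px: float) -> float:
    """Scale factor that maps the measured marker distance onto the reference one."""
    measured = math.dist(marker_a, marker_b)
    if measured == 0:
        raise ValueError("markers coincide; cannot derive a scale factor")
    return reference_distance_px / measured


def standard_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Image dimensions after uniform rescaling, rounded to whole pixels."""
    return round(width * scale), round(height * scale)


def rescale_point(point: Point, scale: float) -> Point:
    """Map an annotated location (e.g. an egg center) into the rescaled image."""
    return point[0] * scale, point[1] * scale


# Example: markers detected 2400 px apart in a scan, reference distance 1200 px.
scale = scale_from_markers((100.0, 100.0), (100.0, 2500.0), 1200.0)
new_width, new_height = standard_size(5000, 7000, scale)
```

Rescaling every scan to the same marker distance keeps the apparent egg size consistent across scanners and resolutions, which is what a fixed-size patch classifier needs.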
The model quantifies (counts) eggs of different silkworm breeds accurately, but new datasets become necessary to predict the class labels for a new silkworm breed on which the model has not been trained. This is because HC class eggs have high pixel intensity throughout the egg surface while UHC has dark pixels at the center surrounded