Boosting the classification success in imbalanced data of bee larva cells

Authors

S. Özgün, M. A. Şahman

DOI:

https://doi.org/10.58190/ijamec.2024.78

Keywords:

Imbalanced dataset, Oversampling method, SMOTE, Beekeeping, Detection of larval cells

Abstract

Selecting an appropriate honey harvesting method is crucial for sustainable beekeeping and optimal honey production. Primitive harvesting methods can kill bees and reduce honey yield. This study addresses the detection and classification of young larvae on honeycombs. Because the area occupied by young larvae is small compared to the other areas of the comb, the dataset obtained from honeycombs was imbalanced. The Synthetic Minority Oversampling TEchnique (SMOTE), a synthetic data generation method, was used to balance it. The balanced dataset was then classified with the k-Nearest Neighbors (k-NN), Decision Tree, and Support Vector Machine algorithms. The classification results were evaluated with the F1-Score, G-Mean, and AUC metrics. The results showed that classification was more successful on the dataset balanced with synthetic data.
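The balancing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameters, so the neighbour count `k`, the `seed`, the function names `smote` and `f1_and_g_mean`, and the toy feature vectors are all assumptions made for illustration. The sketch follows the general SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours, and computes F1-Score and G-Mean from a binary confusion matrix (AUC is omitted because it needs classifier scores rather than labels).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    k and seed are illustrative defaults, not values taken from the study.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_and_g_mean(y_true, y_pred, positive=1):
    """Return (F1-Score, G-Mean) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean


# Toy example: 4 minority samples, topped up with 6 synthetic ones.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=3)
balanced_minority = minority + new_points

# Toy labels: 1 = larva cell (minority class), 0 = other cell.
f1, g_mean = f1_and_g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. G-Mean is the geometric mean of sensitivity and specificity, so it drops toward zero whenever either class is classified poorly, which is why it is commonly reported alongside F1-Score for imbalanced data.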

Published

27-03-2024

How to Cite

[1] S. Özgün and M. A. Şahman, “Boosting the classification success in imbalanced data of bee larva cells,” J. Appl. Methods Electron. Comput., vol. 12, no. 1, pp. 10–15, Mar. 2024.

Issue

Section

Research Articles