Enhancing Multi-Label Hate Speech and Abusive Language Detection on Indonesian Twitter Using Recurrent Neural Networks with Hyperparameter Tuning
DOI:
https://doi.org/10.51903/juritek.v3i3.3022
Keywords:
Hate Speech Detection, Abusive Language Detection, Multi-label Classification, Recurrent Neural Networks (RNN), Hyperparameter Tuning
Abstract
This study investigates whether hyperparameter tuning improves multi-label hate speech and abusive language detection on Indonesian Twitter with Recurrent Neural Networks (RNNs). A dataset of Indonesian tweets labeled for several hate speech and abusive language categories was preprocessed through text cleaning, tokenization, and sequence padding. A baseline RNN model was first built and evaluated, and Keras Tuner was then used to search for better hyperparameters. The best configuration found was an embedding dimension of 32, 32 LSTM units, and a dropout rate of 0.2. The tuned model was trained and compared with the baseline. Precision improved for labels such as Abusive, HS_Group, HS_Moderate, and HS_Strong, but recall and F1-scores declined for labels such as HS_Religion and HS_Race, and the overall performance metrics declined slightly. Hyperparameter tuning can therefore improve some aspects of performance while degrading others, so it should be applied with the requirements of the target application in mind, for example whether precision or recall matters more. Future work will explore other model architectures and additional tuning strategies to improve overall performance.
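The preprocessing steps named in the abstract (text cleaning, tokenization, sequence padding) and the per-label precision, recall, and F1 comparison can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the function names, the cleaning rules, the reserved ids, and the choice of post-padding are all assumptions, and the toy inputs are invented for demonstration only.

```python
import re
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # assumed convention: 0 = padding, 1 = unknown token


def clean_text(text):
    """Lowercase, then drop URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts, max_words=10000):
    """Map the most frequent whitespace tokens to integer ids starting at 2."""
    counts = Counter(tok for t in texts for tok in t.split())
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len):
    """Convert text to ids, truncate to max_len, and pad at the end."""
    ids = [vocab.get(tok, OOV_ID) for tok in text.split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def per_label_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for binary multi-label rows."""
    scores = {}
    for j, name in enumerate(labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[name] = (precision, recall, f1)
    return scores


# Toy usage with invented data (not taken from the study's dataset).
cleaned = clean_text("Halo @user, cek https://t.co/abc SEKARANG!!")
vocab = build_vocab([cleaned, "cek lagi"])
padded = encode("cek lagi dong", vocab, max_len=5)
scores = per_label_scores(
    y_true=[[1, 0], [1, 1], [0, 1]],
    y_pred=[[1, 0], [0, 1], [1, 1]],
    labels=["Abusive", "HS_Religion"],
)
```

Reporting the three metrics per label, rather than only an aggregate, is what makes a trade-off like the one described above visible: a tuned model can gain precision on one label while losing recall on another.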