Paper ID: 34
This study introduces a unified multimodal fusion model for hate speech detection (HD) and target detection (TD) using the CrisisHateMM dataset of 4,700 annotated text-embedded images. Combining TwHIN-BERT for text and a Vision Transformer (ViT) for visuals, the model achieved state-of-the-art F1-scores of 0.806 (HD) and 0.683 (TD). Future work will focus on enhancing model performance and expanding the dataset to strengthen online hate speech mitigation during crises.
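As a rough illustration of how such a text-image fusion model might be wired together, the sketch below pairs the two pretrained encoders named in the abstract. The specific checkpoints ("Twitter/twhin-bert-base", "google/vit-base-patch16-224"), the concatenation-based fusion, and the classifier head sizes are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a multimodal fusion classifier, assuming concatenation of
# pooled TwHIN-BERT and ViT features; fusion strategy and head sizes are
# illustrative assumptions, not the paper's reported configuration.
import torch
import torch.nn as nn
from transformers import AutoModel, ViTModel


class MultimodalHateClassifier(nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        # Pretrained encoders named in the abstract (assumed base checkpoints).
        self.text_encoder = AutoModel.from_pretrained("Twitter/twhin-bert-base")
        self.image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.image_encoder.config.hidden_size)
        # Assumed fusion head: concatenated features passed through a small MLP.
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, num_labels),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        # [CLS] token embedding from the text encoder.
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        # [CLS] patch embedding from the ViT image encoder.
        image_feat = self.image_encoder(
            pixel_values=pixel_values
        ).last_hidden_state[:, 0]
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.classifier(fused)  # logits over HD (or TD) labels
```

The same backbone could serve both tasks by swapping `num_labels` (binary for HD, multi-class for TD); whether the authors share encoders across tasks is not stated in the abstract.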
Asrarul Hoque Eusha
Salman Farsi
Mohammad Shamsul Arefin
Department of CSE, Chittagong University of Engineering and Technology, Chattogram-4349, Bangladesh
Conference: IEEE CS BDC SYMPOSIUM 2024
Date: Nov 22-23, 2024
Location: Jagannath University, Dhaka, Bangladesh
Publisher: IEEE Computer Society Bangladesh Chapter