Multimodal Hate Event Detection in Russia-Ukraine War

Paper ID: 34

Abstract

This study introduces a unified multimodal fusion model for hate speech detection (HD) and target detection (TD) using the CrisisHateMM dataset of 4,700 annotated text-embedded images. Combining TwHIN-BERT for text and a Vision Transformer (ViT) for visuals, the model achieved state-of-the-art F1-scores of 0.806 (HD) and 0.683 (TD). Future work focuses on enhancing model performance and expanding the dataset to strengthen online hate speech mitigation during crises.
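The abstract describes fusing TwHIN-BERT text embeddings with ViT image embeddings before classification. A minimal late-fusion sketch is shown below, with NumPy stand-ins for the encoder outputs; the embedding sizes, the concatenation strategy, and the linear head are illustrative assumptions, not the authors' reported implementation.

```python
import numpy as np

# Illustrative late-fusion sketch (an assumption, not the paper's exact model):
# the text encoder (e.g., TwHIN-BERT) and image encoder (e.g., ViT) are assumed
# to each produce a fixed-size embedding; the two are concatenated and passed
# to a linear classification head.

rng = np.random.default_rng(0)

def fuse_and_classify(text_emb, image_emb, W, b):
    """Concatenate modality embeddings and apply a linear head."""
    fused = np.concatenate([text_emb, image_emb])  # shape: (d_text + d_image,)
    logits = W @ fused + b                         # shape: (n_classes,)
    return int(np.argmax(logits))

d_text, d_image, n_classes = 768, 768, 2  # binary HD task: hate vs. not-hate
W = rng.normal(size=(n_classes, d_text + d_image))  # untrained, random weights
b = np.zeros(n_classes)

text_emb = rng.normal(size=d_text)    # stand-in for a TwHIN-BERT embedding
image_emb = rng.normal(size=d_image)  # stand-in for a ViT embedding
pred = fuse_and_classify(text_emb, image_emb, W, b)
print(pred)
```

In practice the fused vector would feed a trained classifier head, and the TD task would use a multi-class head over the annotated target categories rather than a binary one.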

Keywords

Multimodal Data, Hate Speech Detection, Conflict Analysis

Authors & Affiliations

Asrarul Hoque Eusha

Salman Farsi

Mohammad Shamsul Arefin

Department of CSE, Chittagong University of Engineering and Technology, Chattogram-4349, Bangladesh

Publication Details

Conference

IEEE CS BDC SYMPOSIUM 2024

Date

Nov 22-23, 2024

Location

Jagannath University, Dhaka, Bangladesh

Publisher

IEEE Computer Society Bangladesh Chapter