Multimodal Video Sentiment Analysis Using Audio and Text Data

Wang, Yanyan (2021) Multimodal Video Sentiment Analysis Using Audio and Text Data. Journal of Advances in Mathematics and Computer Science, 36 (7). pp. 30-37. ISSN 2456-9968

1615-Article Text-3157-1-10-20221012.pdf - Published Version

Abstract

Video sharing websites such as YouTube and TikTok are becoming increasingly popular. A reliable way to analyze a video's sentiment would greatly improve the user experience and help in designing better ranking and recommendation systems [1,2]. In this project, we used both the acoustic and textual information of a video to predict its sentiment. For audio data, we leveraged transfer learning, using a pre-trained VGGish model as a feature extractor to obtain abstract audio embeddings [6]. We then used the MOSI dataset [5] to fine-tune the VGGish model and achieved a test accuracy of 90% on binary classification. For text data, we compared a traditional bag-of-words model to an LSTM model. We found that the LSTM model with word2vec embeddings outperformed the bag-of-words model, achieving a test accuracy of 84% on binary classification.
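To illustrate the text branch described in the abstract, below is a minimal PyTorch sketch of an LSTM binary sentiment classifier. All names (LSTMSentiment, the dimensions, the vocabulary size) are illustrative assumptions, not the authors' actual implementation; in the paper's setup the embedding layer would be initialised with pre-trained word2vec vectors, whereas here it is randomly initialised for self-containment.

```python
import torch
import torch.nn as nn

class LSTMSentiment(nn.Module):
    """Sketch of an LSTM binary sentiment classifier:
    word indices -> embeddings -> LSTM -> linear head -> one logit.
    Hypothetical dimensions; the embedding layer stands in for
    pre-trained word2vec vectors."""

    def __init__(self, vocab_size: int, embed_dim: int = 300, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer word indices
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)      # h_n: (1, batch, hidden_dim), final hidden state
        return self.head(h_n[-1]).squeeze(-1)  # one raw logit per utterance

# Toy usage: a batch of 4 utterances, each 20 tokens long
model = LSTMSentiment(vocab_size=5000)
logits = model(torch.randint(0, 5000, (4, 20)))
probs = torch.sigmoid(logits)  # probability of positive sentiment
```

Training would then minimise `nn.BCEWithLogitsLoss` on the logits against the binary MOSI sentiment labels; a bag-of-words baseline replaces the embedding/LSTM stack with a single linear layer over word-count vectors.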

Item Type: Article
Subjects: Institute Archives > Mathematical Science
Depositing User: Managing Editor
Date Deposited: 15 Mar 2023 08:58
Last Modified: 21 Mar 2024 04:16
URI: http://eprint.subtopublish.com/id/eprint/1816
