Wang, Yanyan (2021) Multimodal Video Sentiment Analysis Using Audio and Text Data. Journal of Advances in Mathematics and Computer Science, 36 (7). pp. 30-37. ISSN 2456-9968
Abstract
Nowadays, video-sharing websites such as YouTube and TikTok are becoming more and more popular. A good way to analyze a video’s sentiment would greatly improve the user experience and would help with designing better ranking and recommendation systems [1,2]. In this project, we used both the acoustic and textual information of a video to predict its sentiment level. For audio data, we leverage a transfer-learning technique and use a pre-trained VGGish model as a feature extractor to obtain abstract audio embeddings [6]. We then used the MOSI dataset [5] to further fine-tune the VGGish model and achieved a test accuracy of 90% for binary classification. For text data, we compared a traditional bag-of-words model to an LSTM model. We found that the LSTM model with word2vec outperformed the bag-of-words model and achieved a test accuracy of 84% for binary classification.
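To make the bag-of-words baseline mentioned in the abstract concrete, the sketch below trains a tiny logistic-regression sentiment classifier over word-count features using only the Python standard library. The toy utterances and the training hyperparameters are illustrative assumptions, not the paper's actual MOSI setup or implementation.

```python
# Minimal bag-of-words binary sentiment classifier (stdlib only).
# Toy utterances stand in for the MOSI transcripts used in the paper.
import math

train = [
    ("this movie was wonderful and fun", 1),
    ("a truly great and moving film", 1),
    ("absolutely fantastic acting", 1),
    ("boring plot and terrible pacing", 0),
    ("a dull and disappointing movie", 0),
    ("awful script with terrible acting", 0),
]

# Build the vocabulary from the training utterances.
vocab = sorted({w for text, _ in train for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def featurize(text):
    # Bag-of-words: count occurrences of each known word.
    vec = [0.0] * len(vocab)
    for w in text.split():
        if w in index:
            vec[index[w]] += 1.0
    return vec

# Logistic regression trained with stochastic gradient descent.
weights = [0.0] * len(vocab)
bias = 0.0
lr = 0.5
for _ in range(200):
    for text, label in train:
        x = featurize(text)
        z = sum(wi * xi for wi, xi in zip(weights, x)) + bias
        p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
        err = p - label                  # gradient of log-loss wrt z
        weights = [wi - lr * err * xi for wi, xi in zip(weights, x)]
        bias -= lr * err

def predict(text):
    x = featurize(text)
    z = sum(wi * xi for wi, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

print(predict("wonderful fun film"))      # positive -> 1
print(predict("terrible boring script"))  # negative -> 0
```

The LSTM-with-word2vec model the paper favors replaces these sparse counts with dense, order-aware sequence representations, which is what gives it the edge the abstract reports.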
| Item Type: | Article |
|---|---|
| Subjects: | Institute Archives > Mathematical Science |
| Depositing User: | Managing Editor |
| Date Deposited: | 15 Mar 2023 08:58 |
| Last Modified: | 21 Mar 2024 04:16 |
| URI: | http://eprint.subtopublish.com/id/eprint/1816 |