Graph Neural Networks for Enhancing Sentiment Analysis

Abstract

In this study, we introduce a novel approach to sentiment analysis by integrating Graph Neural Networks (GNNs), specifically Graph Attention Networks (GAT), with traditional Natural Language Processing (NLP) techniques.

By focusing on the challenges of interpreting long-range dependencies within text, particularly in the context of social media interactions, we propose a model that leverages the relational and textual information present in data from college SubReddit forums before and during the COVID-19 pandemic.

Our methodology employs RoBERTa for text embedding, followed by a GAT to incorporate the structural relationships between messages, enhancing the sentiment analysis capability beyond the limitations of conventional models.

The results demonstrate a statistically significant improvement in identifying and relating sentiments across distant text parts, underscoring the potential of GNNs in capturing complex interactions within text data. This research contributes to the evolving field of sentiment analysis by showcasing the effectiveness of combining GNNs with existing NLP frameworks, offering insights into the nuanced emotional landscape of digital communications during a global crisis.

Introduction

In NLP, sentiment analysis is a cornerstone for understanding the emotional content of text. However, traditional models often struggle with the complex interplay of sentiments across social media conversations. This study combines Graph Neural Networks, particularly Graph Attention Networks, with the linguistic representations of NLP models such as RoBERTa. Focusing on conversations in college SubReddit forums before and during the COVID-19 pandemic, our method merges NLP's semantic depth with GNNs' awareness of conversational context, aiming to illuminate the nuanced emotional landscape of digital discourse during this period.

Dataset

The study analyzes social media data sourced from Reddit, focusing specifically on SubReddits associated with different colleges. This dataset covers two distinct time periods: 2019, representing the pre-pandemic landscape, and 2020, reflecting sentiments amidst the COVID-19 pandemic.

Selection criteria for the colleges included in the study were based on a variety of factors, such as geographic location, size, and their approach to in-person learning during the pandemic. This ensured a diverse representation of college communities.

The dataset consists of 165,570 messages, including original posts, comments, and their respective replies. To enhance the reliability of the dataset, a significant subset of the data underwent manual labeling across various sentiment categories. This manual labeling process ensured a balanced and accurately labeled dataset for training and testing machine learning models.

[Figure: data example]

Model

By integrating RoBERTa text embeddings with a GAT, the model aims to accurately classify the sentiment expressed in social media messages, including posts, comments, and replies. The model is tested and validated to check how well it captures the sentiment variations present in the dataset.

Each node in our graph is a single comment. We use RoBERTa embeddings as node features, giving each comment a deep, context-aware language representation.
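The embedding step might look roughly like the sketch below, which uses the Hugging Face `transformers` library. The `roberta-base` checkpoint, the function name `embed_comments`, and the choice to keep the first-token (`<s>`) vector as the comment representation are all assumptions made for illustration; the study does not state which checkpoint or pooling strategy was used.

```python
def embed_comments(texts, model_name="roberta-base"):
    """Return one embedding vector per comment (assumed pooling: first token).

    Imports are kept inside the function so the heavy dependencies load
    only when embeddings are actually requested.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference only, disables dropout

    # Pad and truncate so comments of different lengths fit in one batch.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, hidden)

    # Keep the first-token (<s>) vector as the comment embedding.
    return hidden[:, 0, :]
```

Calling `embed_comments(["Campus is closing next week.", "This is so stressful."])` would return a tensor with one row per comment, which can then serve as the node feature matrix.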

The graph is directed: nodes represent individual comments, and edges reflect the direct interactions between them. This structure forms the backbone of our analysis.
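As a rough illustration of this graph construction, the sketch below turns a list of messages into node indices and directed edges. The record fields (`id`, `parent_id`, `text`), the sample messages, and the convention that an edge points from a reply to the message it answers are assumptions made for the example, not details confirmed by the study.

```python
# Hypothetical message records; field names and texts are illustrative only.
messages = [
    {"id": "p1", "parent_id": None, "text": "Campus is closing next week."},
    {"id": "c1", "parent_id": "p1", "text": "This is so stressful."},
    {"id": "c2", "parent_id": "p1", "text": "At least finals are online."},
    {"id": "r1", "parent_id": "c1", "text": "Hang in there!"},
]


def build_directed_graph(msgs):
    """Map each message to a node index and each reply to a directed edge.

    Edges are (source, target) pairs pointing from the replying message
    to the message it answers (an assumed convention).
    """
    node_index = {m["id"]: i for i, m in enumerate(msgs)}
    edges = []
    for m in msgs:
        parent = m["parent_id"]
        # Skip top-level posts and replies whose parent is not in the data.
        if parent is not None and parent in node_index:
            edges.append((node_index[m["id"]], node_index[parent]))
    return node_index, edges


node_index, edges = build_directed_graph(messages)
print(edges)  # [(1, 0), (2, 0), (3, 1)]
```

Keeping the edges as index pairs makes it straightforward to hand them to a graph library later, since most expect an edge list over integer node ids.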

[Figure: architecture]

We propose a machine learning model to analyze sentiment in social media data. This model builds on existing methodologies in sentiment analysis and natural language processing. Specifically, it leverages techniques such as sentiment lexicons, neural networks, and deep learning architectures.

GAT Implementation: Our GAT model dynamically assigns weights to node interactions, facilitating nuanced sentiment classification.
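To make the idea of dynamically assigned weights concrete, here is a minimal pure-Python sketch of one single-head graph attention layer in the standard GAT formulation: each pair is scored with LeakyReLU(a · [W h_i || W h_j]) and the scores are normalized with a softmax over each node's neighborhood. The toy 2-dimensional features (standing in for RoBERTa embeddings), the weights `W` and `a`, the self-loops, and the choice that a message aggregates from its replies are assumptions for illustration. The final nonlinearity and multi-head averaging are omitted, and this is not the study's actual implementation.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, edges, W, a):
    """One single-head graph attention layer.

    features: list of node feature vectors
    edges:    (source, target) pairs; the target aggregates from the source
    W:        weight matrix, out_dim x in_dim
    a:        attention vector of length 2 * out_dim
    Returns the new node vectors and a dict of attention weights.
    """
    z = [matvec(W, h) for h in features]
    # Each node attends to itself and to every node with an edge pointing at it.
    neighbors = {i: [i] for i in range(len(features))}
    for src, dst in edges:
        neighbors[dst].append(src)

    outputs, attention = [], {}
    for i, nbrs in neighbors.items():
        # Unnormalized score for each neighbor j: LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Softmax over the neighborhood (shifted by the max for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        for j, alpha in zip(nbrs, alphas):
            attention[(i, j)] = alpha
        # Attention-weighted sum of the transformed neighbor features.
        outputs.append([
            sum(alpha * z[j][k] for j, alpha in zip(nbrs, alphas))
            for k in range(len(z[i]))
        ])
    return outputs, attention


# Toy inputs: four comments with 2-d features and three reply edges.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
edges = [(1, 0), (2, 0), (3, 1)]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.25]

outputs, attention = gat_layer(features, edges, W, a)
```

In `attention`, the key `(i, j)` holds how much node `i` weights node `j`. The weights for each node sum to 1, and a comment with no replies attends only to itself.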

[Figure: model architecture]

Result

When we combine Graph Neural Networks (GNNs) with RoBERTa, a pretrained language model, we sometimes see better results than when we use RoBERTa by itself. This combination is like putting together a team where each member brings a unique strength, leading to a more effective outcome. Moreover, when we enhance our models with GAT, they often do a better job than the usual methods in understanding the sentiment or emotions in text. This is particularly true for identifying complex connections or the broader context of conversations, which is crucial for accurately interpreting what is being said. These results suggest that GAT-enhanced models work well with data that involves relationships and connections, and that they are valuable for making sense of intricate information flows.

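Model Accuracy for 2019

| Model | Metric | UCLA | UCSD | UCB | UofM | Harvard | Columbia | Overall |
|-------|--------|------|------|------|------|---------|----------|---------|
| HAN   | CAR    | 0.48 | 0.50 | 0.54 | 0.58 | 0.46    | 0.50     | 0.498   |
| HAN   | F1     | 0.50 | 0.51 | 0.49 | 0.59 | 0.43    | 0.49     | 0.482   |
| BERT  | CAR    | 0.46 | 0.52 | 0.64 | 0.56 | 0.42    | 0.46     | 0.497   |
| BERT  | F1     | 0.53 | 0.56 | 0.65 | 0.61 | 0.49    | 0.49     | 0.534   |
| ESM   | CAR    | 0.54 | 0.54 | 0.48 | 0.42 | 0.56    | 0.52     | 0.505   |
| ESM   | F1     | 0.49 | 0.50 | 0.44 | 0.36 | 0.56    | 0.51     | 0.489   |

Model Accuracy for 2020

| Model | Metric | UCLA | UCSD | UCB | UofM | Harvard | Columbia | Overall |
|-------|--------|------|------|------|------|---------|----------|---------|
| HAN   | CAR    | 0.60 | 0.40 | 0.40 | 0.48 | 0.48    | 0.56     | 0.498   |
| HAN   | F1     | 0.62 | 0.38 | 0.44 | 0.41 | 0.38    | 0.52     | 0.481   |
| BERT  | CAR    | 0.46 | 0.46 | 0.44 | 0.50 | 0.48    | 0.56     | 0.497   |
| BERT  | F1     | 0.53 | 0.53 | 0.46 | 0.44 | 0.52    | 0.59     | 0.533   |
| ESM   | CAR    | 0.40 | 0.64 | 0.52 | 0.51 | 0.48    | 0.50     | 0.505   |
| ESM   | F1     | 0.32 | 0.63 | 0.43 | 0.55 | 0.54    | 0.51     | 0.488   |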
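
For readers who want to reproduce scores of this kind, the sketch below computes an accuracy rate and an F1 score from gold and predicted labels. It assumes that CAR denotes a classification accuracy rate and that F1 is macro-averaged over classes; the source states neither. The three label names and the sample predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for six messages.
y_true = ["pos", "neg", "neu", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neu"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Macro averaging treats every sentiment class equally regardless of how many messages it contains, which is one reason accuracy and F1 can diverge in the tables above.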

Discussion

Integrating GAT with NLP technologies significantly improves our ability to analyze sentiments and relationships within digital communications. This innovative combination goes beyond simple text interpretation, offering a deeper understanding of the nuances in online sentiments and the intricate web of social connections. By effectively mapping out the complex network of interactions and emotions expressed across various digital platforms, these integrated models provide a more nuanced perspective of online communities and individual behaviors.

Looking ahead, the focus is on expanding the models' versatility to cover a broader range of applications, from social media analysis to customer feedback in different industries. Efforts are underway to refine these models for even greater accuracy, incorporating a wider array of data sources, including images and videos, and paying closer attention to subtle linguistic cues like sarcasm or regional dialects. By harnessing these advancements, the goal is to unlock a new level of insight into digital human interaction, making it possible to better understand and respond to the evolving landscape of online communication.

Acknowledgments

Our heartfelt thanks go to our mentors and the Halıcıoğlu Data Science Institute for their unwavering support. This project’s success is a testament to our collaborative spirit and the insightful guidance provided at every step.

BibTeX


      @article{yan2021sentiment,
        author    = {Yan, Tian and Liu, Fang},
        title     = {Sentiment Analysis and Effect of COVID-19 Pandemic using College SubReddit Data},
        journal   = {arXiv preprint arXiv:2112.04351},
        year      = {2021}, 
      }