Real-Time Sentiment Analysis

Project Overview

This project performs real-time analysis of public sentiment toward AI. Each comment is classified into one of three sentiment categories: Hope, Fear, or Neutral.

Data Pipeline Architecture

Project Setup

To run the project, follow these steps:

Create a .env file inside the config/ folder containing your Reddit credentials. You can use the provided config/template.env file as a reference.
Run the following command to start the project:
```
docker-compose up --build -d
```
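Before starting the containers, it can help to confirm that config/.env defines every variable listed in config/template.env. The following is a minimal sketch, assuming both files use the common `KEY=VALUE` dotenv layout (the actual variable names are not shown here, so none are hard-coded). The helper names `read_env_keys` and `missing_env_keys` are illustrative and not part of the project.

```python
from pathlib import Path


def read_env_keys(path):
    """Return the set of variable names defined in a KEY=VALUE style env file."""
    keys = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_env_keys(template_path="config/template.env", env_path="config/.env"):
    """Return the keys present in the template but absent from the .env file."""
    # Default paths follow the setup instructions; the file format is an assumption.
    return sorted(read_env_keys(template_path) - read_env_keys(env_path))
```

Calling `missing_env_keys()` from the repository root returns an empty list when nothing is missing. It only compares variable names, so it does not check that the credentials themselves are valid.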

Troubleshooting

cassandra.cluster.NoHostAvailable: If this error appears while the spark-streaming container is starting, restart the container. The error means that Cassandra was not yet up and running when spark-streaming tried to connect.
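One way to avoid the manual restart is to wait until Cassandra's port accepts connections before connecting. The sketch below is not taken from the project; it uses only the Python standard library, and `wait_for_port` is a hypothetical helper. The hostname `cassandra` in the usage note is an assumption about the Compose service name, while 9042 is Cassandra's default CQL port.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or unreachable: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port("cassandra", 9042)` could run before the streaming job opens its Cassandra session. An open port does not guarantee that Cassandra is ready to serve queries, so a restart may still occasionally be needed.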

DistilBERT Model

DistilBERT, a lightweight transformer model, is used for sentiment classification because of its efficiency and compact encoder-only architecture. The primary goal of this project, however, is to build a scalable and efficient data streaming infrastructure, not to maximize model performance. The dataset used for fine-tuning the model is not of high quality. All data is stored in the data/ folder.

Classification Report:

Class	Precision	Recall	F1-Score	Support
Neutral	0.71	0.65	0.68	314
Hope	0.76	0.76	0.76	304
Fear	0.80	0.85	0.82	327
Accuracy	-	-	0.76	945
Macro Avg	0.75	0.75	0.75	945
Weighted Avg	0.75	0.76	0.75	945

Confusion Matrix (rows: actual class, columns: predicted class):

	Neutral	Hope	Fear
Neutral	204	61	49
Hope	50	232	22
Fear	35	14	278
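The classification report can be recomputed from the confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, which is consistent with each row summing to the Support column (314, 304, 327). The function names are illustrative and not taken from the project's code.

```python
labels = ["Neutral", "Hope", "Fear"]

# Rows are actual classes, columns are predicted classes.
confusion = [
    [204, 61, 49],
    [50, 232, 22],
    [35, 14, 278],
]


def per_class_metrics(matrix):
    """Return (precision, recall, f1, support) for each class index."""
    n = len(matrix)
    results = []
    for i in range(n):
        true_positive = matrix[i][i]
        support = sum(matrix[i])                           # actual count of class i
        predicted = sum(matrix[r][i] for r in range(n))    # predicted count of class i
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1, support))
    return results


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for label, (p, r, f1, support) in zip(labels, per_class_metrics(confusion)):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
print(f"accuracy={accuracy(confusion):.2f}")
```

Rounded to two decimals, these values agree with the per-class rows and the accuracy in the report above.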

Grafana Dashboard

Over roughly 70 minutes of real-time streaming, from 15:30 to 16:40, a total of 27,732 comments were processed in this experiment. These comments were classified as follows:

956 expressed fear toward AI.
2,363 expressed hope in AI.
18,413 were neutral.
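To express these counts as proportions, the short sketch below divides each count by the sum of the three listed counts. Note that the listed counts add up to 21,732, which differs from the 27,732 total stated above. The source does not explain the difference, so using the sum as the denominator is an assumption made for this example only.

```python
# Counts as listed for the streaming experiment.
counts = {"Fear": 956, "Hope": 2363, "Neutral": 18413}


def class_shares(class_counts):
    """Return each class count as a fraction of the summed counts."""
    total = sum(class_counts.values())
    return {label: n / total for label, n in class_counts.items()}


shares = class_shares(counts)
print("Sum of listed counts:", sum(counts.values()))
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Under this assumption, neutral comments make up the large majority of the stream, with hope outnumbering fear by more than two to one.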

BALK-03/Real-Time-HFN-Analysis

Folders and files

Latest commit

History

Repository files navigation

Real-Time Sentiment Analysis

Project Overview

Data Pipeline Architecture

Project Setup

Troubleshooting

DistilBERT Model

Classification Report:

Confusion Matrix:

Grafana Dashboard

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages