15 changes: 12 additions & 3 deletions documentation/README.md
@@ -34,7 +34,7 @@ The project goal is to create a tool to detect nuances in climate news articles.

### 1. Data Collection

-We explored Kaggle, Open Source Datasets, GitHub, and scraped data from New York Times and The Guardian((#add the link to scrappers here). It was challenging to find consistent datasets that fulfilled our project scope requirements for running sentiment analysis on climate news. We proceeded to use Huggingface expert-labeled datasets. Finally, we assumed that "risk" and "opportunity" should refer to negative and positive news.
+We explored Kaggle, open-source datasets, and GitHub, and scraped data from The New York Times and The Guardian ((#add the link to scrapers here). Initially, we wrote scripts to crawl the web and gather headlines of climate news articles from news a; however, it was challenging to find consistent datasets that fulfilled our project scope requirements for running sentiment analysis on climate news, and to manually review the sentiment of each statement. Hence, we proceeded to use Hugging Face expert-labeled datasets. Finally, we assumed that "risk" and "opportunity" correspond to negative and positive news, respectively.
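
The label assumption above can be sketched as a small mapping. This is a minimal illustration rather than the project's code: the label strings `risk`, `opportunity`, and `neutral`, the `LABEL_TO_SENTIMENT` table, and the `to_sentiment` helper are all hypothetical names, since the dataset's actual label encoding is not shown here.

```python
# Hypothetical mapping from expert labels to sentiment classes.
# The label strings are assumptions; the real dataset may use integer codes.
LABEL_TO_SENTIMENT = {
    "risk": "negative",
    "opportunity": "positive",
    "neutral": "neutral",
}


def to_sentiment(label: str) -> str:
    """Map an expert label to a sentiment class, failing loudly on unknown labels."""
    key = label.strip().lower()
    if key not in LABEL_TO_SENTIMENT:
        raise ValueError(f"Unknown label: {label!r}")
    return LABEL_TO_SENTIMENT[key]


if __name__ == "__main__":
    print([to_sentiment(x) for x in ["risk", "Opportunity", "neutral"]])
```

Raising on unknown labels, instead of silently defaulting to neutral, keeps a mislabeled or renamed class from skewing the sentiment distribution unnoticed.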

### 2. Data Preprocessing

@@ -47,11 +47,20 @@ Some of the data techniques used were:

### 3. Data Visualization

-Word Clouds were used to visualize words associated with positive and negative news.
+- Word clouds were used to visualize words associated with positive and negative news.
+- A scatter plot was used to visualize the average sentiment of commonly used words in relation to their frequencies.
+- A pie chart was used to visualize the ratio of keywords with positive, neutral, and negative average sentiment.
+- Bar charts were used to show the word counts and frequency distribution for each sentiment: positive, neutral, and negative.
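
The per-word statistics behind the scatter plot and pie chart can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the project's implementation: the numeric scores in `SCORE`, the `threshold` used to bucket averages, the helper names, and the three toy headlines are all made up for the example.

```python
from collections import defaultdict

# Numeric score per sentiment class (an assumption made for this sketch).
SCORE = {"negative": -1, "neutral": 0, "positive": 1}


def word_sentiment_stats(samples):
    """Return {word: (frequency, average_score)} over (text, sentiment) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for text, sentiment in samples:
        for word in text.lower().split():
            counts[word] += 1
            totals[word] += SCORE[sentiment]
    return {word: (counts[word], totals[word] / counts[word]) for word in counts}


def bucket(average, threshold=0.25):
    """Classify an average score; the threshold is a hypothetical cut-off."""
    if average > threshold:
        return "positive"
    if average < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Toy headlines invented for illustration only.
    samples = [
        ("floods threaten coastal cities", "negative"),
        ("solar investment creates jobs", "positive"),
        ("floods prompt solar investment", "neutral"),
    ]
    for word, (freq, avg) in sorted(word_sentiment_stats(samples).items()):
        print(word, freq, round(avg, 2), bucket(avg))
```

Each `(frequency, average_score)` pair is one point on the scatter plot, and counting words per `bucket` result gives the shares for the pie chart.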

### 4. Sentiment Analysis Model Exploration & Development

-We explored different techniques such as CountVectorizer, TF-IDF Vectorizer, and different models such as Support Vector Machine, Logistic Regression, Naive Bayes, and LSTM Neural Network.
+We explored different techniques and models, such as:
+- CountVectorizer
+- TF-IDF Vectorizer
+- Support Vector Machine (SVM)
+- Logistic Regression
+- Naive Bayes
+- LSTM Neural Network
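
To make the two vectorization techniques listed above concrete, here is a minimal pure-Python sketch of count vectorization followed by TF-IDF weighting. It is an illustration of the idea, not the project's pipeline and not a reimplementation of scikit-learn: the function names `count_vectorize` and `tfidf`, the whitespace tokenizer, the smoothed IDF formula without length normalization, and the two toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Return (vocabulary, count_rows) using lowercase whitespace tokens."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf(rows):
    """Weight raw counts by a smoothed inverse document frequency."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF keeps the weight finite and positive for every term.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in rows]


if __name__ == "__main__":
    # Toy documents invented for illustration only.
    docs = ["climate risk rises", "climate opportunity grows"]
    vocab, rows = count_vectorize(docs)
    for term, weight in zip(vocab, tfidf(rows)[0]):
        print(term, round(weight, 3))
```

A term that appears in every document, such as `climate` here, receives the lowest weight, while class-specific terms such as `risk` are weighted up. Either matrix (raw counts or TF-IDF weights) can then be passed as features to a classifier such as an SVM, logistic regression, or Naive Bayes.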

### 5. User Interface
