diff --git a/documentation/README.md b/documentation/README.md
index fe720ea..c01e518 100644
--- a/documentation/README.md
+++ b/documentation/README.md
@@ -34,7 +34,7 @@ The project goal is to create a tool to detect nuances in climate news articles.
 
 ### 1. Data Collection
 
-We explored Kaggle, Open Source Datasets, GitHub, and scraped data from New York Times and The Guardian((#add the link to scrappers here). It was challenging to find consistent datasets that fulfilled our project scope requirements for running sentiment analysis on climate news. We proceeded to use Huggingface expert-labeled datasets. Finally, we assumed that "risk" and "opportunity" should refer to negative and positive news.
+We explored Kaggle, Open Source Datasets, GitHub, and scraped data from New York Times and The Guardian((#add the link to scrappers here). Initially, we wrote scripts to web-crawl and gather headlines of climate news articles from news a, however it was challenging to find consistent datasets that fulfilled our project scope requirements for running sentiment analysis on climate news and manually review sentiment to each statement. Hence, we proceeded to use Huggingface expert-labeled datasets. Finally, we assumed that "risk" and "opportunity" should refer to negative and positive news.
 
 ### 2. Data Preprocessing
 
@@ -47,11 +47,20 @@ Some of the data techniques used were:
 
 ### 3. Data Visualization
 
-Word Clouds were used to visualize words associated with positive and negative news.
+- Word Clouds were used to visualize words associated with positive and negative news.
+- Scatter plot was used to visualize the average sentiments in relation to their frequencies for the commonly-used words.
+- Pie Chart was used to visualize the ratio of keywords with positive, neutral and negative average sentiments.
+- Bar Charts were used to see the word counts and frequency distribution for each sentiment, positive, neutral and negative.
 
 ### 4. Sentiment Analysis Model Exploration & Development
 
-We explored different techniques such as CountVectorizer, TF-IDF Vectorizer, and different models such as Support Vector Machine, Logistic Regression, Naive Bayes, and LSTM Neural Network.
+We explored different techniques and models such as
+- CountVectorizer
+- TF-IDF Vectorizer
+- Support Vector Machine (SVM)
+- Logistic Regression
+- Naive Bayes
+- LSTM Neural Network.
 
 ### 5. User Interface
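
Review note on the new "Data Visualization" bullets: the scatter plot and pie chart both depend on a per-word aggregation (frequency and average sentiment) that the README does not spell out. The sketch below shows one way that aggregation might look. It is not taken from the project's code: the sample headlines, the function names `word_stats`, `bucket`, and `sentiment_shares`, the numeric mapping (risk = -1, neutral = 0, opportunity = +1), whitespace tokenization, and the 0.25 neutral threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labeled headlines (not project data).
# Assumed score mapping: risk = -1, neutral = 0, opportunity = +1.
SAMPLES = [
    ("floods threaten coastal cities", -1),
    ("solar investment creates jobs", 1),
    ("report reviews coastal policy", 0),
    ("storms threaten solar farms", -1),
    ("coastal cities expand solar capacity", 1),
]


def word_stats(samples, min_count=2):
    """Return {word: (frequency, average sentiment)} for commonly used words.

    A word is counted once per headline, and only words appearing in at
    least `min_count` headlines are kept.
    """
    counts = Counter()
    totals = defaultdict(float)
    for text, score in samples:
        for word in set(text.lower().split()):
            counts[word] += 1
            totals[word] += score
    return {w: (n, totals[w] / n) for w, n in counts.items() if n >= min_count}


def bucket(avg, threshold=0.25):
    """Map an average sentiment to a label; `threshold` is an assumed cutoff."""
    if avg > threshold:
        return "positive"
    if avg < -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(stats):
    """Return the share of keywords per label (the pie chart's input)."""
    labels = Counter(bucket(avg) for _, avg in stats.values())
    total = sum(labels.values())
    return {
        label: (labels[label] / total if total else 0.0)
        for label in ("positive", "neutral", "negative")
    }


if __name__ == "__main__":
    stats = word_stats(SAMPLES)
    for word, (freq, avg) in sorted(stats.items()):
        print(f"{word}: frequency={freq}, average sentiment={avg:.2f}")
    print(sentiment_shares(stats))
```

Each `(frequency, average sentiment)` pair would be one point on the scatter plot, and the dictionary from `sentiment_shares` would feed the pie chart. If the README is meant to document how these figures were produced, a sentence stating the actual score mapping and frequency cutoff would make the charts reproducible.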