Merged
2 changes: 1 addition & 1 deletion _quarto.yml
@@ -60,7 +60,7 @@ website:
- section: "Sentiment Analysis"
contents:
- href: chapters/3.SentimentAnalysis/introduction.qmd
text: Introduction to Sentiment Analysis
text: What is Sentiment Analysis?
- href: chapters/3.SentimentAnalysis/polarity.qmd
text: Polarity Classification
- href: chapters/3.SentimentAnalysis/emotion.qmd
21 changes: 10 additions & 11 deletions chapters/3.SentimentAnalysis/emotion.qmd
@@ -3,7 +3,7 @@ title: "Emotion Detection"
editor: visual
---

Emotion detection is another NLP technique aimed at identifying and quantifying human emotions expressed in text, which builds directly on traditional sentiment polarity analysis focusing on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotionsone might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.
Emotion detection is another NLP technique aimed at identifying and quantifying human emotions expressed in text. It builds directly on traditional sentiment polarity analysis but focuses on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotions: one might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.

We will use the `syuzhet` package ([more info](https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html)) to help us classify emotions detected in our dataset. The name “syuzhet” is inspired by the work of Russian Formalists Victor Shklovsky and Vladimir Propp, who distinguished between two aspects of a narrative: the fabula and the syuzhet. The fabula represents the chronological sequence of events, while the syuzhet refers to the way these events are presented or structured; the narrative’s technique or “device.” In other words, syuzhet focuses on how the story (fabula) is organized and conveyed to the audience.

@@ -22,18 +22,17 @@ You may explore NRC's lexicon Tableau dashboard to explore words associated with
```{=html}
<iframe width="780" height="500" src="https://public.tableau.com/views/NRC-Emotion-Lexicon-viz1/NRCEmotionLexicon-viz1?:embed=y&:loadOrderID=0&:display_count=no&:showVizHome=no" title="NRC Lexicon Interactive Visualization"></iframe>
```

Now that we have a better understanding of this package, let's get back to business and perform emotion detection on our data.

#### Emotion Detection with Syuzhet's NRC Lexicon
### Emotion Detection with Syuzhet's NRC Lexicon

##### Detecting Emotions per Comment/Sentence
#### Detecting Emotions per Comment/Sentence

``` r
sentences <- get_sentences(comments$comments)
```

##### Compute Emotion Scores per Sentence
#### Compute Emotion Scores per Sentence

``` r
emotion_score <- get_nrc_sentiment(sentences)
@@ -43,7 +42,7 @@ The `get_nrc_sentiment()` function assigns emotion and sentiment scores (based o

![](images/emotions_scores-dataframe.png)

##### Review Summary of Emotion Scores
#### Review Summary of Emotion Scores

Let's now compute basic statistics (min, max, mean, etc.) for each emotion column and get an overview of how frequent or strong each emotion is in our example dataset.

@@ -59,7 +58,7 @@ Based on the results the overall emotion in these comments leans heavily toward

On the flip side, **Disgust** was the rarest emotion, with the lowest average (0.145). It's also worth noting that while Sadness and Trust are the most *common*, a few comments really went off the rails with **Trust (47.000), Anger (44.000)**, and **Fear (37.000)**, hitting the highest extreme scores.

##### Regroup with comments and IDs
#### Regroup with comments and IDs

After computing the emotion scores, we want to link them back to their **original comments and IDs**.

@@ -70,7 +69,7 @@ emotion_data <- bind_cols(comments, emotion_score)

`bind_cols()` merges the original `comments` data frame with the new `emotion_score` table.
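
As a minimal sketch of what column binding does, the toy example below uses base R's `cbind()` as a stand-in for `dplyr::bind_cols()` so that it runs without extra packages. The data frames `comments_demo` and `emotion_demo` and their values are invented for illustration and are not part of the lesson dataset.

``` r
# Hypothetical two-row inputs, for illustration only
comments_demo <- data.frame(
  id = c("s1_001", "s2_001"),
  comments = c("What a finale!", "I did not see that coming.")
)
emotion_demo <- data.frame(joy = c(1, 0), surprise = c(1, 1))

# cbind() is a base-R stand-in for dplyr::bind_cols():
# both pair rows by position, so the inputs need the same number of rows
emotion_demo_data <- cbind(comments_demo, emotion_demo)
emotion_demo_data
```

Because rows are matched purely by position, it is worth checking that both inputs have the same number of rows, in the same order, before binding them.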

##### Summarize Emotion Counts Across All Sentences
#### Summarize Emotion Counts Across All Sentences

Now, let's count **how many times each emotion appears** overall.

@@ -88,7 +87,7 @@ emotion_summary <- emotion_data %>%

![](images/emotion-counts.png){width="194"}

##### Plot the Overall Emotion Distribution
#### Plot the Overall Emotion Distribution

``` r
ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +
@@ -103,7 +102,7 @@ ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +

![](images/barchart-emotions.png)

##### Add a “Season” Variable (Grouping) and Summarize
#### Add a “Season” Variable (Grouping) and Summarize

Let's now add a new column called `season` by looking at the ID pattern — for example, `s1_` means season 1 and `s2_` means season 2. This makes it easy to compare the emotional tone across seasons.
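
The prefix rule described above can be sketched in base R as follows. This is an illustration only: the `id` values and the "Season 1" / "Season 2" labels are assumptions, and the lesson's own code may derive the column differently.

``` r
# Hypothetical IDs following the s1_/s2_ prefix pattern
id <- c("s1_001", "s1_002", "s2_001")

# Assumed rule: the prefix before the underscore identifies the season;
# anything that matches neither prefix is left as NA
season <- ifelse(startsWith(id, "s1_"), "Season 1",
          ifelse(startsWith(id, "s2_"), "Season 2", NA_character_))

table(season)
```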

@@ -124,7 +123,7 @@ emotion_by_season <- emotion_seasons %>%
)
```

##### Plotting the Data
#### Plotting the Data

Comparing emotions by season:

4 changes: 2 additions & 2 deletions chapters/3.SentimentAnalysis/introduction.qmd
@@ -1,5 +1,5 @@
---
title: "Introduction to Sentiment Analysis"
title: "What is Sentiment Analysis?"
---

Now that we have completed all the key preprocessing steps and our example dataset is in much better shape, we can finally proceed with sentiment analysis.
@@ -23,7 +23,7 @@ Our analysis pipeline will follow a two-step approach. First, we will compute ba
Let’s start by installing and loading the necessary packages, then bringing in the cleaned dataset so we can begin our sentiment analysis. We will discuss the role of each package in the next episodes.

``` r
# Install packages (remove comments for packages you might have skipped)
# Install packages (remove comments for packages you might have skipped in previous episodes)
install.packages("sentimentr")
install.packages("syuzhet")
# install.packages("dplyr")
20 changes: 10 additions & 10 deletions chapters/3.SentimentAnalysis/polarity.qmd
@@ -23,21 +23,21 @@ Words like “but,” “however,” and “although” also influence the senti

With this approach, we can explore more confidently whether the show’s viewers felt positive, neutral, or negative about it.

#### Computing Polarity with Sentiment R (Valence Shifters Capability)
### Computing Polarity with Sentiment R (Valence Shifters Capability)

##### Calculating sentiment scores
#### Calculating sentiment scores

Here we’re using the **`sentiment_by()`** function which looks at each comment and calculates a **sentiment score** representing how positive or negative comments are.
Here we’re using the **`sentiment_by()`** function which looks at each comment and calculates a **sentiment score** representing how positive or negative comments are. Let's enter the following code to select all the values contained in the comments column:

``` r
sentiment_scores <- sentiment_by(comments$comments)
```

![Sentiment Scores Output](images/sentiment-scores.png){width="342"}
![Sentiment Scores Output](images/sentiment-scores.png){width="418"}

So after running this, we get a new object called `sentiment_scores` with the average sentiment for every comment. Can you guess why the SD column is empty? A single data point (sentence/row) does not have a standard deviation by itself.
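
The empty SD column can be reproduced with base R's `sd()`, which returns `NA` when it is given a single value. The scores below are invented for illustration only.

``` r
# Hypothetical sentence-level scores, for illustration only
one_sentence  <- c(0.25)          # a comment made of a single sentence
two_sentences <- c(0.25, -0.10)   # a comment made of two sentences

sd(one_sentence)    # NA: one value has no spread to measure
sd(two_sentences)   # a number: two values are enough to compute a spread
```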

##### Adding those scores back to our dataset
#### Adding those scores back to our dataset

Now we’re using the **`dplyr`** package to make our dataset more informative. We take our `comments` dataset, and with **`mutate()`**, we add two new columns: `score` and `sentiment_label`. The little rule inside **`case_when()`** decides what label to give. The small buffer around zero (±0.1) helps us avoid overreacting to tiny fluctuations.
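
To see how the ±0.1 buffer behaves, here is a minimal base-R sketch of the labelling rule. It uses nested `ifelse()` calls instead of `case_when()` so that it runs without extra packages. The scores are invented, and treating values strictly beyond ±0.1 as positive or negative is an assumption about where the boundary falls.

``` r
# Hypothetical average sentiment scores, for illustration only
score <- c(0.45, 0.05, -0.02, -0.30)

# Assumed rule: beyond the ±0.1 buffer is positive/negative, inside it is neutral
sentiment_label <- ifelse(score > 0.1, "positive",
                   ifelse(score < -0.1, "negative", "neutral"))

data.frame(score, sentiment_label)
```

Scores such as 0.05 and -0.02 fall inside the buffer and are labelled neutral rather than being counted as weakly positive or weakly negative.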

@@ -55,17 +55,17 @@ Let's now take a look at the `sentiment_scores` data frame:

![Sentiment Scores with Polarity Results](images/polarity-scores.png)

To get a sense of the overall mood of our dataset let's run:
To get a sense of the overall "mood" of our dataset let's run:

``` r
table(polarity$sentiment_label)
```

![Overall Polarity Count](images/overall-polarity-count.png){width="363"}
![Polarity Count](images/overall-polarity-count.png){width="363"}

Overall, the majority of viewers reacted positively to the show, with positive opinions more than double the negative ones, indicating a generally favorable reception. However, this is only part of the story—positive sentiment can range from mildly favorable to very enthusiastic. To better visualize the full distribution of opinions, a histogram is presented below.

#### Plotting Scores
### Plotting Scores

Next, let's plot some results and histograms to check the distribution for the scores:

@@ -77,9 +77,9 @@ ggplot(polarity, aes(x = score)) +
labs(title = "Sentiment Score Distribution", x = "Average Sentiment", y = "Count")
```

![Polarity Distribution](images/histogram-polarity.png){width="529"}
![Polarity Distribution](images/histogram-polarity.png){width="699"}

This histogram suggests that the overall sentiment toward the Severance show was **mostly neutral to slightly positive**. This suggests that most viewers are expressing their opinions in a **measured, nuanced, or factual** manner, rather than with intense emotional language (either extremely positive or negative).
This histogram suggests that the overall sentiment toward the Severance show was **mostly neutral to slightly positive**. In other words, most viewers expressed their opinions without intense emotional language (either extremely positive or extremely negative).

We can also break the data down by season to compare how audience opinions vary over each season finale:
