fhdsl · laderast · Oct 16, 2025 · Oct 16, 2025
diff --git a/slides/images/inner-join.gif b/slides/images/inner-join.gif
diff --git a/slides/images/left-join.gif b/slides/images/left-join.gif
diff --git a/slides/images/omop1.png b/slides/images/omop1.png
diff --git a/slides/images/original-dfs.png b/slides/images/original-dfs.png
diff --git a/slides/lesson1_slides.html b/slides/lesson1_slides.html
diff --git a/slides/lesson1_slides.qmd b/slides/lesson1_slides.qmd
@@ -15,6 +15,8 @@ output-location: fragment
 
 Please [sign-up for an account at Posit Cloud](https://login.posit.cloud/register "https://login.posit.cloud/register") and accept our classroom invitation here: <https://posit.cloud/spaces/689711/join?access_code=8kse5IYlL4kHIqZvKaQ6mXp8IMibFayMa10I8Izn>
 
+Our course website: <https://intro-sql-fh.netlify.app/>
+
 ## Introductions
 
 -   Who am I?
@@ -41,11 +43,11 @@ Please [sign-up for an account at Posit Cloud](https://login.posit.cloud/regist
 
 . . .
 
--   
+-   Fundamentals of SQL query writing: filtering, joining, grouping.
 
 . . .
 
--   
+-   Not so much about building your own database and optimizing it.
 
 ## Content of the course
 
@@ -55,7 +57,7 @@ Please [sign-up for an account at Posit Cloud](https://login.posit.cloud/regist
 
 3.  \[No class week\]
 
-4.  Calculating new fields, `GROUP BY`, `CASE WHEN`, `HAVING`
+4.  Grouping and Aggregating variables
 
 5.  Subqueries, Views, **Pizza**
 
@@ -187,6 +189,10 @@ Procedure Occurrence table
 
 ![](../img/omop0.png){width="550"}
 
+## A short survey on your interest and background
+
+<https://forms.gle/YADmDmukRKmGk2KFA>
+
 ## Let's get started: connecting to the database
 
 ```{r, warning=FALSE}
@@ -278,6 +284,20 @@ SELECT person_id, gender_source_value, race_source_value
   WHERE year_of_birth < 2000
 ```
 
+## SQL Comparison Operators
+
+-   Equal: `=`
+
+-   Greater than: `>`
+
+-   Less than: `<`
+
+-   Greater than or equal to: `>=`
+
+-   Less than or equal to: `<=`
+
+-   Not equal to: `<>`
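+
+A quick sketch of one of these operators, reusing the `person` columns from the earlier query (the cutoff year 1980 is an arbitrary choice for illustration):
+
+```{sql connection="con"}
+-- Keep only people born in 1980 or later
+SELECT person_id, year_of_birth
+  FROM person
+  WHERE year_of_birth >= 1980
+```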
+
 ## Single quotes and `WHERE`
 
 Single quotes ('M') refer to values, and double quotes refer to columns ("person_id").

diff --git a/slides/lesson2_slides.html b/slides/lesson2_slides.html
diff --git a/slides/lesson2_slides.qmd b/slides/lesson2_slides.qmd
@@ -0,0 +1,268 @@
+---
+title: "Week 2: JOINs, More WHERE, Boolean Logic, ORDER BY"
+format: 
+  revealjs:
+    smaller: true
+    scrollable: true
+    echo: true
+    embed-resources: true
+output-location: fragment
+---
+
+## Table references
+
+In single table queries, it is usually unambiguous to the query engine which column and which table you need to query.
+
+However, when you involve multiple tables, it is important to know how to refer to a column in a specific table.
+
+. . .
+
+For example:
+
+```{r}
+library(DBI)
+
+con <- DBI::dbConnect(duckdb::duckdb(), 
+                      "../data/GiBleed_5.3_1.1.duckdb")
+```
+
+```{sql connection="con"}
+SELECT person.person_id, person.year_of_birth
+  FROM person
+```
+
+. . .
+
+Your turn to use table references:
+
+```{sql connection="con"}
+SELECT *
+  FROM procedure_occurrence
+  WHERE person_id = 1
+```
+
+## Entity Relationship Diagrams
+
+![](images/omop1.png)
+
+-   For each `person_id` in the `person` table, there may be several rows with that `person_id` in the `procedure_occurrence` table, as a patient can have multiple procedures. This is a **one-to-many relationship**.
+
+-   Multiple elements of `procedure_concept_id` in the `procedure_occurrence` table may correspond to a single element of `concept_id` in the `concept` table. This is a **many-to-one relationship**.
+
+-   You can also have a **one-to-one relationship**.
+
+. . .
+
+[OMOP CDM (Common Data Model)](https://ohdsi.github.io/CommonDataModel/index.html).
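+
+. . .
+
+One way to check the one-to-many relationship, as a sketch: if `procedure_occurrence` has more rows than distinct `person_id` values, at least one person has several procedures. The column aliases `n_rows` and `n_people` are names chosen here for illustration.
+
+```{sql connection="con"}
+-- More rows than distinct people means some person_id values repeat
+SELECT COUNT(*) AS n_rows,
+       COUNT(DISTINCT person_id) AS n_people
+  FROM procedure_occurrence
+```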
+
+## Joins
+
+To set the stage, let's show two tables, `x` and `y`. We want to join them by the **keys**, which are represented by colored boxes in both of the tables.
+
+![](images/original-dfs.png)
+
+. . .
+
+In an `INNER JOIN`, we only retain rows whose keys exist in both the `x` and `y` tables.
+
+![](images/inner-join.gif)
+
+## `INNER JOIN` syntax
+
+```{sql connection="con"}
+SELECT person.person_id, procedure_occurrence.procedure_occurrence_id 
+    FROM person
+    INNER JOIN procedure_occurrence
+    ON person.person_id = procedure_occurrence.person_id
+```
+
+. . .
+
+1.  `FROM person` and `INNER JOIN procedure_occurrence` specify the tables to be joined.
+
+2.  `ON person.person_id = procedure_occurrence.person_id` specifies the key columns from each table to match on.
+
+3.  Then, we `SELECT` the columns we want to keep: `person.person_id, procedure_occurrence.procedure_occurrence_id`
+
+## Table aliases
+
+We can abbreviate the table names with the `AS` keyword:
+
+```{sql connection="con"}
+SELECT p.person_id, po.procedure_occurrence_id 
+    FROM person AS p
+    INNER JOIN procedure_occurrence AS po
+    ON p.person_id = po.person_id
+```
+
+## `LEFT JOIN`
+
+If a row exists in the left table but has no match in the right table, it is still kept in the joined table, with `NULL` in the columns that come from the right table.
+
+![](images/left-join.gif)
+
+. . .
+
+We can see the difference between an `INNER JOIN` and a `LEFT JOIN` by counting the number of rows kept after joining:
+
+```{sql}
+#| connection: "con"
+SELECT COUNT (*)
+    FROM person as p
+    INNER JOIN procedure_occurrence as po
+    ON p.person_id = po.person_id
+```
+
+. . .
+
+```{sql}
+#| connection: "con"
+SELECT COUNT (*)
+    FROM person as p
+    LEFT JOIN procedure_occurrence as po
+    ON p.person_id = po.person_id
+```
+
+A larger count for the `LEFT JOIN` suggests that some `person_id`s in the `person` table have no match in the `procedure_occurrence` table.
+
+## Other kinds of `JOIN`s
+
+-   The `RIGHT JOIN` works like the `LEFT JOIN`, except that the rows preserved are from the *right* table.
+-   The `FULL JOIN` retains all rows in both tables, regardless of whether there is a key match.
+-   The `ANTI JOIN` is helpful for finding all of the keys that are in the *left* table, but not the *right* table.
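+
+. . .
+
+As a sketch of the anti-join idea, a `LEFT JOIN` followed by a `NULL` check on the right table's key keeps the `person` rows that have no match in `procedure_occurrence`. This is a common portable pattern, not necessarily how the `ANTI JOIN` keyword itself is implemented.
+
+```{sql connection="con"}
+-- People with no recorded procedures: the right-side key is NULL after the LEFT JOIN
+SELECT p.person_id
+  FROM person AS p
+  LEFT JOIN procedure_occurrence AS po
+  ON p.person_id = po.person_id
+  WHERE po.person_id IS NULL
+```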
+
+## Multiple `JOIN`s
+
+Can we do a triple join?
+
+![](images/omop1.png)
+
+Suppose that we want a table with `person.person_id`, `procedure_occurrence.procedure_occurrence_id`, and `concept.concept_name`.
+
+. . .
+
+Some suggested steps:
+
+1.  We first `INNER JOIN` `person` and `procedure_occurrence`, to produce an output table
+2.  We take this output table and `INNER JOIN` it with `concept`.
+
+## Using `JOIN` with `WHERE`
+
+Let's add a `WHERE` clause so that we keep only the rows whose `concept_name` is `'Subcutaneous immunotherapy'`:
+
+```{sql connection="con"}
+SELECT p.person_id, po.procedure_occurrence_id, c.concept_name
+  FROM person AS p
+  INNER JOIN procedure_occurrence AS po
+  ON p.person_id = po.person_id
+  INNER JOIN concept AS c
+  ON po.procedure_concept_id = c.concept_id
+  WHERE c.concept_name = 'Subcutaneous immunotherapy';
+```
+
+## Revisiting `WHERE`: `AND` versus `OR`
+
+Revisiting `WHERE`, we can combine conditions with `AND` or `OR`.
+
+`AND` is at least as restrictive as `OR`, because rows must meet both conditions.
+
+```{sql}
+#| connection: "con"
+SELECT COUNT(*)
+  FROM person
+  WHERE year_of_birth < 1980 
+  AND gender_source_value = 'M'
+```
+
+. . .
+
+On the other hand, `OR` is more permissive than `AND`, because rows need to meet at least one of the conditions.
+
+```{sql}
+#| connection: "con"
+SELECT COUNT(*)
+  FROM person
+  WHERE year_of_birth < 1980 
+  OR gender_source_value = 'M'
+```
+
+. . .
+
+There is also `NOT`, which negates a condition. Below, the first condition must be true and the second must be false.
+
+```{sql}
+#| connection: "con"
+SELECT COUNT(*)
+  FROM person
+  WHERE year_of_birth < 1980 
+  AND NOT gender_source_value = 'M'
+```
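+
+. . .
+
+Parentheses control what `NOT` applies to. As a sketch, the query below negates the whole `OR` condition from earlier on this slide, so it counts the rows where both conditions are false:
+
+```{sql}
+#| connection: "con"
+-- NOT applies to everything inside the parentheses
+SELECT COUNT(*)
+  FROM person
+  WHERE NOT (year_of_birth < 1980
+             OR gender_source_value = 'M')
+```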
+
+## `ORDER BY`
+
+`ORDER BY` lets us sort tables by one or more columns:
+
+```{sql}
+#| connection: "con"
+SELECT p.person_id, po.procedure_occurrence_id, po.procedure_date
+    FROM person as p
+    INNER JOIN procedure_occurrence as po
+    ON p.person_id = po.person_id
+    ORDER BY p.person_id;
+```
+
+. . .
+
+Once we sort by `person_id`, we see that for every unique `person_id`, there can be multiple procedures! This suggests that there is a **one-to-many relationship** between the `person` and `procedure_occurrence` tables.
+
+. . .
+
+We can `ORDER BY` multiple columns at once. Try ordering by `p.person_id` and `po.procedure_date`...
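+
+. . .
+
+A sketch of the multi-column syntax on a different table, so the exercise above is left open: sort columns are separated by commas, and `DESC` reverses the order of the column it follows (ascending is the default).
+
+```{sql}
+#| connection: "con"
+-- Latest birth years first; ties in year_of_birth are broken by person_id
+SELECT person_id, year_of_birth
+    FROM person
+    ORDER BY year_of_birth DESC, person_id;
+```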
+
+## Constraints and rules for Databases
+
+Some constraints we can require on columns of a table:
+
+-   Data type - such as `INTEGER` or `VARCHAR`.
+-   `NOT NULL` - no entry in the column can be `NULL`.
+-   `UNIQUE` - all values must be unique.
+-   `PRIMARY KEY` - `NOT NULL` and `UNIQUE`.
+-   `FOREIGN KEY` - value must exist as a primary key in another table's field. The referenced table's field must be specified.
+-   `CHECK` - values must satisfy a stated condition. For example, we could require that a year is not before 1900.
+-   `DEFAULT` - default values are given if not provided.
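+
+. . .
+
+A sketch of how several of these constraints can appear together in one `CREATE TABLE` statement. The `lab_result` table and its columns are hypothetical and are not part of the course database:
+
+```sql
+CREATE TABLE lab_result (
+  lab_result_id INTEGER PRIMARY KEY,                -- NOT NULL and UNIQUE
+  person_id INTEGER NOT NULL,                       -- a value is always required
+  result_year INTEGER CHECK (result_year >= 1900),  -- reject years before 1900
+  result_unit VARCHAR DEFAULT 'mg/dL'               -- used when no unit is given
+)
+```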
+
+## Primary keys
+
+Every table should have a `PRIMARY KEY`, which cannot be `NULL` and must be unique. This gives a unique id for each row of the table.
+
+. . .
+
+When we create tables in our database, we need to specify which column is a `PRIMARY KEY`:
+
+```         
+CREATE TABLE person (
+  person_id INTEGER PRIMARY KEY
+)
+```
+
+## Foreign keys
+
+A `FOREIGN KEY` links two tables. If a column is declared a `FOREIGN KEY`, then each of its values must *exist* as a primary key in the table named after `REFERENCES`.
+
+. . .
+
+```         
+CREATE TABLE procedure_occurrence (
+  procedure_occurrence_id INTEGER PRIMARY KEY,
+  person_id INTEGER REFERENCES person(person_id),
+  procedure_concept_id INTEGER REFERENCES concept(concept_id)
+)
+```
+
+## Always close the connection
+
+When we're done, it's best to close the connection with `dbDisconnect()`.
+
+```{r}
+dbDisconnect(con)
+```