fhdsl · laderast · Nov 6, 2025 · Nov 6, 2025
diff --git a/concepts.qmd b/concepts.qmd
@@ -95,7 +95,7 @@ SQL is short for **S**tructured **Q**uery **L**anguage. It is a standardized lan
 SQL lets us do various operations on data. It contains various *clauses* which let us manipulate data:
 
 | Priority | Clause     | Purpose                                                        |
-|------------|------------|------------------------------------------------|
+|---------------|---------------|-------------------------------------------|
 | 1        | `FROM`     | Choose tables to query and specify how to `JOIN` them together |
 | 2        | `WHERE`    | Filter tables based on criteria                                |
 | 3        | `GROUP BY` | Aggregates the Data                                            |
@@ -109,7 +109,7 @@ We do not use all of these clauses when we write a SQL Query. We only use the on
 Oftentimes, we really only want a summary out of the database. We would probably use the following clauses:
 
 | Priority | Clause     | Purpose                                                        |
-|------------|------------|------------------------------------------------|
+|---------------|---------------|-------------------------------------------|
 | 1        | `FROM`     | Choose tables to query and specify how to `JOIN` them together |
 | 2        | `WHERE`    | Filter tables based on criteria                                |
 | 3        | `GROUP BY` | Aggregates the Data                                            |

diff --git a/dbplyr-example.qmd b/dbplyr-example.qmd
@@ -0,0 +1,69 @@
+---
+title: "dbplyr demo"
+format: html
+editor: visual
+---
+
+## dbplyr!
+
+*"Okay, cool, thanks for Intro to SQL. Now can I go back to coding in R?" -* 🧑‍🌾
+
+```{r}
+library(tidyverse)
+library(dbplyr)
+```
+
+```{r}
+library(duckdb)
+library(DBI)
+
+con <- DBI::dbConnect(
+  duckdb::duckdb(), 
+  "data/GiBleed_5.3_1.1.duckdb"
+)
+```
+
+Specify the table of interest. `tbl()` creates a lazy reference to it; no rows are pulled into R yet:
+
+```{r}
+person_db <- tbl(con, "person")
+```
+
+Now you can query it. Printing the result only fetches a preview of the rows (by default, the first 1000), not the whole table.
+
+```{r}
+person_db |> 
+  select(person_id, birth_datetime, gender_source_value) |>
+  filter(gender_source_value == "F")
+```
+
+You can save your query as a variable:
+
+```{r}
+cool_query <- person_db |>
+  select(person_id, birth_datetime, gender_source_value) |>
+  filter(gender_source_value == "F")
+```
+
+You can see the SQL it translates into:
+
+```{r}
+cool_query |> show_query()
+```
+
+Finally, when you are ready to run the query and pull the full result into R, use `collect()`:
+
+```{r}
+cool_query_result <- cool_query |> collect()
+```
+
+Do R things:
+
+```{r}
+cool_query_result$year <- as.numeric(sub("-.*", "", cool_query_result$birth_datetime))
+ggplot(cool_query_result) + aes(x = year) + geom_histogram() + theme_bw()
+```
+
+Full guide here: <https://dbplyr.tidyverse.org/articles/dbplyr.html>
+
+-   Yes, you can even do joins: <https://dbplyr.tidyverse.org/reference/join.tbl_sql.html>
diff --git a/week4-exercises.qmd b/week4-exercises.qmd
@@ -16,7 +16,7 @@ con <- DBI::dbConnect(duckdb::duckdb(),
 
 ## Subquery in `SELECT`
 
-1. Fill in the blank in the subquery below to find each patient's demographic data along with the **total number of procedures** they have had. Note that this query makes use of the `person` table as well as the `procedure_occurrence` table.
+1.  Fill in the blank in the subquery below to find each patient's demographic data along with the **total number of procedures** they have had. Note that this query makes use of the `person` table as well as the `procedure_occurrence` table.
 
 ```{sql connection="con"}
 SELECT 
@@ -46,7 +46,7 @@ SELECT
   procedure_datetime,
   (SELECT 
     DATE_DIFF(
-      ______, ______, DATE '2025-03-07'
+      'month', ______, DATE '2025-11-07'
     )
   ) AS procedure_time_to_today
 FROM 
@@ -55,7 +55,7 @@ FROM
 
 ## Subquery in `WHERE`
 
-Collect patient demographic data for all patients who have an occurrence of a condition with id = "40481087":
+Collect patient demographic data for all patients who have a condition occurrence with `condition_occurrence_id` = "40481087":
 
 ```{sql connection="con"}
 SELECT 
@@ -78,29 +78,27 @@ WHERE
 
 ```
 
-## Creating a view
+## Challenge: Creating a view (using `DATE_DIFF` in a subquery)
 
-4. Create a view for senior citizen demographics, where we collect demographics for patients born in or before 1960.
+5.  Create a view for senior citizen procedures, where we collect procedure occurrences for all patients aged \>= 50 at the time of their procedure.
 
-```{sql}
-#| connection: "con"
-CREATE VIEW senior_demographics AS
-SELECT
-  person_id, 
-  birth_datetime, 
-  gender_source_value, 
-  race_source_value, 
-  ethnicity_source_value
-FROM person 
-WHERE
-  _______ >= '1960';
-```
+Break it down: first, create a query for patients aged \>= 50. You will need to use the `person` table and the `DATE_DIFF` function on the `birth_datetime` column.
 
-## Challenge: Creating a view (using `DATEDIFF` in a subquery)
+```{sql connection="con"}
+SELECT ______
+FROM ______
+WHERE DATE_DIFF('year', ______, DATE '2024-03-07') >= ______
+```
 
-5. Create a view for senior citizen procedures, where we collect procedure occurrences for all patients aged \>= 50 at the time of their procedure
+Then write the outer query of the view on the `person` table, keeping only rows whose `person_id` is returned by your query above:
 
 ```{sql}
 #| connection: "con"
+CREATE VIEW senior_citizen_procedures AS
 
+
+```
+
+```{sql connection="con"}
+SELECT * FROM senior_citizen_procedures 
 ```