From ebb93b3f87fa8742a8b65048281d2800e0dd856c Mon Sep 17 00:00:00 2001 From: vsriram24 Date: Mon, 24 Feb 2025 12:39:34 -0800 Subject: [PATCH 1/3] Week 4 ex first draft --- week4-exercises.qmd | 106 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 106 insertions(+) create mode 100644 week4-exercises.qmd diff --git a/week4-exercises.qmd b/week4-exercises.qmd new file mode 100644 index 0000000..d538edb --- /dev/null +++ b/week4-exercises.qmd @@ -0,0 +1,106 @@ +--- +title: "Week 4 Exercises" +--- + +We'll first connect to the database: + +```{r} +#| context: setup +library(duckdb) +library(DBI) +library(DiagrammeR) +con <- DBI::dbConnect(duckdb::duckdb(), + "data/GiBleed_5.3_1.1.duckdb") + +``` + +## Subquery in `SELECT` + +Fill in the blank in the query below to find each patient's demographic data along with the **total number of procedures** they have had. + +```{sql connection="con"} +SELECT + person_id, + gender_source_value, + race_source_value, + (SELECT + COUNT(*) + FROM + procedure_occurrence + WHERE + person.person_id = procedure_occurrence.person_id + ) AS number_of_procedures +FROM + person; +``` + +2. 
Fill in the blank in the query below to dynamically calculate the **number of months** between the **procedure start date** and today for all procedures from the `procedure_occurrence` table + +```{sql connection="con"} +#| eval: false +SELECT + person_id, + visit_occurrence_id, + procedure_occurrence_id, + procedure_concept_id, + procedure_datetime, + (SELECT + DATE_DIFF( + 'month', procedure_datetime, DATE '2025-03-07' + ) + ) AS procedure_time_to_tody +FROM + procedure_occurrence; +``` + + + +## Subquery in `WHERE` + +Count the number of cases for `procedure_occurrence` with the following criteria: + +```{sql connection="con"} +SELECT + person_id, + birth_datetime, + gender_source_value, + race_source_value, + ethnicity_source_value +FROM + person +WHERE + person_id IN ( + SELECT + person_id + FROM + condition_occurrence + WHERE + condition_concept_id == '40481087' + ); + +``` + + +## Creating a view (with `DATEDIFF`) + +Create a view for senior citizen occurrences, where we collect procedure occurrences for all patients aged >= 50 at the time of their procedure + +```{sql} +#| connection: "con" +CREATE VIEW senior_procedures AS +SELECT * +FROM procedure_occurrence +WHERE procedure_occurrence.person_id IN ( + SELECT + person_id + FROM + person + WHERE + DATE_DIFF( + 'year', + person.birth_datetime, + procedure_occurrence.procedure_datetime + ) >= 50 +); +``` + From 9394603ae63a518e68d41ec2d59343856c121e9c Mon Sep 17 00:00:00 2001 From: vsriram24 Date: Fri, 28 Feb 2025 09:18:21 -0800 Subject: [PATCH 2/3] Updates to week 4 presentation based on feedback from 2/27 --- week4.qmd | 61 ++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 38 insertions(+), 23 deletions(-) diff --git a/week4.qmd b/week4.qmd index 46fc81e..97c88d4 100644 --- a/week4.qmd +++ b/week4.qmd @@ -49,10 +49,20 @@ Note that in our example, we explicitly cast our two dates as `DATE` variables - 1. 
**What do you think happens if we swap the order of dates in the `DATEDIFF` command?** -```{sql connection="con", output.var="person_age"} +```{r} #| eval: false #| include: false -SELECT DATE_DIFF('day', DATE '2024-03-07', DATE '2024-01-01') +sql_statement <- "SELECT DATE_DIFF('day', DATE '2024-01-01', DATE '2024-03-07')" + +out1 <- DBI::dbGetQuery(con, sql_statement) +``` + +```{r} +#| eval: false +#| include: false +sql_statement <- "SELECT DATE_DIFF('day', DATE '2024-03-07', DATE '2024-01-01')" + +out2 <- DBI::dbGetQuery(con, sql_statement) ``` ### Example: Using a Subquery in the `SELECT` Clause @@ -95,7 +105,9 @@ SELECT condition_concept_id, condition_start_date, condition_end_date, - (_____________________) AS condition_time_span + (SELECT + DATE_DIFF(_____, _____, _____) + ) AS condition_time_span FROM condition_occurrence; ``` @@ -127,7 +139,7 @@ Here's another great example from [The Data School](https://dataschool.com/how-t ![](https://dataschool.com/assets/images/how-to-teach-people-sql/subqueries/subqueries_7.gif) -#### A brief tangent: the `IN` clause +#### A brief review: the `IN` clause The `IN` clause in SQL is used to filter records where a column matches any value in a specified list or subquery result. It is a shorthand for multiple `OR` conditions and is commonly used for readability and efficiency. @@ -153,9 +165,22 @@ Now back to using a subquery for filtering! ### Example: Filtering with a Subquery -For our own database, let's collect patient demographic data for all patients who had some kind of procedure performed after January 1st, 2019. We'll make use of the `person` and `procedure_occurrence` tables for this query. +For our own database, let's collect patient demographic data for all patients who had some kind of procedure performed after December 31st, 2018. We'll make use of the `person` and `procedure_occurrence` tables for this query. 
+ +We can start by writing the computation for our subquery - collecting patient IDs for individuals who had a procedure after December 31st, 2018. ```{sql, connection="con", output.var="recent_pts"} +SELECT + person_id +FROM + procedure_occurrence +WHERE + procedure_datetime >= DATE '2019-01-01'; +``` + +Now, we can insert this query into the `WHERE` clause of our larger query that collects patient demographic information! + +```{sql, connection="con", output.var="recent_pt_info"} SELECT person_id, birth_datetime, @@ -177,25 +202,22 @@ WHERE #### Check on learning -Fill in the blank in the following SQL query to select relevant patient data for any patient in the `condition_occurrence` table who had a **condition start date** on or after January 1st, 2019. +Write out a query to collect patient IDs for individuals who had a **condition start date** after December 31st, 2018. This query will become the subquery in our larger computation. ```{sql connection="con", output.var="recent_pts"} #| eval: false SELECT - person_id, - birth_datetime, - gender_source_value, - race_source_value, - ethnicity_source_value + person_id FROM - person + condition_occurrence WHERE - person_id IN (_________); + condition_start_date >= ______ ``` +Now, fill in the blank in the following SQL query with the subquery that you just developed to collect patient demographic data for any patient in the `condition_occurrence` table who had a condition start date on or after January 1st, 2019. 
+ ```{sql connection="con", output.var="recent_pts"} #| eval: false -#| include: false SELECT person_id, birth_datetime, @@ -205,14 +227,7 @@ SELECT FROM person WHERE - person_id IN ( - SELECT - person_id - FROM - condition_occurrence - WHERE - condition_start_date >= DATE '2019-01-01' - ); + person_id IN (_________); ``` ### When to use subqueries @@ -318,7 +333,7 @@ While writing efficient SQL queries is important, database performance optimizat - **Do not manually create indexes**: Indexing can significantly improve query performance, but in most cases, it is **the responsibility of the Database Administrator (DBA)** to manage indexes appropriately. If you believe an index is needed, consult with your DBA. -- **When in doubt, talk to your database administrator**: They have the expertise to optimize database performance, manage indexing, and ensure efficient query execution. +- **When in doubt, talk to your database administrator**: Especially when your database is transactional, you should not be the one doing these modifications! DBAs have the expertise to optimize database performance, manage indexing, and ensure efficient query execution. Trust your DBA! ## Summary From c27d6ed3ef9b439963fb680f1af4488d825f76e5 Mon Sep 17 00:00:00 2001 From: vsriram24 Date: Fri, 28 Feb 2025 09:35:05 -0800 Subject: [PATCH 3/3] Simplify exercises, add challenge, add solutions --- week4-exercises.qmd | 54 ++++++++++---------- week4-solutions.qmd | 121 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 148 insertions(+), 27 deletions(-) create mode 100644 week4-solutions.qmd diff --git a/week4-exercises.qmd b/week4-exercises.qmd index d538edb..7abf081 100644 --- a/week4-exercises.qmd +++ b/week4-exercises.qmd @@ -16,7 +16,7 @@ con <- DBI::dbConnect(duckdb::duckdb(), ## Subquery in `SELECT` -Fill in the blank in the query below to find each patient's demographic data along with the **total number of procedures** they have had. +1. 
Fill in the blank in the subquery below to find each patient's demographic data along with the **total number of procedures** they have had. Note that this query makes use of the `person` table as well as the `procedure_occurrence` table. ```{sql connection="con"} SELECT @@ -28,13 +28,13 @@ SELECT FROM procedure_occurrence WHERE - person.person_id = procedure_occurrence.person_id + person.person_id = ___________.person_id ) AS number_of_procedures FROM person; ``` -2. Fill in the blank in the query below to dynamically calculate the **number of months** between the **procedure start date** and today for all procedures from the `procedure_occurrence` table +2. Fill in the blank in the query below to dynamically calculate the **number of months** between the **procedure date** and today for all procedures from the `procedure_occurrence` table ```{sql connection="con"} #| eval: false @@ -46,18 +46,16 @@ SELECT procedure_datetime, (SELECT DATE_DIFF( - 'month', procedure_datetime, DATE '2025-03-07' + ______, ______, DATE '2025-03-07' ) - ) AS procedure_time_to_tody + ) AS procedure_time_to_today FROM procedure_occurrence; ``` - - ## Subquery in `WHERE` -Count the number of cases for `procedure_occurrence` with the following criteria: +3. Collect patient demographic data for all patients who have an occurrence of a condition with id = "40481087": ```{sql connection="con"} SELECT @@ -75,32 +73,34 @@ WHERE FROM condition_occurrence WHERE - condition_concept_id == '40481087' + _____________ == '40481087' ); ``` +## Creating a view -## Creating a view (with `DATEDIFF`) - -Create a view for senior citizen occurrences, where we collect procedure occurrences for all patients aged >= 50 at the time of their procedure +4. Create a view for senior citizen demographics, where we collect demographics for patients born in or before 1960. 
```{sql} #| connection: "con" -CREATE VIEW senior_procedures AS -SELECT * -FROM procedure_occurrence -WHERE procedure_occurrence.person_id IN ( - SELECT - person_id - FROM - person - WHERE - DATE_DIFF( - 'year', - person.birth_datetime, - procedure_occurrence.procedure_datetime - ) >= 50 -); +CREATE VIEW senior_demographics AS +SELECT + person_id, + birth_datetime, + gender_source_value, + race_source_value, + ethnicity_source_value +FROM person +WHERE + _______ >= '1960'; ``` +## Challenge: Creating a view (using `DATEDIFF` in a subquery) + +5. Create a view for senior citizen procedures, where we collect procedure occurrences for all patients aged \>= 50 at the time of their procedure + +```{sql} +#| connection: "con" + +``` diff --git a/week4-solutions.qmd b/week4-solutions.qmd new file mode 100644 index 0000000..bf270a7 --- /dev/null +++ b/week4-solutions.qmd @@ -0,0 +1,121 @@ +--- +title: "Week 4 Exercises - Solutions" +--- + +We'll first connect to the database: + +```{r} +#| context: setup +library(duckdb) +library(DBI) +library(DiagrammeR) +con <- DBI::dbConnect(duckdb::duckdb(), + "data/GiBleed_5.3_1.1.duckdb") + +``` + +## Subquery in `SELECT` + +1. Fill in the blank in the subquery below to find each patient's demographic data along with the **total number of procedures** they have had. Note that this query makes use of the `person` table as well as the `procedure_occurrence` table. + +```{sql connection="con"} +SELECT + person_id, + gender_source_value, + race_source_value, + (SELECT + COUNT(*) + FROM + procedure_occurrence + WHERE + person.person_id = procedure_occurrence.person_id + ) AS number_of_procedures +FROM + person; +``` + +2. 
Fill in the blank in the query below to dynamically calculate the **number of months** between the **procedure date** and today for all procedures from the `procedure_occurrence` table + +```{sql connection="con"} +#| eval: false +SELECT + person_id, + visit_occurrence_id, + procedure_occurrence_id, + procedure_concept_id, + procedure_datetime, + (SELECT + DATE_DIFF( + 'month', procedure_datetime, DATE '2025-03-07' + ) + ) AS procedure_time_to_today +FROM + procedure_occurrence; +``` + +## Subquery in `WHERE` + +3. Collect patient demographic data for all patients who have an occurrence of a condition with id = "40481087": + +```{sql connection="con"} +SELECT + person_id, + birth_datetime, + gender_source_value, + race_source_value, + ethnicity_source_value +FROM + person +WHERE + person_id IN ( + SELECT + person_id + FROM + condition_occurrence + WHERE + condition_concept_id == '40481087' + ); + +``` + +## Creating a view + +4. Create a view for senior citizen demographics, where we collect demographics for patients born in or before 1960. + +```{sql} +#| connection: "con" +CREATE VIEW senior_demographics AS +SELECT + person_id, + birth_datetime, + gender_source_value, + race_source_value, + ethnicity_source_value +FROM person +WHERE + year_of_birth <= '1960'; +``` + + +## Challenge: Creating a view (using `DATEDIFF` in a subquery) + +5. Create a view for senior citizen procedures, where we collect procedure occurrences for all patients aged \>= 50 at the time of their procedure + +```{sql} +#| connection: "con" +CREATE VIEW senior_procedures AS +SELECT * +FROM procedure_occurrence +WHERE procedure_occurrence.person_id IN ( + SELECT + person_id + FROM + person + WHERE + DATE_DIFF( + 'year', + person.birth_datetime, + procedure_occurrence.procedure_datetime + ) >= 50 +); +```