Merged
4 changes: 2 additions & 2 deletions concepts.qmd
@@ -95,7 +95,7 @@ SQL is short for **S**tructured **Q**uery **L**anguage. It is a standardized lan
SQL lets us do various operations on data. It contains various *clauses* which let us manipulate data:

| Priority | Clause | Purpose |
|------------|------------|------------------------------------------------|
|---------------|---------------|-------------------------------------------|
| 1 | `FROM` | Choose tables to query and specify how to `JOIN` them together |
| 2 | `WHERE` | Filter tables based on criteria |
| 3 | `GROUP BY` | Aggregates the Data |
@@ -109,7 +109,7 @@ We do not use all of these clauses when we write a SQL Query. We only use the on
Oftentimes, we really only want a summary out of the database. We would probably use the following clauses:

| Priority | Clause | Purpose |
|------------|------------|------------------------------------------------|
|---------------|---------------|-------------------------------------------|
| 1 | `FROM` | Choose tables to query and specify how to `JOIN` them together |
| 2 | `WHERE` | Filter tables based on criteria |
| 3 | `GROUP BY` | Aggregates the Data |
69 changes: 69 additions & 0 deletions dbplyr-example.qmd
@@ -0,0 +1,69 @@
---
title: "dbplyr demo"
format: html
editor: visual
---

## dbplyr!

*"Okay, cool, thanks for Intro to SQL. Now can I go back to coding in R?" -* 🧑‍🌾

```{r}
library(tidyverse)
library(dbplyr)
```

```{r}
library(duckdb)
library(DBI)

con <- DBI::dbConnect(
  duckdb::duckdb(),
  "data/GiBleed_5.3_1.1.duckdb"
)
```

Specify the table of interest. This creates a lazy reference to the database table; no data is loaded into R yet:

```{r}
person_db <- tbl(con, "person")
```

Now you can query it. Printing the result shows only a preview (by default, the first 1000 entries), not the full table.

```{r}
person_db |>
  select(person_id, birth_datetime, gender_source_value) |>
  filter(gender_source_value == "F")
```

You can save your query as a variable:

```{r}
cool_query <- person_db |>
  select(person_id, birth_datetime, gender_source_value) |>
  filter(gender_source_value == "F")
```

You can see the SQL that your query is translated into:

```{r}
cool_query |> show_query()
```

Finally, when you are ready to run the full query and pull the results into R:

```{r}
cool_query_result <- cool_query |> collect()
```
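
The lazy workflow above (reference a table, build a query, inspect the SQL, then collect) can be tried without the course database. This is a minimal sketch, assuming the `duckdb`, `DBI`, `dplyr`, and `dbplyr` packages are installed; the `toy_person` data and the `toy_*` names are hypothetical stand-ins, not part of the GiBleed dataset.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Hypothetical toy data standing in for the `person` table
toy_person <- data.frame(
  person_id = 1:4,
  gender_source_value = c("F", "M", "F", "F")
)

# In-memory DuckDB database, so no file on disk is needed
toy_con <- dbConnect(duckdb::duckdb())
dbWriteTable(toy_con, "person", toy_person)

# Lazy reference plus query: nothing is pulled into R yet
toy_query <- tbl(toy_con, "person") |>
  filter(gender_source_value == "F") |>
  select(person_id)

# Print the SQL that dbplyr generated for the pipeline
show_query(toy_query)

# Run the query in the database and return the rows as a tibble
toy_result <- collect(toy_query)

dbDisconnect(toy_con, shutdown = TRUE)
```

Until `collect()` is called, `toy_query` is only a description of the query, which is why it can be inspected and extended cheaply.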

Do R things:

```{r}
cool_query_result$year <- as.numeric(sub("-.*", "", cool_query_result$birth_datetime))
ggplot(cool_query_result) + aes(x = year) + geom_histogram() + theme_bw()
```

Full Guide here: <https://dbplyr.tidyverse.org/articles/dbplyr.html>

- Yes, you can even do joins: <https://dbplyr.tidyverse.org/reference/join.tbl_sql.html>
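
As a rough illustration of a join through dbplyr, the sketch below joins two toy tables and counts rows per person. The table contents and the `join_con` and `procedure_counts` names are hypothetical; only the table and column names echo the ones used in this course.

```r
library(DBI)
library(dplyr)
library(dbplyr)

join_con <- dbConnect(duckdb::duckdb())

# Hypothetical toy tables, loosely shaped like the course tables
dbWriteTable(join_con, "person", data.frame(
  person_id = c(1, 2, 3),
  gender_source_value = c("F", "M", "F")
))
dbWriteTable(join_con, "procedure_occurrence", data.frame(
  procedure_occurrence_id = c(10, 11, 12),
  person_id = c(1, 1, 3)
))

# The join and the count are translated to SQL and run inside DuckDB;
# only the small summary comes back to R on collect()
procedure_counts <- tbl(join_con, "person") |>
  inner_join(tbl(join_con, "procedure_occurrence"), by = "person_id") |>
  count(person_id, name = "n_procedures") |>
  arrange(person_id) |>
  collect()

dbDisconnect(join_con, shutdown = TRUE)
```

Because this is an inner join, a person with no procedures (here `person_id` 2) does not appear in the result; a `left_join()` would keep them.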
38 changes: 18 additions & 20 deletions week4-exercises.qmd
@@ -16,7 +16,7 @@ con <- DBI::dbConnect(duckdb::duckdb(),

## Subquery in `SELECT`

1. Fill in the blank in the subquery below to find each patient's demographic data along with the **total number of procedures** they have had. Note that this query makes use of the `person` table as well as the `procedure_occurrence` table.
1. Fill in the blank in the subquery below to find each patient's demographic data along with the **total number of procedures** they have had. Note that this query makes use of the `person` table as well as the `procedure_occurrence` table.

```{sql connection="con"}
SELECT
@@ -46,7 +46,7 @@ SELECT
procedure_datetime,
(SELECT
DATE_DIFF(
______, ______, DATE '2025-03-07'
'month', ______, DATE '2025-11-07'
)
) AS procedure_time_to_today
FROM
@@ -55,7 +55,7 @@ FROM

## Subquery in `WHERE`

Collect patient demographic data for all patients who have an occurrence of a condition with id = "40481087":
Collect demographic data for all patients who have a condition occurrence with `condition_occurrence_id` = "40481087":

```{sql connection="con"}
SELECT
@@ -78,29 +78,27 @@ WHERE

```

## Creating a view
## Challenge: Creating a view (using `DATE_DIFF` in a subquery)

4. Create a view for senior citizen demographics, where we collect demographics for patients born in or before 1960.
5. Create a view for senior citizen procedures, where we collect procedure occurrences for all patients aged \>= 50 at the time of their procedure.

```{sql}
#| connection: "con"
CREATE VIEW senior_demographics AS
SELECT
person_id,
birth_datetime,
gender_source_value,
race_source_value,
ethnicity_source_value
FROM person
WHERE
_______ >= '1960';
```
Break it down: First, write a query for patients aged \>= 50. You will need the `person` table and the `DATE_DIFF` function applied to the `birth_datetime` column.

## Challenge: Creating a view (using `DATEDIFF` in a subquery)
```{sql connection="con"}
SELECT ______
FROM ______
WHERE DATE_DIFF('year', ______, DATE '2024-03-07') >= ______
```
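
One detail worth keeping in mind when computing ages: in DuckDB, `DATE_DIFF` counts how many part boundaries (here, year boundaries) are crossed between the two dates, while `DATE_SUB` counts complete intervals. The sketch below compares the two on literal dates picked only for illustration; it does not touch the exercise tables, and `demo_con` and `demo` are hypothetical names.

```r
library(DBI)

# In-memory DuckDB connection; no course data is needed for this check
demo_con <- dbConnect(duckdb::duckdb())

# Compare boundary counting (DATE_DIFF) with complete years (DATE_SUB)
# for a start date late in the year. Dates are illustrative only.
demo <- dbGetQuery(demo_con, "
  SELECT
    DATE_DIFF('year', DATE '1974-12-31', DATE '2024-03-07') AS year_boundaries,
    DATE_SUB('year',  DATE '1974-12-31', DATE '2024-03-07') AS full_years
")
print(demo)

dbDisconnect(demo_con, shutdown = TRUE)
```

If the two columns differ for a given person, that person has crossed the year boundary but has not yet had their birthday in the end year, which matters for a strict "aged \>= 50" cutoff.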

5. Create a view for senior citizen procedures, where we collect procedure occurrences for all patients aged \>= 50 at the time of their procedure
Then write the outer query of the view using the `person` table, keeping only the rows whose `person_id` appears in the result of your query above:

```{sql}
#| connection: "con"
CREATE VIEW senior_citizen_procedures AS


```

```{sql connection="con"}
SELECT * FROM senior_citizen_procedures
```