26 changes: 20 additions & 6 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -55,9 +55,21 @@ The store wants to keep customer addresses. Propose two architectures for the CU

**HINT:** search type 1 vs type 2 slowly changing dimensions.

Architecture 1: Overwrite (Type 1 slowly changing dimension)
CUSTOMER_ADDRESS has one row per customer, keyed by a unique Customer_ID.
When a customer changes address, we update the existing row in place with the new address.
The history of the old address is lost.

Columns could include:
Customer_ID (PK/FK), address_1, address_2, city, province_state, postal_code, updated_at (a timestamp recording when the row was last edited)

Architecture 2: Retain history (Type 2 slowly changing dimension)
A CUSTOMER_ADDRESS_HISTORY table holds multiple rows per customer, one per distinct address.
When a customer changes address, we add a new row and close out the old one rather than overwriting it.
Because history is retained, we can report using the address that was valid at the time of each order.

Columns could include:
Customer_Address_ID (PK), Customer_ID (FK), address_1, address_2, city, province_state, postal_code, country, start_date, end_date (NULL for the current address), is_current (true/false, marking whether the address is the customer's current one)

***

@@ -189,7 +201,9 @@ Read: Boykis, V. (2019, October 16). _Neural nets are just people all the way do

Consider, for example, concepts of labour, bias, LLM proliferation, moderating content, the intersection of technology and society, etc.

Boykis (2019) discusses several key ethical issues, including invisible labour and exploitation, bias in data, and the illusion of automation.

At first glance, the term "artificial intelligence" suggests that these systems are purely machine-driven. What this framing leaves uncredited is that, despite being marketed as automated, AI depends on large amounts of underpaid human labour. For example, ImageNet, a large image dataset, was built through the efforts of many low-paid workers on platforms such as Amazon Mechanical Turk. For unreasonably low pay, these workers performed repetitive cognitive tasks, and their contributions went unrecognized even though they laid the foundation for modern AI systems. This raises an ethical question: is it fair to promote AI as automated when it relies on uncredited and underpaid human effort?

Because humans label the data, their biases inevitably become embedded in AI systems, including decisions about which categories are selected at all. Critically, the structure of the datasets themselves reflects subjective cultural and political choices. Bias, then, is not merely accidental; it is built into the datasets, reinforcing discrimination, stereotypes, and social inequities.

Humans also make a multitude of decisions throughout machine learning pipelines: creating, labelling, categorizing, and validating data. All of this contradicts the idea that AI is fully autonomous, which can promote public misunderstanding of AI's capabilities and reinforce undue trust in systems that are error-prone and heavily human-influenced.

Overall, the reading underscores that AI is a social system that relies on human labour, choices, and biases. Understanding this helps everyone see AI as more than just a technical system.

Binary file added 02_activities/assignments/DC_Cohort/ERD1.png
Binary file added 02_activities/assignments/DC_Cohort/ERD2.png
164 changes: 155 additions & 9 deletions 02_activities/assignments/DC_Cohort/assignment2.sql
@@ -23,7 +23,11 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil
All the other rows will remain the same. */
--QUERY 1


SELECT
  COALESCE(product_name, '') || ', ' ||
  COALESCE(product_size, '') || ' (' ||
  COALESCE(product_qty_type, 'unit') || ')' AS product_description
FROM product;


--END QUERY
@@ -41,7 +45,15 @@ HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK().
Filter the visits to dates before April 29, 2022. */
--QUERY 2


SELECT
cp.*,
DENSE_RANK() OVER (
PARTITION BY cp.customer_id
ORDER BY cp.market_date
) AS visit_number
FROM customer_purchases cp
WHERE cp.market_date < '2022-04-29'
ORDER BY cp.customer_id, cp.market_date;


--END QUERY
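The hint names two approaches. DENSE_RANK() is the safer choice here because a customer can make several purchases on the same market_date, and all of them should share one visit number; ROW_NUMBER() would number every row. A minimal sqlite3 sketch of the difference on toy data (table name and values invented for illustration; window functions require SQLite >= 3.25):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE purchases (customer_id INTEGER, market_date TEXT)")
# Two purchases on the same day, then one on a later day.
cur.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, '2022-04-01'), (1, '2022-04-01'), (1, '2022-04-08')],
)

rows = cur.execute("""
    SELECT
        market_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS rn,
        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
    FROM purchases
    ORDER BY market_date
""").fetchall()

for r in rows:
    print(r)
# ROW_NUMBER numbers every row (1, 2, 3);
# DENSE_RANK numbers every distinct visit date (1, 1, 2).
```

With DENSE_RANK, the two same-day purchases both count as visit 1, which matches "visit number" rather than "purchase number".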
@@ -53,6 +65,19 @@ only the customer’s most recent visit.
HINT: Do not use the previous visit dates filter. */
--QUERY 3

WITH ranked AS (
SELECT
cp.*,
DENSE_RANK() OVER (
PARTITION BY cp.customer_id
ORDER BY cp.market_date DESC
) AS recent_visit_number
FROM customer_purchases cp
)
SELECT *
FROM ranked
WHERE recent_visit_number = 1
ORDER BY customer_id, market_date DESC;



@@ -66,6 +91,15 @@ You can make this a running count by including an ORDER BY within the PARTITION
Filter the visits to dates before April 29, 2022. */
--QUERY 4

SELECT
cp.*,
COUNT(*) OVER (
PARTITION BY cp.customer_id, cp.product_id
ORDER BY cp.market_date DESC
) AS times_customer_bought_product
FROM customer_purchases cp
WHERE cp.market_date < '2022-04-29'
ORDER BY cp.customer_id, cp.product_id, cp.market_date;



@@ -85,7 +119,14 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
--QUERY 5


SELECT
p.product_name,
CASE
WHEN INSTR(p.product_name, '-') > 0 THEN
TRIM(SUBSTR(p.product_name, INSTR(p.product_name, '-') + 1))
ELSE NULL
END AS description
FROM product p;


--END QUERY
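The INSTR/SUBSTR/TRIM combination used above can be checked on a literal. A quick sqlite3 sketch (the product name is a toy value, not a row from the real product table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SUBSTR starts one character past the hyphen; TRIM strips the leading space.
desc = conn.execute(
    "SELECT TRIM(SUBSTR(:n, INSTR(:n, '-') + 1))",
    {"n": "Apple Pie - Slice"},
).fetchone()[0]
print(desc)  # Slice

# When there is no hyphen, INSTR returns 0, which is why the query
# above guards with CASE WHEN INSTR(...) > 0 before extracting.
pos = conn.execute("SELECT INSTR(:n, '-')", {"n": "Apple Pie"}).fetchone()[0]
print(pos)  # 0
```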
@@ -95,8 +136,11 @@ Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR w
--QUERY 6




SELECT
p.*
FROM product p
WHERE p.product_size GLOB '*[0-9]*';

--END QUERY


@@ -111,7 +155,35 @@ with a UNION binding them.
with a UNION binding them. */
--QUERY 7


WITH sales_by_day AS (
SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS total_sales
FROM customer_purchases
GROUP BY market_date
),
ranked AS (
SELECT
market_date,
total_sales,
DENSE_RANK() OVER (ORDER BY total_sales DESC) AS best_rank,
DENSE_RANK() OVER (ORDER BY total_sales ASC) AS worst_rank
FROM sales_by_day
)
SELECT
market_date,
total_sales,
'highest' AS day_type
FROM ranked
WHERE best_rank = 1
UNION
SELECT
market_date,
total_sales,
'lowest' AS day_type
FROM ranked
WHERE worst_rank = 1
ORDER BY day_type, market_date;


--END QUERY
@@ -132,7 +204,33 @@ How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */
--QUERY 8


WITH vendor_products AS (
SELECT DISTINCT
vi.vendor_id,
v.vendor_name,
vi.product_id,
p.product_name,
vi.original_price AS unit_price
FROM vendor_inventory vi
JOIN vendor v ON v.vendor_id = vi.vendor_id
JOIN product p ON p.product_id = vi.product_id
),

customer_count AS (
SELECT COUNT(*) AS num_customers
FROM customer
)

SELECT
vp.vendor_name,
vp.product_name,
5 AS units_per_customer,
cc.num_customers,
vp.unit_price,
(5 * cc.num_customers * vp.unit_price) AS total_revenue
FROM vendor_products vp
CROSS JOIN customer_count cc
ORDER BY vp.vendor_name, vp.product_name;


--END QUERY
@@ -145,6 +243,16 @@ It should use all of the columns from the product table, as well as a new column
Name the timestamp column `snapshot_timestamp`. */
--QUERY 9

CREATE TABLE product_units AS
SELECT
p.product_id,
p.product_name,
p.product_size,
p.product_category_id,
p.product_qty_type,
CURRENT_TIMESTAMP AS snapshot_timestamp
FROM product p
WHERE p.product_qty_type = 'unit';



@@ -155,7 +263,24 @@ Name the timestamp column `snapshot_timestamp`. */
This can be any product you desire (e.g. add another record for Apple Pie). */
--QUERY 10


INSERT INTO product_units (
product_id,
product_name,
product_size,
product_category_id,
product_qty_type,
snapshot_timestamp
)

SELECT
p.product_id,
p.product_name,
p.product_size,
p.product_category_id,
p.product_qty_type,
CURRENT_TIMESTAMP
FROM product p
WHERE p.product_id = 7;


--END QUERY
@@ -167,6 +292,13 @@ This can be any product you desire (e.g. add another record for Apple Pie). */
HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
--QUERY 11

DELETE FROM product_units
WHERE product_id = 7
AND snapshot_timestamp < (
SELECT MAX(snapshot_timestamp)
FROM product_units
WHERE product_id = 7
);



@@ -191,7 +323,21 @@ Finally, make sure you have a WHERE statement to update the right row,
When you have all of these components, you can run the update statement. */
--QUERY 12


ALTER TABLE product_units
ADD current_quantity INT;

UPDATE product_units
SET current_quantity = COALESCE (
(
SELECT CAST(vi.quantity AS INT)
FROM vendor_inventory vi
WHERE vi.product_id = product_units.product_id
ORDER BY vi.market_date DESC
LIMIT 1
),
0
);



--END QUERY
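The correlated-subquery UPDATE above (take the most recent vendor_inventory quantity per product, defaulting to 0 when a product never appears) can be sketched on toy tables. Table and column names here are simplified stand-ins for the assignment schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product_units (product_id INTEGER, current_quantity INT)")
cur.execute("CREATE TABLE vendor_inventory (product_id INTEGER, market_date TEXT, quantity REAL)")
cur.executemany("INSERT INTO product_units VALUES (?, NULL)", [(1,), (2,)])
# Product 1 has two inventory snapshots; product 2 has none.
cur.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?)",
    [(1, '2022-04-01', 5.0), (1, '2022-04-08', 3.0)],
)

# For each product_units row: latest quantity by market_date, else 0.
cur.execute("""
    UPDATE product_units
    SET current_quantity = COALESCE(
        (SELECT CAST(vi.quantity AS INT)
         FROM vendor_inventory vi
         WHERE vi.product_id = product_units.product_id
         ORDER BY vi.market_date DESC
         LIMIT 1),
        0)
""")

result = cur.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(result)  # [(1, 3), (2, 0)]
```

Product 1 picks up the 2022-04-08 snapshot (quantity 3, not the older 5), and product 2, absent from vendor_inventory, falls through COALESCE to 0.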