This repository contains common SQL features used in data analysis tasks. The queries are written in PostgreSQL.
We will name the schema as Sales in the following examples.
CREATE SCHEMA IF NOT EXISTS Sales;A schema is a way to organize database objects (tables, views, functions, etc.) into separate namespaces within a database.
There will be two tables called customers and orders in the Sales schema.
customers table:
CREATE TABLE IF NOT EXISTS sales.customers (
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100)
);orders table:
CREATE TABLE IF NOT EXISTS sales.orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES sales.customers(customer_id),
order_date DATE,
total_amount NUMERIC(10, 2)
);Inserting data into customers table:
INSERT INTO
Sales.customers (first_name, last_name, email)
VALUES
('John', 'Smith', 'john.smith@gmail.com'),
('Jane', 'Doe', 'jane.doe@gmail.com'),
...Inserting data into orders table:
INSERT INTO
Sales.orders (customer_id, order_date, total_amount)
VALUES
(1, '2020-01-01', 100.00),
(1, '2020-01-02', 200.00),
...1. Query Foundations
- The SELECT & FROM commands
- Selection using WHERE / IN / AND / BETWEEN
- ORDER BY
- TOP / LIMIT
- Aggregation with GROUP BY + SUM / COUNT
- HAVING
2. Merging & Joining
- Merging data using LEFT & INNER
- Stacking data using UNION and UNION ALL
- Executing multiple queries using TEMP TABLES and CTE
3. Manipulation
- WINDOW functions + OVER / PARTITION BY
- Adding LEAD / LAG
- Rounding numerical data
- Selecting random samples of data
- Manipulating dates
- Manipulating text
This list is taken from Andre Jones LinkedIn profile.
The following diagram shows the different SQL JOINs:

requirements.yml file contains the necessary packages to run the Jupyter Notebook. To install the packages, run the following command:
conda env create -f environment.yml