Commit e160e20

committed
add week 2 first draft
1 parent 623991d commit e160e20

2 files changed

+19
-230
lines changed

README.md

Lines changed: 15 additions & 70 deletions
@@ -4,19 +4,14 @@
44

55
## 🎓 What is this course about?
66

7-
Applied data science is already shaping how the world tackles pressing social challenges — from fighting discrimination, to allocating resources for refugees, to managing water scarcity in Africa. This course invites students from a wide range of backgrounds — social sciences, policy, business, environmental studies, or anyone simply curious about data — to learn how to use data science tools in ways that matter.
7+
Applied data science is already shaping how the world tackles pressing social challenges — from fighting discrimination, to allocating resources for refugees, to managing water scarcity. This course invites students from a wide range of backgrounds — social sciences, policy, business, environmental studies, or anyone simply curious about data — to learn how to use data science tools in ways that matter.
88

9-
Over ten weeks, you'll get hands-on experience with Python in Google Colab (no installation required), organize your work on GitHub, create interactive visualizations with Plotly, and even try your hand at building a simple application in Streamlit. You'll also see how modern AI coding assistants can serve as partners in debugging, exploration, and idea generation.
9+
Over fifteen weeks, you'll get hands-on experience with Python in Google Colab (no installation required), organize your work on GitHub, create interactive visualizations with Plotly, and even try your hand at building a simple application in Streamlit. You'll also see how modern AI coding assistants can serve as partners in debugging, exploration, and idea generation.
1010

1111
Materials for each week can be found in the `./week*` folders. See README.md for further details and instructions.
1212

1313
For technical issues, bugs in course materials, or ideas and contributions, open an issue.
1414

15-
## 📌 Quick start
16-
17-
Get hands-on with Python in Google Colab (no installation required), organize your work on GitHub, create interactive visualizations with Plotly, and build simple applications in Streamlit. All tools used are free and web-based, with alternatives provided when needed — so you can fully participate from anywhere in the world.
18-
19-
This 10-week course combines lecture and tutorial time each session (~40 min lecture + ~40 min hands-on tutorial), concluding with an individual project where you choose a social issue that matters to you and use data to explore, explain, or propose solutions.
2015

2116
## 🎯 Learning Outcomes
2217

@@ -30,73 +25,21 @@ By the end of the course, you will be able to:
3025

3126
## Course Curriculum
3227

33-
**Format:** 1 class per week (80 min)
34-
Each of the first 8 sessions: ~40 min lecture + ~40 min tutorial (hands-on in Colab)
35-
Week 9: Open office hours for project help
36-
Week 10: Capstone showcase
37-
38-
### [Week 1️⃣ Welcome & First Steps in Python for Data](./week1)
39-
40-
**Lecture:** Why data science for social impact? Intro to Python in Google Colab. Basic operations, variables, data types.
41-
**Tutorial:** Run your first Colab notebook, write a few lines of Python, practice with lists/dictionaries.
42-
**Mini-deliverable:** Colab notebook uploaded to GitHub repo with a short "hello world" exercise.
43-
44-
### [Week 2️⃣ Working with DataFrames (Pandas Basics)](./week2)
45-
46-
**Lecture:** Intro to structured data. Loading CSVs, exploring datasets. Rows, columns, indexing.
47-
**Tutorial:** Clean a small dataset (drop NAs, rename columns, simple transforms).
48-
**Mini-deliverable:** A notebook summarizing key stats of a real dataset (mean, counts, etc.).
49-
50-
### [Week 3️⃣ Data Visualization with Plotly](./week3)
51-
52-
**Lecture:** Why visualization matters for storytelling. Plotly basics (scatter, bar, line, box).
53-
**Tutorial:** Make 2–3 interactive charts to answer simple questions about a dataset.
54-
**Mini-deliverable:** At least one polished Plotly chart with titles and labels in your repo.
55-
56-
### [Week 4️⃣ Organizing Projects with GitHub + First Encounter with AI Helpers](./week4)
57-
58-
**Lecture:** GitHub basics: repos, commits, README files. Intro to coding assistants (ChatGPT, Copilot) as debugging and code-explanation partners.
59-
**Tutorial:** Create a repo for your project. Commit a cleaned dataset + notebook. Use an LLM to explain a code snippet in plain English.
60-
**Mini-deliverable:** A GitHub repo with first project files and README draft.
61-
62-
### [Week 5️⃣ Working with External Data & APIs](./week5)
63-
64-
**Lecture:** Why APIs matter (fresh data, live data sources). Intro to JSON and making API calls in Python.
65-
**Tutorial:** Connect to a simple public API (e.g., climate, UN, World Bank), load data into Pandas.
66-
**Mini-deliverable:** Notebook that fetches API data and merges it with existing dataset.
28+
**Format:** 1 class per week (180 min)
6729

68-
### [Week 6️⃣ Intro to Predictive Analytics (without heavy math)](./week6)
30+
Each of the first 13 sessions: ~90 min lecture + ~90 min tutorial. Homework is optional except for a mid-term assignment and a capstone project.
6931

70-
**Lecture:** From describing to predicting. Train/test split, decision trees, linear models. Focus on interpretation, not formulas.
71-
**Tutorial:** Build a simple predictive model (e.g., predict education outcome from income). Use sklearn.
72-
**Mini-deliverable:** Notebook with model results + short written interpretation ("what does this mean?").
32+
Week 14: Open office hours for project help
7333

74-
### [Week 7️⃣ Automating Workflows with Functions + Using AI to Scale Code](./week7)
34+
Week 15: Capstone showcase
7535

76-
**Lecture:** Writing reusable functions. Why automation matters. Prompting AI to generate and refactor code.
77-
**Tutorial:** Turn analysis into functions (e.g., def clean_data() or def predict_outcome()). Use an LLM to help restructure the code.
78-
**Mini-deliverable:** Script with at least one reusable function + AI-assisted refactor.
79-
80-
### [Week 8️⃣ From Notebooks to Apps (Streamlit Basics)](./week8)
81-
82-
**Lecture:** Why apps matter for social impact. Streamlit basics.
83-
**Tutorial:** Create a simple Streamlit app that uploads a CSV and shows a Plotly chart. Deploy locally.
84-
**Mini-deliverable:** Local Streamlit app with at least one visualization.
85-
86-
### [Week 9️⃣ Project Office Hours](./week9)
87-
88-
**Format:** Drop-in, Q&A, debugging help, design advice. Students work on their capstone projects.
89-
90-
### [Week 🔟 Final Presentations — Capstone Showcase](./week10)
91-
92-
**Deliverable:** Deployed Streamlit app (on Streamlit Cloud) + GitHub repo with README + 3-minute presentation.
93-
**Celebration:** Students present apps that inform, guide, and optimize around their chosen social issue.
9436

9537
## 🏗️ Capstone Project: Data for Social Impact
9638

9739
Throughout the course, you'll work toward a final individual project where you choose a social issue that matters to you and use data to explore, explain, or propose solutions.
9840

9941
**Your capstone will include:**
42+
10043
**Data collection** from real-world sources relevant to your chosen issue
10144
**Interactive visualizations** that tell a compelling story
10245
**Predictive analysis** with clear, accessible interpretations
@@ -110,22 +53,23 @@ Throughout the course, you'll work toward a final individual project where you c
11053
- Economic inequality and policy impact
11154
- Social justice and discrimination patterns
11255
- Global development and humanitarian aid
56+
- Freedom of speech and censorship
11357

11458
This hands-on approach ensures you gain practical experience while making a meaningful contribution to causes you care about.
11559

11660
## 💬 Join the community
117-
118-
Connect with experts, engage in discussions, ask questions, and share insights, experiences, and feedback with fellow learners. Stay in the loop — live session announcements will be posted in our community channels!
119-
120-
For updates, you can also subscribe to our newsletter.
61+
This course can be completed in self-paced mode, but it currently runs as a one-semester course through the Smolny Beyond Borders initiative at Bard College.
12162

12263
## 👨‍🏫 Meet our team
12364

12465
This course is created by a team of data science practitioners and researchers dedicated to making data science education accessible and practical:
12566

12667
<!-- Add your team members here -->
127-
- **Course Lead:** [Your Name]
128-
- **Contributors:** [Team Members]
68+
- **Course Lead:** [Anastasiia Kulakova](https://www.linkedin.com/in/anastasiia-kulakova-30704a174/)
69+
70+
- **Educational Institution Partner:** [Russian Independent Media Archive]()
71+
72+
- **Non-profit Partner:** [Russian Independent Media Archive]()
12973

13074
## 🛠️ Prerequisites & Tools
13175

@@ -136,6 +80,7 @@ This course is created by a team of data science practitioners and researchers d
13680

13781
### Tools Used (All Free & Web-Based)
13882
- **Google Colab**: Python programming environment (no installation needed)
83+
- **Databricks**: Data environment (no installation needed)
13984
- **GitHub**: Project organization and version control
14085
- **Plotly**: Interactive data visualization
14186
- **Streamlit**: Building and deploying simple web applications

week2/README.md

Lines changed: 4 additions & 160 deletions
Original file line numberDiff line numberDiff line change
@@ -15,26 +15,11 @@ By the end of this week, you will be able to:
1515
- Handle missing values and rename columns appropriately
1616
- Filter and sort data to answer specific questions
1717

18-
## 🎓 Session Structure (80 minutes)
18+
## 🎓 Session Resources
1919

20-
### Lecture (40 minutes): Introduction to Structured Data
21-
22-
**Topics Covered:**
23-
- What are DataFrames and why they matter for social impact analysis
24-
- Loading data from CSV files using `pd.read_csv()`
25-
- Exploring dataset structure: `.shape`, `.info()`, `.head()`, `.tail()`
26-
- Understanding different data types: numbers, text, dates
27-
- Basic indexing: selecting rows and columns
28-
29-
### Tutorial (40 minutes): Cleaning a Real Social Dataset
30-
31-
**Hands-on Activities:**
32-
- Load a dataset about global education or health indicators
33-
- Explore the data structure and identify issues
34-
- Clean the dataset: drop missing values, rename columns
35-
- Calculate basic statistics (mean, median, counts)
36-
- Create simple data transformations
20+
- Lecture: [Working with DataFrames I: Pandas Foundations]()
3721

22+
- Tutorial: [Working with DataFrames I: Pandas Foundations](../week2/notebooks/tutorial_pandas_basics.ipynb)
3823
## 🏗️ Mini-Deliverable
3924

4025
**Assignment:** Create a notebook that summarizes key statistics of a real dataset related to social impact.
@@ -47,116 +32,6 @@ By the end of this week, you will be able to:
4732
5. **Answer 3 specific questions** about the data using filtering/grouping
4833
6. **Document your findings** with clear explanations
4934

50-
### Example Analysis Structure:
51-
```python
52-
import pandas as pd
53-
54-
# Load dataset
55-
df = pd.read_csv('global_education_data.csv')
56-
57-
# Explore structure
58-
print(f"Dataset shape: {df.shape}")
59-
print(f"Columns: {df.columns.tolist()}")
60-
print(df.info())
61-
62-
# Clean data
63-
df = df.dropna(subset=['literacy_rate']) # Remove missing literacy rates
64-
df = df.rename(columns={'GDP_per_cap': 'gdp_per_capita'}) # Cleaner names
65-
66-
# Summary statistics
67-
print(f"Average literacy rate: {df['literacy_rate'].mean():.1f}%")
68-
print(f"Countries with data: {df['country'].nunique()}")
69-
70-
# Answer specific questions
71-
high_literacy = df[df['literacy_rate'] > 95]
72-
print(f"Countries with >95% literacy: {len(high_literacy)}")
73-
```
74-
75-
## 📁 Files Structure
76-
77-
```
78-
week2/
79-
├── README.md
80-
├── notebooks/
81-
│ ├── lecture_pandas_intro.ipynb
82-
│ ├── tutorial_data_cleaning.ipynb
83-
│ └── assignment_dataset_summary.ipynb
84-
├── data/
85-
│ ├── global_education_indicators.csv
86-
│ ├── world_health_data.csv
87-
│ └── sustainable_development_goals.csv
88-
├── examples/
89-
│ └── sample_analysis.ipynb
90-
└── resources/
91-
├── pandas_cheatsheet.md
92-
└── common_data_issues.md
93-
```
94-
95-
## 📖 Key Concepts Introduced
96-
97-
### Pandas Fundamentals
98-
- **DataFrame**: 2D labeled data structure (like a spreadsheet)
99-
- **Series**: 1D labeled array (single column)
100-
- **Index**: Row labels for data identification
101-
- **Columns**: Variable names in your dataset
102-
103-
### Essential Pandas Methods
104-
```python
105-
# Loading data
106-
df = pd.read_csv('filename.csv')
107-
108-
# Exploring structure
109-
df.shape # (rows, columns)
110-
df.info() # Data types and missing values
111-
df.head() # First 5 rows
112-
df.describe() # Summary statistics
113-
114-
# Data cleaning
115-
df.dropna() # Remove missing values
116-
df.fillna(0) # Fill missing values
117-
df.rename(columns={'old': 'new'}) # Rename columns
118-
119-
# Basic analysis
120-
df['column'].mean() # Average
121-
df['column'].nunique() # Count unique values
122-
df.groupby('category').mean() # Group analysis
123-
```
124-
125-
### Data Cleaning Best Practices
126-
- Always explore your data first
127-
- Document what cleaning steps you take
128-
- Keep track of how many rows you remove
129-
- Use meaningful column names
130-
- Check for duplicates and outliers
131-
132-
## 🎥 Video Resources
133-
134-
1. **Why DataFrames Matter for Social Impact** (10 min)
135-
2. **Loading and Exploring Data with Pandas** (15 min)
136-
3. **Data Cleaning Essentials** (20 min)
137-
4. **Calculating Summary Statistics** (10 min)
138-
139-
## Sample Datasets
140-
141-
This week you'll work with real datasets related to:
142-
143-
### Option 1: Global Education Data
144-
- Literacy rates by country and year
145-
- School enrollment statistics
146-
- Educational spending per capita
147-
- Gender gaps in education
148-
149-
### Option 2: World Health Indicators
150-
- Life expectancy by country
151-
- Infant mortality rates
152-
- Healthcare spending
153-
- Disease prevalence data
154-
155-
### Option 3: Sustainable Development Goals
156-
- Progress towards UN SDG targets
157-
- Poverty reduction indicators
158-
- Environmental sustainability metrics
159-
- Gender equality measures
16035

16136
## 🔗 Additional Resources
16237

@@ -171,38 +46,7 @@ This week you'll work with real datasets related to:
17146
- [Our World in Data](https://ourworldindata.org/)
17247
- [Gapminder](https://www.gapminder.org/data/)
17348

174-
## ❓ Common Questions
175-
176-
**Q: The dataset has thousands of rows. Do I need to analyze all of them?**
177-
A: No! Start with a subset using `.head(100)` or filter to specific countries/years you're interested in.
178-
179-
**Q: What if there are many missing values?**
180-
A: First understand why data is missing. Sometimes missing data tells a story (e.g., countries not reporting certain indicators).
181-
182-
**Q: How do I know if my data cleaning is correct?**
183-
A: Always check before and after: print the shape, look at samples, and verify your cleaning makes sense.
184-
185-
**Q: Can I use my own dataset?**
186-
A: Yes! As long as it's related to social impact and has at least 100 rows with multiple columns.
187-
188-
## 🆘 Getting Help
189-
190-
- **Discussion Forum**: Ask about specific Pandas errors
191-
- **Office Hours**: Tuesdays 6-7 PM
192-
- **AI Assistants**: Great for explaining Pandas syntax
193-
- **Peer Support**: Share datasets and cleaning strategies
194-
195-
## 📈 Assessment Criteria
196-
197-
Your dataset summary notebook will be evaluated on:
198-
- **Data Loading**: Successfully imports and explores dataset
199-
- **Cleaning Process**: Appropriate handling of missing data and column names
200-
- **Summary Statistics**: Meaningful calculations with clear interpretation
201-
- **Analysis Quality**: Answers to questions show understanding of the data
202-
- **Documentation**: Clear explanations of what you found
203-
20449
---
205-
206-
**Next Week**: [Week 3: Data Visualization with Plotly](../week3/README.md)
50+
**Next Week**: [Week 3: Working with DataFrames II: Cleaning & Transforming Data](../week3/README.md)
20751

20852
**Previous Week**: [Week 1: Welcome & First Steps in Python for Data](../week1/README.md)
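
The Week 2 mini-deliverable asks for summary statistics plus answers obtained through filtering and grouping. A minimal sketch of that workflow follows. It assumes a small made-up DataFrame: the column names (`Country`, `region`, `literacy_rate`) and all values are hypothetical and are not taken from any course dataset.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "region": ["North", "North", "South", "South"],
    "literacy_rate": [99.0, 91.0, None, 72.0],
})

# Handle missing values and rename columns, tracking how many rows are dropped.
rows_before = len(df)
clean = df.dropna(subset=["literacy_rate"]).rename(columns={"Country": "country"})
rows_dropped = rows_before - len(clean)

# Summary statistic on the cleaned data.
mean_rate = clean["literacy_rate"].mean()

# Filter and sort: countries above 90%, highest first.
high = clean[clean["literacy_rate"] > 90].sort_values("literacy_rate", ascending=False)

# Group: average literacy rate per region.
by_region = clean.groupby("region")["literacy_rate"].mean()

print(f"Rows dropped: {rows_dropped}")
print(f"Average literacy rate: {mean_rate:.1f}%")
print(high[["country", "literacy_rate"]])
print(by_region)
```

Selecting the `literacy_rate` column before calling `.mean()` on the grouped data keeps the aggregation restricted to a numeric column, which avoids errors when the DataFrame also holds text columns.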
