add week 2 first draft

nastyakul · nastyakul · commit e160e2014300 · 2026-01-29T19:09:10.000+01:00
diff --git a/README.md b/README.md
@@ -4,19 +4,14 @@
 
 ## 🎓 What is this course about?
 
-Applied data science is already shaping how the world tackles pressing social challenges — from fighting discrimination, to allocating resources for refugees, to managing water scarcity in Africa. This course invites students from a wide range of backgrounds — social sciences, policy, business, environmental studies, or anyone simply curious about data — to learn how to use data science tools in ways that matter.
+Applied data science is already shaping how the world tackles pressing social challenges — from fighting discrimination, to allocating resources for refugees, to managing water scarcity. This course invites students from a wide range of backgrounds — social sciences, policy, business, environmental studies, or anyone simply curious about data — to learn how to use data science tools in ways that matter.
 
-Over ten weeks, you'll get hands-on experience with Python in Google Colab (no installation required), organize your work on GitHub, create interactive visualizations with Plotly, and even try your hand at building a simple application in Streamlit. You'll also see how modern AI coding assistants can serve as partners in debugging, exploration, and idea generation.
+Over fifteen weeks, you'll get hands-on experience with Python in Google Colab (no installation required), organize your work on GitHub, create interactive visualizations with Plotly, and even try your hand at building a simple application in Streamlit. You'll also see how modern AI coding assistants can serve as partners in debugging, exploration, and idea generation.
 
 Materials for each week can be found in the `./week*` folders. See README.md for further details and instructions.
 
 Any technical issues, ideas, bugs in course materials, contribution ideas - add an issue.
 
-## 📌 Quick start
-
-Get hands-on with Python in Google Colab (no installation required), organize your work on GitHub, create interactive visualizations with Plotly, and build simple applications in Streamlit. All tools used are free and web-based, with alternatives provided when needed — so you can fully participate from anywhere in the world.
-
-This 10-week course combines lecture and tutorial time each session (~40 min lecture + ~40 min hands-on tutorial), concluding with an individual project where you choose a social issue that matters to you and use data to explore, explain, or propose solutions.
 
 ## 🎯 Learning Outcomes
 
@@ -30,73 +25,21 @@ By the end of the course, you will be able to:
 
 ## Course Curriculum
 
-**Format:** 1 class per week (80 min)  
-Each of the first 8 sessions: ~40 min lecture + ~40 min tutorial (hands-on in Colab)  
-Week 9: Open office hours for project help  
-Week 10: Capstone showcase
-
-### [Week 1️⃣ Welcome & First Steps in Python for Data](./week1)
-
-**Lecture:** Why data science for social impact? Intro to Python in Google Colab. Basic operations, variables, data types.  
-**Tutorial:** Run your first Colab notebook, write a few lines of Python, practice with lists/dictionaries.  
-**Mini-deliverable:** Colab notebook uploaded to GitHub repo with a short "hello world" exercise.
-
-### [Week 2️⃣ Working with DataFrames (Pandas Basics)](./week2)
-
-**Lecture:** Intro to structured data. Loading CSVs, exploring datasets. Rows, columns, indexing.  
-**Tutorial:** Clean a small dataset (drop NAs, rename columns, simple transforms).  
-**Mini-deliverable:** A notebook summarizing key stats of a real dataset (mean, counts, etc.).
-
-### [Week 3️⃣ Data Visualization with Plotly](./week3)
-
-**Lecture:** Why visualization matters for storytelling. Plotly basics (scatter, bar, line, box).  
-**Tutorial:** Make 2–3 interactive charts to answer simple questions about a dataset.  
-**Mini-deliverable:** At least one polished Plotly chart with titles and labels in your repo.
-
-### [Week 4️⃣ Organizing Projects with GitHub + First Encounter with AI Helpers](./week4)
-
-**Lecture:** GitHub basics: repos, commits, README files. Intro to coding assistants (ChatGPT, Copilot) as debugging and code-explanation partners.  
-**Tutorial:** Create a repo for your project. Commit a cleaned dataset + notebook. Use an LLM to explain a code snippet in plain English.  
-**Mini-deliverable:** A GitHub repo with first project files and README draft.
-
-### [Week 5️⃣ Working with External Data & APIs](./week5)
-
-**Lecture:** Why APIs matter (fresh data, live data sources). Intro to JSON and making API calls in Python.  
-**Tutorial:** Connect to a simple public API (e.g., climate, UN, World Bank), load data into Pandas.  
-**Mini-deliverable:** Notebook that fetches API data and merges it with existing dataset.
+**Format:** 1 class per week (180 min) 
 
-### [Week 6️⃣ Intro to Predictive Analytics (without heavy math)](./week6)
+Each of the first 13 sessions: ~90 min lecture + ~90 min tutorial. Homework is optional except for a mid-term assignment and a capstone project.
 
-**Lecture:** From describing to predicting. Train/test split, decision trees, linear models. Focus on interpretation, not formulas.  
-**Tutorial:** Build a simple predictive model (e.g., predict education outcome from income). Use sklearn.  
-**Mini-deliverable:** Notebook with model results + short written interpretation ("what does this mean?").
+Week 14: Open office hours for project help 
 
-### [Week 7️⃣ Automating Workflows with Functions + Using AI to Scale Code](./week7)
+Week 15: Capstone showcase
 
-**Lecture:** Writing reusable functions. Why automation matters. Prompting AI to generate and refactor code.  
-**Tutorial:** Turn analysis into functions (e.g., def clean_data() or def predict_outcome()). Use an LLM to help restructure the code.  
-**Mini-deliverable:** Script with at least one reusable function + AI-assisted refactor.
-
-### [Week 8️⃣ From Notebooks to Apps (Streamlit Basics)](./week8)
-
-**Lecture:** Why apps matter for social impact. Streamlit basics.  
-**Tutorial:** Create a simple Streamlit app that uploads a CSV and shows a Plotly chart. Deploy locally.  
-**Mini-deliverable:** Local Streamlit app with at least one visualization.
-
-### [Week 9️⃣ Project Office Hours](./week9)
-
-**Format:** Drop-in, Q&A, debugging help, design advice. Students work on their capstone projects.
-
-### [Week 🔟 Final Presentations — Capstone Showcase](./week10)
-
-**Deliverable:** Deployed Streamlit app (on Streamlit Cloud) + GitHub repo with README + 3-minute presentation.  
-**Celebration:** Students present apps that inform, guide, and optimize around their chosen social issue.
 
 ## 🏗️ Capstone Project: Data for Social Impact
 
 Throughout the course, you'll work toward a final individual project where you choose a social issue that matters to you and use data to explore, explain, or propose solutions.
 
 **Your capstone will include:**
+
 • **Data collection** from real-world sources relevant to your chosen issue  
 • **Interactive visualizations** that tell a compelling story  
 • **Predictive analysis** with clear, accessible interpretations  
@@ -110,22 +53,23 @@ Throughout the course, you'll work toward a final individual project where you c
 - Economic inequality and policy impact
 - Social justice and discrimination patterns
 - Global development and humanitarian aid
+- Freedom of speech and censorship
 
 This hands-on approach ensures you gain practical experience while making a meaningful contribution to causes you care about.
 
 ## 💬 Join the community
-
-Connect with experts, engage in discussions, ask questions, and share insights, experiences, and feedback with fellow learners. Stay in the loop — live session announcements will be posted in our community channels!
-
-For updates, you can also subscribe to our newsletter.
+This course can be completed in self-paced mode, but it currently runs as a one-semester course through the Smolny Beyond Borders initiative at Bard College.
 
 ## 👨‍🏫 Meet our team
 
 This course is created by a team of data science practitioners and researchers dedicated to making data science education accessible and practical:
 
 <!-- Add your team members here -->
-- **Course Lead:** [Your Name]
-- **Contributors:** [Team Members]
+- **Course Lead:** [Anastasiia Kulakova](https://www.linkedin.com/in/anastasiia-kulakova-30704a174/)
+
+- **Educational Institution Partner:** [Russian Independent Media Archive]()
+
+- **Non-profit Partner:** [Russian Independent Media Archive]()
 
 ## 🛠️ Prerequisites & Tools
 
@@ -136,6 +80,7 @@ This course is created by a team of data science practitioners and researchers d
 
 ### Tools Used (All Free & Web-Based)
 - **Google Colab**: Python programming environment (no installation needed)
+- **Databricks**: Data environment (no installation needed)
 - **GitHub**: Project organization and version control
 - **Plotly**: Interactive data visualization
 - **Streamlit**: Building and deploying simple web applications
diff --git a/week2/README.md b/week2/README.md
@@ -15,26 +15,11 @@ By the end of this week, you will be able to:
 - Handle missing values and rename columns appropriately
 - Filter and sort data to answer specific questions
 
-## 🎓 Session Structure (80 minutes)
+## 🎓 Session Resources 
 
-### Lecture (40 minutes): Introduction to Structured Data
-
-**Topics Covered:**
-- What are DataFrames and why they matter for social impact analysis
-- Loading data from CSV files using `pd.read_csv()`
-- Exploring dataset structure: `.shape`, `.info()`, `.head()`, `.tail()`
-- Understanding different data types: numbers, text, dates
-- Basic indexing: selecting rows and columns
-
-### Tutorial (40 minutes): Cleaning a Real Social Dataset
-
-**Hands-on Activities:**
-- Load a dataset about global education or health indicators
-- Explore the data structure and identify issues
-- Clean the dataset: drop missing values, rename columns
-- Calculate basic statistics (mean, median, counts)
-- Create simple data transformations
+- Lecture: [Working with DataFrames I: Pandas Foundations]()
 
+- Tutorial: [Working with DataFrames I: Pandas Foundations](../week2/notebooks/tutorial_pandas_basics.ipynb)
 ## 🏗️ Mini-Deliverable
 
 **Assignment:** Create a notebook that summarizes key statistics of a real dataset related to social impact.
@@ -47,116 +32,6 @@ By the end of this week, you will be able to:
 5. **Answer 3 specific questions** about the data using filtering/grouping
 6. **Document your findings** with clear explanations
 
-### Example Analysis Structure:
-```python
-import pandas as pd
-
-# Load dataset
-df = pd.read_csv('global_education_data.csv')
-
-# Explore structure
-print(f"Dataset shape: {df.shape}")
-print(f"Columns: {df.columns.tolist()}")
-print(df.info())
-
-# Clean data
-df = df.dropna(subset=['literacy_rate'])  # Remove missing literacy rates
-df = df.rename(columns={'GDP_per_cap': 'gdp_per_capita'})  # Cleaner names
-
-# Summary statistics
-print(f"Average literacy rate: {df['literacy_rate'].mean():.1f}%")
-print(f"Countries with data: {df['country'].nunique()}")
-
-# Answer specific questions
-high_literacy = df[df['literacy_rate'] > 95]
-print(f"Countries with >95% literacy: {len(high_literacy)}")
-```
-
-## 📁 Files Structure
-
-```
-week2/
-├── README.md
-├── notebooks/
-│   ├── lecture_pandas_intro.ipynb
-│   ├── tutorial_data_cleaning.ipynb
-│   └── assignment_dataset_summary.ipynb
-├── data/
-│   ├── global_education_indicators.csv
-│   ├── world_health_data.csv
-│   └── sustainable_development_goals.csv
-├── examples/
-│   └── sample_analysis.ipynb
-└── resources/
-    ├── pandas_cheatsheet.md
-    └── common_data_issues.md
-```
-
-## 📖 Key Concepts Introduced
-
-### Pandas Fundamentals
-- **DataFrame**: 2D labeled data structure (like a spreadsheet)
-- **Series**: 1D labeled array (single column)
-- **Index**: Row labels for data identification
-- **Columns**: Variable names in your dataset
-
-### Essential Pandas Methods
-```python
-# Loading data
-df = pd.read_csv('filename.csv')
-
-# Exploring structure
-df.shape          # (rows, columns)
-df.info()         # Data types and missing values
-df.head()         # First 5 rows
-df.describe()     # Summary statistics
-
-# Data cleaning
-df.dropna()       # Remove missing values
-df.fillna(0)      # Fill missing values
-df.rename(columns={'old': 'new'})  # Rename columns
-
-# Basic analysis
-df['column'].mean()     # Average
-df['column'].nunique()  # Count unique values
-df.groupby('category').mean()  # Group analysis
-```
-
-### Data Cleaning Best Practices
-- Always explore your data first
-- Document what cleaning steps you take
-- Keep track of how many rows you remove
-- Use meaningful column names
-- Check for duplicates and outliers
-
-## 🎥 Video Resources
-
-1. **Why DataFrames Matter for Social Impact** (10 min)
-2. **Loading and Exploring Data with Pandas** (15 min)
-3. **Data Cleaning Essentials** (20 min)
-4. **Calculating Summary Statistics** (10 min)
-
-## Sample Datasets
-
-This week you'll work with real datasets related to:
-
-### Option 1: Global Education Data
-- Literacy rates by country and year
-- School enrollment statistics
-- Educational spending per capita
-- Gender gaps in education
-
-### Option 2: World Health Indicators
-- Life expectancy by country
-- Infant mortality rates
-- Healthcare spending
-- Disease prevalence data
-
-### Option 3: Sustainable Development Goals
-- Progress towards UN SDG targets
-- Poverty reduction indicators
-- Environmental sustainability metrics
-- Gender equality measures
 
 ## 🔗 Additional Resources
 
@@ -171,38 +46,7 @@ This week you'll work with real datasets related to:
 - [Our World in Data](https://ourworldindata.org/)
 - [Gapminder](https://www.gapminder.org/data/)
 
-## ❓ Common Questions
-
-**Q: The dataset has thousands of rows. Do I need to analyze all of them?**
-A: No! Start with a subset using `.head(100)` or filter to specific countries/years you're interested in.
-
-**Q: What if there are many missing values?**
-A: First understand why data is missing. Sometimes missing data tells a story (e.g., countries not reporting certain indicators).
-
-**Q: How do I know if my data cleaning is correct?**
-A: Always check before and after: print the shape, look at samples, and verify your cleaning makes sense.
-
-**Q: Can I use my own dataset?**
-A: Yes! As long as it's related to social impact and has at least 100 rows with multiple columns.
-
-## 🆘 Getting Help
-
-- **Discussion Forum**: Ask about specific Pandas errors
-- **Office Hours**: Tuesdays 6-7 PM
-- **AI Assistants**: Great for explaining Pandas syntax
-- **Peer Support**: Share datasets and cleaning strategies
-
-## 📈 Assessment Criteria
-
-Your dataset summary notebook will be evaluated on:
-- **Data Loading**: Successfully imports and explores dataset
-- **Cleaning Process**: Appropriate handling of missing data and column names
-- **Summary Statistics**: Meaningful calculations with clear interpretation
-- **Analysis Quality**: Answers to questions show understanding of the data
-- **Documentation**: Clear explanations of what you found
-
 ---
-
-**Next Week**: [Week 3: Data Visualization with Plotly](../week3/README.md)
+**Next Week**: [Week 3: Working with DataFrames II: Cleaning & Transforming Data](../week3/README.md)
 
 **Previous Week**: [Week 1: Welcome & First Steps in Python for Data](../week1/README.md)
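
Note on the week 2 changes: the learning outcomes kept in `week2/README.md` (handling missing values, renaming columns, filtering and sorting) could be illustrated with a short sketch like the one below. It is not part of this commit. The column names `literacy_rate` and `GDP_per_cap` are borrowed from the example this diff removes; the rows, the `clean` helper, and the 95% threshold are assumptions made up for illustration, not course data.

```python
import pandas as pd

# Hypothetical in-memory data; the values are invented for illustration only.
raw = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "GDP_per_cap": [1200.0, 54000.0, 8300.0, 31000.0],
        "literacy_rate": [61.5, 99.0, None, 96.2],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with no literacy rate and give columns consistent names."""
    out = df.dropna(subset=["literacy_rate"])
    return out.rename(columns={"GDP_per_cap": "gdp_per_capita"})


cleaned = clean(raw)

# Filter and sort to answer one question: which countries exceed 95% literacy?
high_literacy = (
    cleaned[cleaned["literacy_rate"] > 95]
    .sort_values("literacy_rate", ascending=False)
    .reset_index(drop=True)
)

# Keep track of how many rows the cleaning step removed.
rows_removed = len(raw) - len(cleaned)
print(f"Rows removed during cleaning: {rows_removed}")
print(high_literacy[["country", "literacy_rate"]])
```

With a real dataset, `raw` would instead come from `pd.read_csv(...)`; building the frame inline just keeps the sketch runnable without any data files.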