Assignments and notes from the IBM Data Science Professional Certificate. Extracting insights from large datasets to support strategic decision-making.

JoshuaThadi/Data-Science

Data Science is an interdisciplinary field that uses statistical techniques, programming, data analysis, and machine learning to extract insights and knowledge from structured and unstructured data. It lies at the intersection of mathematics, computer science, and domain expertise.

Data Science Assignment

Welcome to the Data Science assignment repository! This assignment, developed as part of a Coursera course, covers key data science concepts and practical coding exercises in Jupyter Notebook. Below is a summary of what you will find in this repository.

Objectives

- Understand and list popular languages used in Data Science
- Explore essential libraries and tools for data science workflows
- Apply basic arithmetic operations within a Jupyter Notebook
- Perform unit conversions (e.g., converting minutes to hours)
- Practice using Markdown and Code cells in Jupyter Notebook
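
The arithmetic and unit-conversion objectives can be sketched in a single notebook code cell. This is a minimal illustration, not the assignment's own solution: the function name `minutes_to_hours` and the sample numbers are assumptions chosen for the example.

```python
def minutes_to_hours(minutes: float) -> float:
    """Convert a duration in minutes to hours (hypothetical helper)."""
    return minutes / 60


# Basic arithmetic, as it might appear in a notebook code cell.
product_plus_sum = (3 * 4) + 5   # multiplication first, then addition
hours = minutes_to_hours(200)    # 200 minutes expressed in hours

print(product_plus_sum)
print(round(hours, 2))
```

In a notebook, a Markdown cell above the code cell would typically state what the cell computes.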

Data science - details

☆ Key Components of Data Science

1] Data Collection: Gathering data from sources such as databases, APIs, sensors, and web scraping.
2] Data Cleaning and Preprocessing: Handling missing data, removing duplicates, fixing errors, and normalizing formats.
3] Exploratory Data Analysis (EDA): Using statistics and visualization to understand patterns, trends, and anomalies.
4] Feature Engineering: Creating meaningful variables from raw data to improve model performance.
5] Model Building: Applying machine learning algorithms (e.g., regression, classification, clustering).
6] Model Evaluation: Testing model quality using metrics such as precision, recall, F1-score, and RMSE.
7] Deployment: Integrating the model into a real-world application using tools like Flask, Docker, or cloud services.
8] Monitoring and Maintenance: Tracking model performance over time and retraining when necessary.
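
The evaluation metrics named in step 6 can be computed by hand to see what they measure. The sketch below uses only the standard library; the function names and the sample labels are assumptions for illustration, and in practice a library such as scikit-learn would normally be used instead.

```python
import math


def classification_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error for regression targets."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up labels and predictions, purely for demonstration.
labels = [1, 0, 1, 1, 0, 1]
preds = [1, 0, 0, 1, 1, 1]
precision, recall, f1 = classification_metrics(labels, preds)
print(precision, recall, f1)
print(rmse([1, 2, 3], [2, 3, 4]))
```

Precision, recall, and F1 apply to classification, while RMSE applies to regression, so a given model is usually scored with one family or the other.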

✪ Core Python Libraries / Modules

Data Manipulation & Analysis – NumPy, Pandas, Dask
Data Visualization – Matplotlib, Seaborn, Plotly, Altair
Machine Learning – scikit-learn, XGBoost, LightGBM, CatBoost, Hugging Face Transformers, TensorFlow, PyTorch
Deep Learning – Keras, PyTorch Lightning, ONNX
Model Deployment – Flask, FastAPI, Streamlit, Gradio, Docker

Tools and platforms: Pandas, NumPy, Matplotlib, Python, R, SQL, Azure, Tableau, Power BI, Seaborn, Scikit-learn, TensorFlow, PyTorch, Jupyter Notebooks, Google Colab, AWS

📚 Core Subjects in Data Science

1] Statistics & Probability – Foundational math for inference and predictions
2] Linear Algebra – Vectors and matrices, the core of ML models
3] Calculus – Gradient descent, optimization
4] Machine Learning – Algorithms to learn from data
5] Deep Learning – Neural networks and deep architectures
6] NLP (Natural Language Processing) – Working with text and language
7] Computer Vision – Image and video analysis
8] Big Data – Working with large-scale data
9] Data Engineering – Pipelines, ETL, data storage
10] Model Deployment – Turning models into APIs/apps
11] MLOps – Production lifecycle of ML models
12] Data Visualization – Communicating insights effectively
13] Cloud & DevOps – Using AWS, Azure, GCP for scalable data solutions
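
To make the calculus entry concrete, here is a minimal sketch of gradient descent on a one-dimensional function. The objective f(x) = (x - 3)^2, the learning rate, and the step count are all assumptions picked for illustration; real models optimize many parameters at once using the same idea.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has derivative f'(x) = 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 6))
```

Each step shrinks the distance to the minimum by a constant factor here, so the estimate converges quickly; a learning rate that is too large would make it diverge instead.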

📌 Topics to Cover

1] Exploratory Data Analysis (EDA) – Missing data, outliers, visualization
2] Feature Engineering – Encoding, scaling, transformations
3] Model Evaluation – Accuracy, precision, recall, ROC, AUC
4] Hyperparameter Tuning – GridSearch, RandomSearch, Optuna
5] Dimensionality Reduction – PCA, t-SNE, UMAP
6] Time Series Analysis – ARIMA, LSTM, Prophet
7] Unsupervised Learning – Clustering (KMeans, DBSCAN), PCA
8] Supervised Learning – Regression, classification
9] Neural Networks – CNN, RNN, GAN, transformers
10] Recommendation Systems – Collaborative filtering, content-based
11] Data Cleaning & Wrangling – Imputation, normalization, data types
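
Several of the topics above (imputation, scaling, encoding) come down to small transformations on columns of data. The following is a standard-library sketch under assumed names and sample values; with Pandas or scikit-learn the same steps are usually one call each.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted distinct levels."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


ages = [20, None, 40, 30]            # made-up column with one missing value
filled = impute_mean(ages)
scaled = min_max_scale(filled)
encoded = one_hot(["red", "blue", "red"])
print(filled, scaled, encoded)
```

Mean imputation is only one option; median or model-based imputation may suit skewed data better.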

★ Why is Data Science Important?

Data Science enables organizations to:
1] Make data-driven decisions
2] Predict future trends
3] Automate processes using machine learning
4] Improve customer experiences and optimize operations

🌐 Datasets & Practice

1] Kaggle Datasets
2] UCI Machine Learning Repository
3] Google Dataset Search
4] Data.gov

📖 Learning Resources

1] Python for Data Science – freeCodeCamp
2] Coursera Data Science Specialization
3] Fast.ai Courses
4] Harvard CS109 – Data Science

✫ Applications of Data Science

1] Drug Discovery & Personalized Medicine
Use Case: Analyzing genetic data and molecular structures to discover new drugs faster and more effectively.
How: Machine learning models predict how a drug will interact with human proteins, reducing the need for trial-and-error in labs.

2] Satellite Image Analysis & Earth Observation
Use Case: Monitoring deforestation, urban expansion, and climate change from space.
How: Computer vision applied to satellite imagery to track environmental changes in near real-time.

3] Neuroinformatics & Brain-Computer Interfaces (BCIs)
Use Case: Interpreting brain signals to control external devices or assist people with disabilities.
How: ML models decode EEG/fMRI data to enable mind-controlled prosthetics or communication devices.

4] Legal Analytics & Predictive Judging
Use Case: Predicting the outcome of legal cases or analyzing judge rulings.
How: NLP and ML models analyze vast amounts of case law and court data to assist legal research and strategy.

5] Content Generation & Scriptwriting
Use Case: Assisting in writing movie scripts or generating realistic dialogue.
How: NLP and generative models trained on film scripts, books, or dialogues to suggest or generate creative writing.

6] Game Analytics & Dynamic Difficulty Adjustment
Use Case: Making video games adapt to player skill in real time for better engagement.
How: Analyzing gameplay data to adjust difficulty, recommend challenges, or predict player churn.
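
One simple way to picture dynamic difficulty adjustment is a rule that nudges the level up or down based on the player's recent win rate. This is a toy sketch: the function name, the 50% target, the tolerance band, and the 1 to 10 level range are all assumptions, and production systems would likely use richer gameplay signals.

```python
def adjust_difficulty(level, recent_results, target=0.5, tolerance=0.2,
                      min_level=1, max_level=10):
    """Raise or lower the level so the win rate drifts toward the target."""
    if not recent_results:
        return level  # no gameplay data yet, so leave the level unchanged
    win_rate = sum(recent_results) / len(recent_results)
    if win_rate > target + tolerance:
        level += 1    # player is winning too easily
    elif win_rate < target - tolerance:
        level -= 1    # player is struggling
    # Keep the level inside the allowed range.
    return max(min_level, min(max_level, level))


# recent_results holds True for a win and False for a loss (made-up data).
print(adjust_difficulty(5, [True, True, True, True, True]))
print(adjust_difficulty(5, [False, False, False, False, True]))
```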

7] Smart City Optimization
Use Case: Managing traffic, energy consumption, and emergency response in real time.
How: Integrating IoT sensor data with predictive analytics to optimize urban infrastructure.

8] Synthetic Biology & Genomic Sequencing
Use Case: Designing synthetic organisms or editing genes more efficiently.
How: Data science models help map and understand genetic patterns to identify gene targets for editing (CRISPR, etc.)

9] Adaptive Learning Systems in EdTech
Use Case: Personalizing learning paths for students.
How: Tracking student performance data and recommending content or pace adjustment using ML.

10] Social Good & Policy Simulation
Use Case: Simulating the outcome of policy changes (e.g., taxation, healthcare).
How: Data models trained on socio-economic datasets to project real-world impact of policies.

⚠️ This repository is uniquely designed by @JoshuaThadi.