A curated collection of advanced Python projects spanning AI, data analysis, and more, designed for learning and practical application.
- Overview
- Feature Highlights
- Architecture & Design
- Getting Started
- Usage Examples
- Limitations, Known Issues & Roadmap
- Contributing
- License, Credits & Contact
- Appendix
Welcome to the GDSC-FSC Python Projects Collection! This repository serves as a dynamic showcase and learning resource, bringing together various Python applications developed by the Google Developer Student Club (GDSC) at Farmingdale State College.
The core purpose of this collection is to:
- Demonstrate practical Python applications in key domains like Artificial Intelligence and Data Analysis.
- Provide a hands-on learning experience for students and enthusiasts interested in Python development.
- Offer readily available code examples for common tasks and advanced concepts.
This project matters because it acts as a valuable educational tool, enabling users to explore, understand, and build upon real-world Python implementations. It solves the problem of needing diverse, well-documented project examples in a single, accessible location.
Target Audience:
- Students learning Python, AI, or Data Science.
- Developers seeking practical code examples or starting points for new projects.
- Educators looking for demonstrative applications for their courses.
- Anyone curious about Python's capabilities in advanced fields.
This collection currently includes projects organized into key domains, each offering unique functionalities.
- Facial Landmark Detection (`Advanced/AI/facial_landmark.py`)
  - Real-time Processing: Detects facial landmarks from a live webcam feed.
  - 68-Point Detection: Uses the `dlib` library's pre-trained model to identify 68 key facial points.
  - OpenCV Integration: Leverages OpenCV for camera access, frame processing, and visualization.
  - Interactive Display: Shows detected landmarks dynamically on the video stream.
- F1 Score Visualization (`Advanced/Data analysis/Gradient.ipynb`)
  - 3D Interactive Plot: Visualizes the F1 score across varying precision and recall values using Matplotlib's 3D capabilities.
  - NumPy-Powered Calculations: Efficiently calculates F1 scores for a grid of precision and recall values.
  - Clarity on Metrics: Helps in understanding the relationship between precision, recall, and their harmonic mean (the F1 score).
  - Custom Helper Function: Includes an `f1_score` function for easy reuse.
- Sentiment Analysis of Amazon Reviews (`Advanced/Data analysis/SentimentAnalysis.ipynb`)
  - Natural Language Processing (NLP): Employs NLTK for text preprocessing, including tokenization, stop word removal, and lemmatization.
  - VADER Sentiment Analysis: Uses NLTK's VADER (Valence Aware Dictionary and sEntiment Reasoner) for rule-based sentiment scoring.
  - Amazon Review Dataset: Demonstrates analysis on a real-world dataset of Amazon product reviews.
  - Performance Evaluation: Provides a classification report and confusion matrix using `scikit-learn` to assess sentiment prediction accuracy.
  - Step-by-Step Workflow: Clearly outlines the process from data loading to evaluation.
The repository is structured as a modular collection of independent Python projects, categorized by their domain. Each project is self-contained within its respective directory, making it easy to navigate and utilize specific functionalities without affecting others.
This diagram illustrates the overall structure and the primary categories within the repository.
```mermaid
graph LR
    A[GDSC-FSC Python Projects] --> B{Advanced Projects};
    B --> C[AI];
    B --> D[Data Analysis];
    C --> C1[Facial Landmark Detection];
    D --> D1[F1 Score Visualization];
    D --> D2[Sentiment Analysis];
    style A fill:#f9f,stroke:#333,stroke-width:2px;
    style B fill:#bbf,stroke:#333,stroke-width:2px;
    style C fill:#ccf,stroke:#333,stroke-width:2px;
    style D fill:#ccf,stroke:#333,stroke-width:2px;
    style C1 fill:#dfd,stroke:#333,stroke-width:1px;
    style D1 fill:#dfd,stroke:#333,stroke-width:1px;
    style D2 fill:#dfd,stroke:#333,stroke-width:1px;
```
Each project within AI or Data Analysis is designed to be standalone, typically consisting of a Python script or a Jupyter Notebook along with any specific data or model files it requires.
This collection primarily leverages the following Python libraries and tools:
- Python: The core programming language (Python 3.8+).
- OpenCV (`opencv-python`): For computer vision tasks, particularly webcam access and image processing in Facial Landmark Detection.
- dlib: A powerful C++ library with Python bindings for machine learning, used for face detection and landmark prediction.
- NumPy: Essential for numerical operations and array manipulation in data analysis and scientific computing.
- Pandas: For data manipulation and analysis, especially with tabular data like the Amazon reviews.
- Matplotlib: For creating static, interactive, and animated visualizations, including 3D plots.
- Seaborn: Built on Matplotlib, providing a high-level interface for drawing attractive statistical graphics.
- NLTK (Natural Language Toolkit): A leading platform for building Python programs to work with human language data, used for text preprocessing and VADER sentiment analysis.
- scikit-learn: For machine learning tasks, specifically used for evaluating model performance with confusion matrices and classification reports.
Follow these steps to set up the projects locally and start exploring.
Before you begin, ensure you have the following installed:
- Python 3.8+
- pip: Python's package installer (usually bundled with Python).
- C++ build tools (for `dlib`): `dlib` requires a C++ compiler.
  - Windows: Install Build Tools for Visual Studio and select the "Desktop development with C++" workload.
  - macOS: Install the Xcode Command Line Tools: `xcode-select --install`.
  - Linux: Install `build-essential` (Ubuntu/Debian) or "Development Tools" (Fedora/RHEL): `sudo apt-get install build-essential` or `sudo yum groupinstall "Development Tools"`.
- Clone the repository:
  ```bash
  git clone https://github.com/GDSC-FSC/Python-Projects.git
  cd Python-Projects
  ```
- Create a virtual environment (highly recommended):
  ```bash
  python -m venv venv
  ```
- Activate the virtual environment:
  - Windows:
    ```shell
    .\venv\Scripts\activate
    ```
  - macOS / Linux:
    ```shell
    source venv/bin/activate
    ```
- Install dependencies. The following `requirements.txt` covers all current projects:
  ```text
  opencv-python>=4.5
  dlib>=19.22
  numpy>=1.20
  pandas>=1.2
  matplotlib>=3.3
  seaborn>=0.11
  nltk>=3.6
  scikit-learn>=0.24
  jupyter  # Optional, for running notebooks
  ```
  Install the packages with:
  ```bash
  pip install opencv-python dlib numpy pandas matplotlib seaborn nltk scikit-learn jupyter
  ```
  Note for `dlib` installation: This step may take a while because `dlib` compiles from source. Make sure the C++ build tools listed in Prerequisites are installed.
- Download NLTK data: Some NLTK components are not installed by default and must be downloaded separately.
  ```python
  import nltk

  nltk.download('punkt')          # For tokenization
  nltk.download('punkt_tab')      # Also required by word_tokenize in recent NLTK releases
  nltk.download('stopwords')      # For stop word removal
  nltk.download('wordnet')        # For lemmatization
  nltk.download('vader_lexicon')  # For VADER sentiment analysis
  ```
  You can run these commands directly in a Python interpreter after activating your virtual environment.
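To confirm the installation step worked, a small standard-library check can report which of the packages above are visible to the active interpreter. This is a minimal sketch, not a script shipped with the repository; `check_packages` and `REQUIRED` are hypothetical names, and the distribution names are assumed to match the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to `pip install` in the installation step.
REQUIRED = [
    "opencv-python", "dlib", "numpy", "pandas",
    "matplotlib", "seaborn", "nltk", "scikit-learn",
]


def check_packages(names):
    """Return (installed, missing): installed maps each found name to its version."""
    installed, missing = {}, []
    for name in names:
        try:
            installed[name] = version(name)
        except PackageNotFoundError:
            missing.append(name)
    return installed, missing


if __name__ == "__main__":
    found, absent = check_packages(REQUIRED)
    for name, ver in found.items():
        print(f"OK       {name} {ver}")
    for name in absent:
        print(f"MISSING  {name}")
```

Saved as, say, `check_env.py` (a hypothetical file name) and run inside the activated virtual environment, it lists each package as OK with its version or as MISSING. Querying installed metadata rather than importing each module keeps the check fast and avoids side effects such as OpenCV or dlib loading native libraries.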
- Facial Landmark Detection: The `facial_landmark.py` script requires the pre-trained `shape_predictor_68_face_landmarks.dat` model file.
  - Download the model from dlib's GitHub: `shape_predictor_68_face_landmarks.dat.bz2`.
  - Extract the `.bz2` file to get `shape_predictor_68_face_landmarks.dat`.
  - Place the `.dat` file in the same directory as `facial_landmark.py` (i.e., `Advanced/AI/`).
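On systems without a `bunzip2` or 7-Zip tool (common on Windows), Python's standard `bz2` module can handle the extraction step. The helper below is a hypothetical sketch, not part of the repository; it assumes the archive has already been downloaded into `Advanced/AI/`.

```python
import bz2
import shutil
from pathlib import Path


def extract_bz2(archive, destination=None):
    """Decompress a .bz2 file; by default writes next to it without the .bz2 suffix."""
    archive = Path(archive)
    if destination is None:
        destination = archive.with_suffix("")  # drops only the trailing ".bz2"
    destination = Path(destination)
    with bz2.open(archive, "rb") as src, open(destination, "wb") as dst:
        # Stream in chunks instead of holding the whole decompressed model in memory.
        shutil.copyfileobj(src, dst)
    return destination
```

For example, `extract_bz2("Advanced/AI/shape_predictor_68_face_landmarks.dat.bz2")` would write `Advanced/AI/shape_predictor_68_face_landmarks.dat`, which is where the script expects the model.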
Each project can be run independently. Ensure your virtual environment is activated (`source venv/bin/activate` on macOS/Linux).
- Python Scripts: Execute directly from the command line:
  ```bash
  python <path/to/script.py>
  ```
- Jupyter Notebooks: Launch Jupyter Lab or Jupyter Notebook:
  ```bash
  jupyter lab  # or: jupyter notebook
  ```
  Then navigate to the respective `.ipynb` file in your browser.
Here's how to run and interact with each project in this collection.
This script uses your webcam to detect faces and mark 68 key facial landmarks in real-time.
Facial Landmark Detection workflow:
```mermaid
graph TD
    A[Start Application] --> B{Initialize OpenCV & dlib};
    B --> C[Open Webcam Feed];
    C --> D{Loop: Read Frame};
    D -- If no frame --> E[Exit];
    D -- If frame --> F[Convert to Grayscale];
    F --> G[Detect Faces];
    G -- For each face --> H[Predict Landmarks];
    H --> I[Draw Landmarks on Frame];
    I --> J[Display Frame];
    J --> K{Wait for 'q' key or Window Close};
    K -- 'q' pressed or closed --> E;
    K -- Continue --> D;
    E[Release Webcam & Destroy Windows] --> L[End];
```
- Ensure `shape_predictor_68_face_landmarks.dat` is in place (see Configuration).
- Navigate to the AI directory:
  ```bash
  cd Advanced/AI
  ```
- Run the script:
  ```bash
  python facial_landmark.py
  ```
- A window will appear displaying your webcam feed with the detected facial landmarks.
- Press `q` to quit the application.
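The workflow above can be sketched roughly as follows. This is an illustrative approximation, not the contents of `facial_landmark.py`; the names `run`, `shape_to_points`, `MODEL_PATH`, and `WINDOW` are hypothetical, and the model file is assumed to sit in the current working directory. OpenCV and dlib are imported inside `run()` so the small conversion helper can be used and tested on its own.

```python
MODEL_PATH = "shape_predictor_68_face_landmarks.dat"  # assumed location: current directory
WINDOW = "Facial Landmarks"


def shape_to_points(shape):
    """Convert a dlib full_object_detection into a list of (x, y) pixel tuples."""
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]


def run(camera_index=0, model_path=MODEL_PATH):
    """Detect faces in the webcam feed and draw landmarks until 'q' or window close."""
    import cv2  # imported lazily so shape_to_points works without OpenCV/dlib
    import dlib

    detector = dlib.get_frontal_face_detector()  # HOG-based frontal face detector
    predictor = dlib.shape_predictor(model_path)  # loads the 68-point model
    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open camera {camera_index}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # no frame: camera disconnected or stream ended
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for rect in detector(gray, 0):  # 0 = do not upsample the image
                for x, y in shape_to_points(predictor(gray, rect)):
                    cv2.circle(frame, (x, y), 2, (0, 255, 0), -1)
            cv2.imshow(WINDOW, frame)
            key = cv2.waitKey(1) & 0xFF
            # Stop on 'q' or when the user closes the window, as in the flowchart.
            if key == ord("q") or cv2.getWindowProperty(WINDOW, cv2.WND_PROP_VISIBLE) < 1:
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling `run()` from `Advanced/AI/` opens the default camera (index 0). The `try`/`finally` block mirrors the flowchart's final step: the webcam is released and windows are destroyed even if an error interrupts the loop.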
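This notebook generates a 3D plot visualizing the F1 score as a function of precision and recall.
- Navigate to the Data Analysis directory (quote the path, since the directory name contains a space):
  ```bash
  cd "Advanced/Data analysis"
  ```
- Run the Jupyter Notebook. If you have `jupyter` installed, open the notebook:
  ```bash
  jupyter lab Gradient.ipynb
  ```
  When run as a Jupyter Notebook, the 3D plot is rendered directly in the output cell. Rotating it interactively requires an interactive Matplotlib backend (for example `%matplotlib widget`, provided by the `ipympl` package), since the default inline backend renders a static image.
  Alternatively, you can run a converted Python script (though it might just display and close the plot):
  ```bash
  jupyter nbconvert --to script Gradient.ipynb  # produces Gradient.py
  python Gradient.py
  ```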
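The F1 score is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). A minimal NumPy sketch of the grid computation described above might look like this. It is an assumption about the approach, not the notebook's actual `f1_score` helper; it defines F1 = 0 where P = R = 0 to avoid a division by zero, and `plot_f1_surface` is a hypothetical plotting helper.

```python
import numpy as np


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, defined as 0 where both are 0."""
    p = np.asarray(precision, dtype=float)
    r = np.asarray(recall, dtype=float)
    numer = 2.0 * p * r
    denom = p + r
    # np.divide with `where` skips the 0/0 cells instead of producing NaN.
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)


# 101 x 101 grid of precision/recall values in [0, 1].
precision = np.linspace(0.0, 1.0, 101)
recall = np.linspace(0.0, 1.0, 101)
P, R = np.meshgrid(precision, recall)
F1 = f1_score(P, R)


def plot_f1_surface(P, R, F1):
    """Draw the F1 surface with Matplotlib's 3D projection (imported lazily)."""
    import matplotlib.pyplot as plt

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(P, R, F1, cmap="viridis")
    ax.set_xlabel("Precision")
    ax.set_ylabel("Recall")
    ax.set_zlabel("F1 score")
    plt.show()
```

Calling `plot_f1_surface(P, R, F1)` draws the surface. The shape of the plot reflects the harmonic mean: F1 stays low whenever either metric is low, so only the corner where both precision and recall approach 1 reaches the maximum of 1.0.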
This project performs sentiment analysis on a dataset of Amazon reviews, showcasing text preprocessing, sentiment scoring, and evaluation.
Sentiment Analysis workflow:
```mermaid
graph TD
    A[Start] --> B["Load Amazon Review Dataset (CSV)"];
    B --> C{For each reviewText};
    C --> C1["Tokenize Text (lowercase)"];
    C1 --> C2[Remove Stop Words];
    C2 --> C3[Lemmatize Tokens];
    C3 --> D[Join Processed Tokens];
    D --> E[Apply Preprocessing to DataFrame];
    E --> F[Initialize VADER Sentiment Analyzer];
    F --> G{For each processed reviewText};
    G --> G1[Get Polarity Scores];
    G1 --> G2["Determine Sentiment (Positive/Negative)"];
    G2 --> H[Add Sentiment Column to DataFrame];
    H --> I[Compare Predicted vs. Actual Sentiment];
    I --> J[Generate Confusion Matrix];
    J --> K[Generate Classification Report];
    K --> L[End];
```
- Ensure NLTK data is downloaded (see Installation).
- Navigate to the Data Analysis directory (quote the path, since the directory name contains a space):
  ```bash
  cd "Advanced/Data analysis"
  ```
- Open and run the Jupyter Notebook:
  ```bash
  jupyter lab SentimentAnalysis.ipynb
  ```
  Execute the cells sequentially. The notebook will:
- Load the dataset directly from a URL.
- Preprocess the text data.
- Apply VADER sentiment analysis.
- Display the confusion matrix and classification report, evaluating the sentiment predictions against the 'Positive' column in the dataset.
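The preprocessing, scoring, and evaluation steps can be sketched as below. This is a minimal sketch under stated assumptions, not the notebook's code: the 0.05 cut-off on VADER's compound score (`POSITIVE_THRESHOLD`) is a hypothetical choice, the notebook's own Positive/Negative rule may differ, and the `Positive` column is assumed to hold 1/0 labels. The column names `reviewText` and `Positive` come from the workflow above; all function names are hypothetical. NLTK and scikit-learn are imported inside the functions that need them.

```python
from functools import lru_cache

POSITIVE_THRESHOLD = 0.05  # hypothetical cut-off on VADER's compound score


def label_from_compound(compound, threshold=POSITIVE_THRESHOLD):
    """Map a VADER compound score in [-1, 1] to 1 (positive) or 0 (negative)."""
    return 1 if compound >= threshold else 0


@lru_cache(maxsize=1)
def english_stop_words():
    """Load NLTK's English stop-word list once (needs the 'stopwords' corpus)."""
    from nltk.corpus import stopwords
    return frozenset(stopwords.words("english"))


def preprocess_text(text):
    """Lowercase, tokenize, drop stop words, lemmatize, then re-join into one string."""
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    lemmatizer = WordNetLemmatizer()
    stop_words = english_stop_words()
    tokens = word_tokenize(text.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t not in stop_words)


def predict_sentiment(text, analyzer):
    """Score text with any object exposing polarity_scores(text) -> {'compound': ...}."""
    return label_from_compound(analyzer.polarity_scores(text)["compound"])


def evaluate(df, text_col="reviewText", label_col="Positive"):
    """Preprocess, score, and compare predictions against the labelled column."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    from sklearn.metrics import classification_report, confusion_matrix

    analyzer = SentimentIntensityAnalyzer()
    processed = df[text_col].astype(str).apply(preprocess_text)
    predicted = processed.apply(lambda t: predict_sentiment(t, analyzer))
    print(confusion_matrix(df[label_col], predicted))
    print(classification_report(df[label_col], predicted))
    return predicted
```

One design caveat: NLTK's English stop-word list includes negations such as "not" and "no", so removing stop words before running VADER can discard cues that VADER's rules rely on, and some reviews may be scored more positively than their raw text would be.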
This section outlines current limitations, any known bugs, and our vision for future enhancements.
- `dlib` Dependency: The Facial Landmark Detection project's reliance on `dlib` makes installation complex on some systems because of its C++ compiler requirements.
- Pre-trained Models: Facial landmark detection uses a generic pre-trained model, which might not perform optimally on highly unusual face orientations or low-resolution images.
- Sentiment Analysis Accuracy: While VADER is robust, it's a rule-based system and might misinterpret nuanced language, sarcasm, or highly domain-specific jargon that falls outside its lexicon.
- Dataset Specificity: The Sentiment Analysis is demonstrated on Amazon reviews; its direct applicability to other text domains might vary without fine-tuning or a different model.
- Project Isolation: While good for modularity, there's currently no unified entry point or GUI for the entire collection.
- `dlib` Installation Errors: Users often encounter build errors during `dlib` installation, typically caused by missing C++ build tools or an incorrect compiler setup. Refer to Troubleshooting for common fixes.
- NLTK Data Download Issues: Firewall restrictions or network problems can sometimes prevent `nltk.download()` from completing successfully.
- Webcam Access Issues: On some operating systems, users might need to explicitly grant Python or the terminal application permission to access the webcam.
We are continuously looking to expand and improve this collection. Planned enhancements include:
- Expanded AI Portfolio:
  - Add projects on object detection (e.g., using YOLO or SSD).
  - Explore generative AI (e.g., text generation, image manipulation).
  - Implement deep learning models for facial recognition or emotion detection.
- Advanced Data Analysis:
  - Incorporate time series analysis projects.
  - Include examples of machine learning model building and deployment.
  - Develop more interactive data visualization dashboards.
- Deployment & Containerization:
  - Provide Dockerfiles for each project to simplify environment setup and deployment.
  - Explore cloud deployment options (e.g., Heroku, AWS Lambda) for selected projects.
- Improved User Experience:
  - Create a simple web interface (e.g., with Flask/Django) for some projects.
  - Develop a consolidated CLI tool or launcher for easier navigation and execution of projects.
- Comprehensive Documentation:
  - Expand existing project documentation with more detailed explanations and theoretical background.
  - Add video tutorials for complex setups.
- New Categories:
  - Explore adding projects in areas like web scraping, automation, or game development.
We welcome contributions from the community to help grow and improve this collection! Whether it's a new project, an enhancement to an existing one, bug fixes, or documentation improvements, your help is appreciated.
- Fork the repository.
- Create a new branch for your feature or bug fix:
  ```bash
  git checkout -b feat/your-feature-name  # or fix/issue-description
  ```
- Implement your changes.
  - For new projects, create a new directory under `Advanced/` (or a new top-level category if appropriate).
  - Ensure your project includes a small, self-contained `README.md` explaining its purpose, setup, and usage.
- Write clear, concise commit messages.
- Push your branch to your forked repository.
- Open a Pull Request (PR) against the `main` branch of this repository.
- Branch Naming: Use descriptive names (e.g., `feat/add-object-detection`, `fix/dlib-install-error`, `docs/update-readme`).
- Pull Request Description: Provide a clear description of your changes, why they were made, and any relevant context. Reference any issues the PR closes (e.g., `Closes #123`).
- One Feature/Fix per PR: Keep PRs focused to make reviews easier.
- Code Style: Adhere to PEP 8 for Python code.
- Docstrings: Add comprehensive docstrings to functions, classes, and modules.
- Comments: Use comments where necessary to explain complex logic.
- Testing: While formal testing frameworks are not mandated for all simple scripts, ensure your code is well-tested manually and robust for its intended purpose.
This project is licensed under the MIT License. See the LICENSE file for full details.
We extend our gratitude to:
- GDSC-FSC (Google Developer Student Club - Farmingdale State College) for initiating and supporting this project.
- DataCamp for inspiring the Sentiment Analysis project (original source: Text Analytics for Beginners with NLTK).
- The developers of OpenCV, dlib, NumPy, Pandas, Matplotlib, Seaborn, NLTK, and scikit-learn for their incredible open-source libraries.
For questions, suggestions, or collaborations, please reach out via:
- GitHub Issues: Open an issue
- GDSC-FSC Community: Connect with us through our official GDSC channels (specific links will be provided by GDSC-FSC).
- v1.0.0 (October 26, 2023)
- Initial release of the Python Projects Collection.
- Includes Facial Landmark Detection, F1 Score Visualization, and Sentiment Analysis projects.
- Comprehensive README documentation published.
Q: What are the primary goals of this project collection?
A: The main goals are to provide practical Python examples in AI and data analysis, serve as a learning resource, and showcase Python's capabilities for advanced applications within the GDSC-FSC community.

Q: Can I suggest a new project idea?
A: Absolutely! We welcome new ideas. Please open an issue on GitHub with the label `feature-request` and describe your idea.

Q: How can I ensure my dlib installation succeeds?
A: Ensure you have the correct C++ build tools installed for your operating system as specified in the [Prerequisites](#prerequisites) section. Updating `pip` and `setuptools` beforehand can also help: `pip install --upgrade pip setuptools`.
- `dlib` Installation Failure:
  - Error Message: "Microsoft Visual C++ 14.0 or greater is required." (Windows) or "command 'gcc' failed with exit status 1" (Linux/macOS).
  - Solution: This indicates missing C++ build tools. Refer to the Prerequisites section for instructions specific to your OS. On Windows, make sure you select "Desktop development with C++" when installing Visual Studio Build Tools.
  - Recommendation: Try installing `cmake` first (`pip install cmake`), then retry `pip install dlib`.
- NLTK Data Download Issues:
  - Error Message: `LookupError` for 'punkt', 'stopwords', etc.
  - Solution: Ensure you are connected to the internet and that no firewall is blocking the download, then run the `nltk.download()` commands again. If issues persist, download the data manually and place the files in an NLTK data directory (usually `~/nltk_data`, or the path printed during `nltk.download()`).
- Webcam Not Found/Accessed:
  - Error Message: `cv2.error: OpenCV(4.x.x) ... camera failed to open.`
  - Solution:
    - Ensure no other application is using your webcam.
    - Check your operating system's privacy settings to make sure the terminal/IDE has permission to access the camera.
    - Use `opencv-python` for local use. The `opencv-python-headless` package is intended for server environments without a display and omits GUI functions such as `cv2.imshow`, so it cannot show the landmark window.
- Jupyter Notebook Not Launching/Finding Kernel:
  - Error Message: "Kernel not found" or "No module named 'ipykernel'".
  - Solution: Ensure `jupyter` and `ipykernel` are installed within your active virtual environment: `pip install jupyter ipykernel`.
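For the NLTK data issue above, it can help to check which resources are actually missing before re-downloading everything. The sketch below is hypothetical (`NLTK_RESOURCES` and `find_missing` are not part of the repository); it relies on `nltk.data.find`, which raises `LookupError` when a resource cannot be located, and the resource paths listed are assumptions based on NLTK's standard data layout.

```python
# nltk.download() package name -> path that nltk.data.find() looks up.
NLTK_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
    "vader_lexicon": "sentiment/vader_lexicon.zip",
}


def find_missing(resources=NLTK_RESOURCES, finder=None):
    """Return the download names whose data cannot be located.

    `finder` defaults to nltk.data.find, which raises LookupError when a
    resource is absent; it is injectable so the logic can be checked offline.
    """
    if finder is None:
        import nltk
        finder = nltk.data.find
    missing = []
    for name, path in resources.items():
        try:
            finder(path)
        except LookupError:
            missing.append(name)
    return missing
```

Running `for name in find_missing(): nltk.download(name)` then fetches only what is absent, and an empty list confirms that NLTK can see every resource the Sentiment Analysis project uses.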