Python Data Processing and Automation Scripts

This repository contains a collection of Python scripts and Jupyter notebooks for data processing, automation, and integration with external APIs. The scripts cover tasks such as merging CSV files, adding columns to CSVs, automating Google searches, and posting tweets using OpenAI GPT models.

⚠️ Important Note: Some scripts in this repository use deprecated libraries and APIs. Please see the Known Issues section below for important updates and alternatives.

merge_csv_files_from_folder.py: Merges all CSV files from a specified directory (landing zone) into a master CSV file. It checks for column consistency and deletes processed files.
- ⚠️ Important: Contains hardcoded paths that must be updated before use
- Automatically removes source files after processing
add_column_name_of_file.py: Adds a new column to each CSV file in a directory, with the value set to the file name (useful for tracking file origins when merging).
- Takes 3 command-line arguments: <directory> <file_extension> <new_column_name>
- ⚠️ Important: Contains hardcoded landing zone path
google_search_using_selenium.py: Automates Google searches using Selenium WebDriver, extracts top 4 result links per query, and writes them to SearchLinks.txt.
- Requires ChromeDriver installation and proper path configuration
- Reads search queries from xxx.txt file (one per line)
- ⚠️ Important: Uses deprecated Selenium methods that may not work with newer versions
openai_gpt_tweet_pro_tips.py: Uses OpenAI's GPT-3.5-turbo to generate Python programming tips and posts them to Twitter automatically.
- Requires both Twitter and OpenAI API credentials as environment variables
- ⚠️ Important: Uses deprecated OpenAI API and Twitter library

Jupyter Notebooks

pandas_001_10_minutes_to_pandas.ipynb: Quick introduction to pandas, covering object creation, data viewing, selection, operations, merging, grouping, reshaping, and plotting.
pandas_002_Intro_to_data_structures.ipynb: Explains pandas Series and DataFrame data structures, creation, and indexing.
pandas_003_Essential_basic_functionality.ipynb: Demonstrates essential pandas functionality, including working with Series/DataFrames, indexing, and basic operations.

Requirements

Python 3.8+ is recommended (minimum 3.7)
Install dependencies with:

pip install -r requirements.txt

Core Dependencies

pandas - Data manipulation and analysis
numpy - Numerical computing
openai - OpenAI API client (⚠️ See Known Issues)
twitter - Twitter API client (⚠️ Deprecated - see alternatives below)
selenium - Web browser automation (⚠️ Uses deprecated methods)

Alternative Dependencies (Recommended)

For new projects, consider these modern alternatives:

# For Twitter/X integration
pip install tweepy

# For OpenAI (ensure compatibility with latest API)
pip install openai>=1.0.0

# For Selenium with modern syntax
pip install selenium>=4.0.0

Setup & Preconditions

Python 3.8+ is recommended for best compatibility.
For scripts using Selenium, ChromeDriver must be installed and its path set in the script.
For Twitter and OpenAI integration, set the following environment variables:
- TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN_KEY, TWITTER_ACCESS_TOKEN_SECRET
- OPENAI_API_KEY
Important: Scripts contain hardcoded file paths (e.g., /Users/mbp/Python/appendcolumn, /Users/mbp/Python/WIKIPEDIA). You must update these paths to match your environment before running.
For google_search_using_selenium.py, ensure xxx.txt exists with search queries (one per line).

Environment Setup Example

# Set OpenAI API key
export OPENAI_API_KEY="your_openai_api_key_here"

# Set Twitter API credentials (if using Twitter integration)
export TWITTER_CONSUMER_KEY="your_consumer_key"
export TWITTER_CONSUMER_SECRET="your_consumer_secret"
export TWITTER_ACCESS_TOKEN_KEY="your_access_token"
export TWITTER_ACCESS_TOKEN_SECRET="your_access_token_secret"

Usage

Python Scripts

merge_csv_files_from_folder.py:
```
python merge_csv_files_from_folder.py
```
Place CSV files in the landing zone directory. Run the script to merge them into the master file. Note: Update hardcoded paths in the script before use.
add_column_name_of_file.py:
```
python add_column_name_of_file.py <landing_zone_dir> <file_extension> <new_column_name>
```
Example: python add_column_name_of_file.py ./data .csv source_file
google_search_using_selenium.py:
```
python google_search_using_selenium.py
```
Edit the script for your ChromeDriver path and ensure xxx.txt exists with search queries. Run to collect search result links. Note: Uses deprecated Selenium methods.
openai_gpt_tweet_pro_tips.py:
```
python openai_gpt_tweet_pro_tips.py
```
Ensure environment variables are set. Run to post a GPT-generated tip to Twitter. Note: Uses deprecated OpenAI API methods and Twitter library.

Jupyter Notebooks

Open any notebook in Jupyter Lab or Google Colab:

jupyter lab pandas_001_10_minutes_to_pandas.ipynb

Known Issues

1. OpenAI API Deprecation (openai_gpt_tweet_pro_tips.py)

Issue: The script uses deprecated openai.ChatCompletion.create() method.

Current code:

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", ...)

Modern replacement:

from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(model="gpt-3.5-turbo", ...)

2. Selenium WebDriver Deprecation (google_search_using_selenium.py)

Issue: Uses deprecated find_element_by_*() methods.

Current code:

elem = browser.find_element_by_name('q')
elem = browser.find_element_by_xpath(x_path)

Modern replacement:

from selenium.webdriver.common.by import By
elem = browser.find_element(By.NAME, 'q')
elem = browser.find_element(By.XPATH, x_path)

3. Twitter API Changes

Issue: The twitter library is deprecated and Twitter/X API has significant changes.

Modern alternatives:

Use tweepy for Twitter/X API v2
Update API credentials and endpoints according to current Twitter/X developer documentation

4. Hardcoded File Paths

Issue: Scripts contain hardcoded paths that need manual updating.

Files affected:

merge_csv_files_from_folder.py: Lines 10-11
add_column_name_of_file.py: Line 7
google_search_using_selenium.py: Line 10 (ChromeDriver path)

Solution: Update these paths to match your local environment before running.

5. ChromeDriver Management

Issue: Manual ChromeDriver path specification.

Modern alternative:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Automatically manage ChromeDriver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

Troubleshooting

Common Installation Issues

OpenAI version conflicts: Ensure you're using a compatible version

pip install openai==0.28.1  # For old API syntax
# OR
pip install openai>=1.0.0   # For new API syntax (requires code updates)

Selenium compatibility: For modern Selenium:

pip install selenium>=4.0.0 webdriver-manager

Twitter API access: Twitter/X now requires approved developer accounts for API access.

Contact

For suggestions and comments, please contact: vivek@maswadkar.com

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Python Data Processing and Automation Scripts

Contents

Python Scripts

Jupyter Notebooks

Requirements

Core Dependencies

Alternative Dependencies (Recommended)

Setup & Preconditions

Environment Setup Example

Usage

Python Scripts

Jupyter Notebooks

Known Issues

1. OpenAI API Deprecation (openai_gpt_tweet_pro_tips.py)

2. Selenium WebDriver Deprecation (google_search_using_selenium.py)

3. Twitter API Changes

4. Hardcoded File Paths

5. ChromeDriver Management

Troubleshooting

Common Installation Issues

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.idea/dictionaries		.idea/dictionaries
DataFrame_from_records.py		DataFrame_from_records.py
README.md		README.md
add_column_name_of_file.py		add_column_name_of_file.py
classification.ipynb		classification.ipynb
google_search_using_selenium.py		google_search_using_selenium.py
kedro.py		kedro.py
merge_csv_files_from_folder.py		merge_csv_files_from_folder.py
openai_gpt_tweet_pro_tips.py		openai_gpt_tweet_pro_tips.py
pandas_001_10_minutes_to_pandas.ipynb		pandas_001_10_minutes_to_pandas.ipynb
pandas_002_Intro_to_data_structures.ipynb		pandas_002_Intro_to_data_structures.ipynb
pandas_003_Essential_basic_functionality.ipynb		pandas_003_Essential_basic_functionality.ipynb
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Python Data Processing and Automation Scripts

Contents

Python Scripts

Jupyter Notebooks

Requirements

Core Dependencies

Alternative Dependencies (Recommended)

Setup & Preconditions

Environment Setup Example

Usage

Python Scripts

Jupyter Notebooks

Known Issues

1. OpenAI API Deprecation (openai_gpt_tweet_pro_tips.py)

2. Selenium WebDriver Deprecation (google_search_using_selenium.py)

3. Twitter API Changes

4. Hardcoded File Paths

5. ChromeDriver Management

Troubleshooting

Common Installation Issues

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages