From 0df92c5fb84ec2f60b25ae2d8943483af21f75f4 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 31 May 2025 20:24:59 +0000
Subject: [PATCH 1/3] Initial plan for issue

From b1825ba02b8267078679b63293353bb48c4fe740 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 31 May 2025 20:29:12 +0000
Subject: [PATCH 2/3] Comprehensive README update with latest API changes and deprecation warnings

Co-authored-by: lumensparkxy <12463711+lumensparkxy@users.noreply.github.com>
---
 README.md | 183 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 166 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index 1702d3f..c30f6d5 100644
--- a/README.md
+++ b/README.md
@@ -2,14 +2,28 @@
 
 This repository contains a collection of Python scripts and Jupyter notebooks for data processing, automation, and integration with external APIs. The scripts cover tasks such as merging CSV files, adding columns to CSVs, automating Google searches, and posting tweets using OpenAI GPT models.
 
+> **⚠️ Important Note**: Some scripts in this repository use deprecated libraries and APIs. Please see the [Known Issues](#known-issues) section below for important updates and alternatives.
+
 ## Contents
 
 ### Python Scripts
 
-- **merge_csv_files_from_folder.py**: Merges all CSV files from a specified directory (landing zone) into a master CSV file. It checks for column consistency and deletes processed files. **Note:** Paths are hardcoded; update them as needed.
-- **add_column_name_of_file.py**: Adds a new column to each CSV file in a directory, with the value set to the file name (useful for tracking file origins when merging). Takes 3 command-line arguments: directory, file extension, and new column name.
-- **google_search_using_selenium.py**: Automates Google searches using Selenium WebDriver, extracts result links, and writes them to a file. Requires ChromeDriver and a file named `xxx.txt` with search queries.
-- **openai_gpt_tweet_pro_tips.py**: Uses OpenAI's GPT-3.5-turbo to generate Python programming tips and posts them to Twitter. Requires Twitter and OpenAI API credentials set as environment variables.
+- **merge_csv_files_from_folder.py**: Merges all CSV files from a specified directory (landing zone) into a master CSV file. It checks for column consistency and deletes processed files.
+  - ⚠️ **Important**: Contains hardcoded paths that must be updated before use
+  - Automatically removes source files after processing
+
+- **add_column_name_of_file.py**: Adds a new column to each CSV file in a directory, with the value set to the file name (useful for tracking file origins when merging).
+  - Takes 3 command-line arguments: ` `
+  - ⚠️ **Important**: Contains hardcoded landing zone path
+
+- **google_search_using_selenium.py**: Automates Google searches using Selenium WebDriver, extracts top 4 result links per query, and writes them to `SearchLinks.txt`.
+  - Requires ChromeDriver installation and proper path configuration
+  - Reads search queries from `xxx.txt` file (one per line)
+  - ⚠️ **Important**: Uses deprecated Selenium methods that may not work with newer versions
+
+- **openai_gpt_tweet_pro_tips.py**: Uses OpenAI's GPT-3.5-turbo to generate Python programming tips and posts them to Twitter automatically.
+  - Requires both Twitter and OpenAI API credentials as environment variables
+  - ⚠️ **Important**: Uses deprecated OpenAI API and Twitter library
 
 ### Jupyter Notebooks
 
@@ -19,35 +33,170 @@ This repository contains a collection of Python scripts and Jupyter notebooks fo
 
 ## Requirements
 
-Install dependencies with:
+- **Python 3.8+** is recommended (minimum 3.7)
+- Install dependencies with:
 
 ```bash
 pip install -r requirements.txt
 ```
 
-### requirements.txt
-- pandas
-- numpy
-- openai
-- twitter
-- selenium
+### Core Dependencies
+- **pandas** - Data manipulation and analysis
+- **numpy** - Numerical computing
+- **openai** - OpenAI API client (⚠️ See [Known Issues](#known-issues))
+- **twitter** - Twitter API client (⚠️ Deprecated - see alternatives below)
+- **selenium** - Web browser automation (⚠️ Uses deprecated methods)
+
+### Alternative Dependencies (Recommended)
+For new projects, consider these modern alternatives:
+```bash
+# For Twitter/X integration
+pip install tweepy
+
+# For OpenAI (quoted so the shell does not treat ">" as a redirect)
+pip install "openai>=1.0.0"
+
+# For Selenium with modern syntax
+pip install "selenium>=4.0.0"
+```
 
 ## Setup & Preconditions
 
-- **Python 3.7+** is recommended.
+- **Python 3.8+** is recommended for best compatibility.
 - For scripts using Selenium, [ChromeDriver](https://sites.google.com/a/chromium.org/chromedriver/) must be installed and its path set in the script.
 - For Twitter and OpenAI integration, set the following environment variables:
   - `TWITTER_CONSUMER_KEY`, `TWITTER_CONSUMER_SECRET`, `TWITTER_ACCESS_TOKEN_KEY`, `TWITTER_ACCESS_TOKEN_SECRET`
   - `OPENAI_API_KEY`
-- Some scripts have hardcoded file paths (e.g., `/Users/mbp/Python/appendcolumn`). Update these paths to match your environment.
+- **Important**: Scripts contain hardcoded file paths (e.g., `/Users/mbp/Python/appendcolumn`, `/Users/mbp/Python/WIKIPEDIA`). You **must** update these paths to match your environment before running.
 - For `google_search_using_selenium.py`, ensure `xxx.txt` exists with search queries (one per line).
 
+### Environment Setup Example
+```bash
+# Set OpenAI API key
+export OPENAI_API_KEY="your_openai_api_key_here"
+
+# Set Twitter API credentials (if using Twitter integration)
+export TWITTER_CONSUMER_KEY="your_consumer_key"
+export TWITTER_CONSUMER_SECRET="your_consumer_secret"
+export TWITTER_ACCESS_TOKEN_KEY="your_access_token"
+export TWITTER_ACCESS_TOKEN_SECRET="your_access_token_secret"
+```
+
 ## Usage
 
-- **merge_csv_files_from_folder.py**: Place CSV files in the landing zone directory. Run the script to merge them into the master file.
-- **add_column_name_of_file.py**: Run with arguments: ` `
-- **google_search_using_selenium.py**: Edit the script for your ChromeDriver path and input file. Run to collect search result links.
-- **openai_gpt_tweet_pro_tips.py**: Ensure environment variables are set. Run to post a GPT-generated tip to Twitter.
+### Python Scripts
+
+- **merge_csv_files_from_folder.py**:
+  ```bash
+  python merge_csv_files_from_folder.py
+  ```
+  Place CSV files in the landing zone directory. Run the script to merge them into the master file. **Note**: Update hardcoded paths in the script before use.
+
+- **add_column_name_of_file.py**:
+  ```bash
+  python add_column_name_of_file.py
+  ```
+  Example: `python add_column_name_of_file.py ./data .csv source_file`
+
+- **google_search_using_selenium.py**:
+  ```bash
+  python google_search_using_selenium.py
+  ```
+  Edit the script for your ChromeDriver path and ensure `xxx.txt` exists with search queries. Run to collect search result links. **Note**: Uses deprecated Selenium methods.
+
+- **openai_gpt_tweet_pro_tips.py**:
+  ```bash
+  python openai_gpt_tweet_pro_tips.py
+  ```
+  Ensure environment variables are set. Run to post a GPT-generated tip to Twitter. **Note**: Uses deprecated OpenAI API methods and Twitter library.
+
+### Jupyter Notebooks
+
+Open any notebook in Jupyter Lab or Google Colab:
+```bash
+jupyter lab pandas_001_10_minutes_to_pandas.ipynb
+```
+
+## Known Issues
+
+### 1. OpenAI API Deprecation (openai_gpt_tweet_pro_tips.py)
+**Issue**: The script uses deprecated `openai.ChatCompletion.create()` method.
+
+**Current code**:
+```python
+response = openai.ChatCompletion.create(model="gpt-3.5-turbo", ...)
+```
+
+**Modern replacement**:
+```python
+from openai import OpenAI
+client = OpenAI()
+response = client.chat.completions.create(model="gpt-3.5-turbo", ...)
+```
+
+### 2. Selenium WebDriver Deprecation (google_search_using_selenium.py)
+**Issue**: Uses deprecated `find_element_by_*()` methods.
+
+**Current code**:
+```python
+elem = browser.find_element_by_name('q')
+elem = browser.find_element_by_xpath(x_path)
+```
+
+**Modern replacement**:
+```python
+from selenium.webdriver.common.by import By
+elem = browser.find_element(By.NAME, 'q')
+elem = browser.find_element(By.XPATH, x_path)
+```
+
+### 3. Twitter API Changes
+**Issue**: The `twitter` library is deprecated and Twitter/X API has significant changes.
+
+**Modern alternatives**:
+- Use [tweepy](https://docs.tweepy.org/) for Twitter/X API v2
+- Update API credentials and endpoints according to current Twitter/X developer documentation
+
+### 4. Hardcoded File Paths
+**Issue**: Scripts contain hardcoded paths that need manual updating.
+
+**Files affected**:
+- `merge_csv_files_from_folder.py`: Lines 10-11
+- `add_column_name_of_file.py`: Line 7
+- `google_search_using_selenium.py`: Line 10 (ChromeDriver path)
+
+**Solution**: Update these paths to match your local environment before running.
+
+### 5. ChromeDriver Management
+**Issue**: Manual ChromeDriver path specification.
+
+**Modern alternative**:
+```python
+from selenium import webdriver
+from selenium.webdriver.chrome.service import Service
+from webdriver_manager.chrome import ChromeDriverManager
+
+# Automatically manage ChromeDriver
+service = Service(ChromeDriverManager().install())
+driver = webdriver.Chrome(service=service)
+```
+
+## Troubleshooting
+
+### Common Installation Issues
+1. **OpenAI version conflicts**: Ensure you're using a compatible version
+   ```bash
+   pip install openai==0.28.1 # For old API syntax
+   # OR
+   pip install "openai>=1.0.0" # For new API syntax (requires code updates)
+   ```
+
+2. **Selenium compatibility**: For modern Selenium:
+   ```bash
+   pip install "selenium>=4.0.0" webdriver-manager
+   ```
+
+3. **Twitter API access**: Twitter/X now requires approved developer accounts for API access.
 
 ## Contact
 
From c5bb605dc8114cb60098d0e1c4d8905ef87cd50c Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 31 May 2025 20:29:56 +0000
Subject: [PATCH 3/3] Add deprecation warnings to requirements.txt

Co-authored-by: lumensparkxy <12463711+lumensparkxy@users.noreply.github.com>
---
 requirements.txt | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index a211876..8726d48 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,5 +1,8 @@
+# Core dependencies for Python Data Processing and Automation Scripts
+# WARNING: Some libraries below use deprecated APIs. See README.md "Known Issues" section.
+
 pandas
 numpy
-openai
-twitter
-selenium
+openai # WARNING: Scripts use deprecated API - see README for modern alternatives
+twitter # WARNING: Deprecated library - consider tweepy for new projects
+selenium # WARNING: Scripts use deprecated methods - see README for modern syntax
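
Note on Known Issue 4 (hardcoded file paths): the README change documents the affected lines but leaves the fix as a manual edit. One possible follow-up, sketched here under assumptions, is to read each directory from an environment variable and fall back to a default. The helper `resolve_dir` and the variable name `LANDING_ZONE_DIR` are hypothetical and do not appear in any of the scripts; the fallback is simply the example path quoted in the README.

```python
import os
from pathlib import Path


def resolve_dir(env_var: str, default: str) -> Path:
    """Return the directory named by env_var, or default if it is unset or empty."""
    raw = os.environ.get(env_var) or default
    # expanduser() lets the variable hold values such as "~/data".
    return Path(raw).expanduser()


if __name__ == "__main__":
    # Hypothetical variable name; the fallback is the path quoted in the README.
    landing_zone = resolve_dir("LANDING_ZONE_DIR", "/Users/mbp/Python/appendcolumn")
    print(f"Landing zone: {landing_zone}")
```

With something like this in place, a user could run `LANDING_ZONE_DIR=./data python merge_csv_files_from_folder.py` without editing the source, assuming the script were updated to call the helper.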