A comprehensive Python-based SEO research toolkit powered by HasData API. This collection of tools helps SEO professionals conduct keyword research, competitive analysis, SERP intelligence, and content gap analysis at scale.
- SEO Research Toolkit
- Features
- Installation
- Usage
- Configuration
- Output Examples
- Troubleshooting
- Advanced Workflows
- Acknowledgments
- Support
Extract thousands of long-tail keyword variations using Google's autocomplete API.
- Use Case: Discover untapped keyword opportunities
- Method: Recursive alphabetical expansion (depth configurable)
- Output: CSV file with unique suggestions
- Speed: Concurrent processing with configurable workers
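The alphabetical expansion step can be sketched without any network calls. The snippet below only builds the seed queries that would be sent to an autocomplete endpoint. The function name `expand_queries` is hypothetical, and it assumes that depth 2 includes the depth 1 suffixes as well; the toolkit's own script may differ.

```python
from itertools import product
from string import ascii_lowercase


def expand_queries(base_keyword: str, max_depth: int) -> list[str]:
    """Build seed queries by appending every letter suffix of length 1..max_depth."""
    queries = []
    for depth in range(1, max_depth + 1):
        # product("abc...", repeat=2) yields ('a', 'a'), ('a', 'b'), ...
        for letters in product(ascii_lowercase, repeat=depth):
            queries.append(f"{base_keyword} {''.join(letters)}")
    return queries


queries = expand_queries("coffee", 2)
print(len(queries))   # 26 single-letter + 26 * 26 two-letter suffixes
print(queries[:3])
```

The query count grows by a factor of 26 per level, which is worth keeping in mind when choosing a depth and a worker count against your plan limits.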
Identify rising search trends and high-volume queries before competitors.
- Use Case: Spot emerging topics and seasonal opportunities
- Data Source: Google Trends API
- Output: Rising queries (growth rate) + Top queries (volume)
- Metrics: Growth indicators and search index values
Build hierarchical question trees from "People Also Ask" boxes.
- Use Case: Map topic authority and content cluster opportunities
- Method: Recursive question discovery
- Output: Nested topic structure
- Depth: Configurable recursion levels
Automatically classify search intent by analyzing SERP composition.
- Use Case: Understand what type of content ranks
- Analysis: URL pattern recognition (blog, product, forum, video, etc.)
- Output: Strategic content recommendations
- Metrics: SERP composition breakdown by content type
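One way to picture URL pattern recognition is a small substring table applied to each result URL. The pattern list, labels, and function names below are assumptions made for illustration; the actual classifier's rules are not shown in this README.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical pattern table: (substring, content type). First match wins.
URL_PATTERNS = [
    ("youtube.com", "video"),
    ("reddit.com", "forum"),
    ("/forum", "forum"),
    ("/product", "product"),
    ("/shop", "product"),
    ("/blog", "blog"),
]


def classify_url(url: str) -> str:
    """Return the first matching content type for a URL, or 'other'."""
    parsed = urlparse(url)
    haystack = (parsed.netloc + parsed.path).lower()
    for needle, label in URL_PATTERNS:
        if needle in haystack:
            return label
    return "other"


def serp_composition(urls: list[str]) -> dict[str, float]:
    """Return the percentage share of each content type in a list of result URLs."""
    if not urls:
        return {}
    counts = Counter(classify_url(u) for u in urls)
    return {label: round(100 * n / len(urls), 1) for label, n in counts.most_common()}


# Made-up URLs for illustration.
sample_urls = [
    "https://example.com/blog/instant-coffee-guide",
    "https://example.com/blog/best-instant-coffee",
    "https://example.org/blog/coffee-faq",
    "https://www.youtube.com/watch?v=abc",
    "https://shop.example.com/product/instant-coffee",
]
composition = serp_composition(sample_urls)
print(composition)
```

The largest share in the breakdown is what would drive a content recommendation, for example a blog-heavy SERP suggesting a long-form guide.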
Measure keyword cannibalization and SERP overlap using Jaccard Index.
- Use Case: Identify clustering opportunities and keyword conflicts
- Method: URL set intersection analysis
- Output: Interactive heatmap + common URL frequency table
- Visualization: Seaborn-powered similarity matrix
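The Jaccard Index of two URL sets is the size of their intersection divided by the size of their union. A minimal sketch with made-up URLs follows; the function names are hypothetical and the heatmap step is left out.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(serps: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute the pairwise Jaccard Index for every pair of keywords."""
    keys = list(serps)
    return {(k1, k2): jaccard(serps[k1], serps[k2]) for k1 in keys for k2 in keys}


# Made-up ranking URLs for two related keywords.
serps = {
    "decaf coffee": {"a.com/1", "b.com/2", "c.com/3"},
    "decaffeinated coffee": {"a.com/1", "b.com/2", "d.com/4"},
}
matrix = similarity_matrix(serps)
print(matrix[("decaf coffee", "decaffeinated coffee")])  # 2 shared / 4 total = 0.5
```

A high score suggests the two keywords can be served by one page, while a low score suggests they need separate content.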
Find missing keywords and phrases compared to ranking competitors.
- Use Case: Optimize existing content for better rankings
- Method: N-gram frequency analysis (1, 2, and 3-grams)
- Data Source: Trafilatura-based content extraction
- Output: Gap report with competitor coverage metrics
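The n-gram comparison can be approximated with the standard library alone. The sketch below assumes the page texts have already been extracted, uses a simple regex tokenizer, and applies no stop-word filtering; `content_gaps` and its `min_coverage` parameter are hypothetical.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Split text into lowercase alphanumeric tokens and return its n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def content_gaps(my_text: str, competitor_texts: list[str], n: int = 2,
                 min_coverage: float = 0.5) -> list[tuple[str, float]]:
    """Return n-grams missing from my_text that at least min_coverage of competitors use."""
    mine = set(ngrams(my_text, n))
    doc_freq = Counter()
    for text in competitor_texts:
        doc_freq.update(set(ngrams(text, n)))  # count each competitor once per n-gram
    total = len(competitor_texts)
    gaps = [(gram, count / total) for gram, count in doc_freq.items()
            if gram not in mine and count / total >= min_coverage]
    return sorted(gaps, key=lambda item: (-item[1], item[0]))


# Made-up texts for illustration.
my_page = "Decaf coffee tastes great"
competitors = [
    "Decaf coffee lowers blood pressure",
    "Studies link decaf coffee to lower blood pressure",
    "Decaf coffee tastes mild",
]
gaps = content_gaps(my_page, competitors, n=2)
print(gaps)
```

Counting each competitor once per n-gram gives a coverage ratio (how many competitors mention the phrase) rather than a raw frequency, which keeps one verbose page from dominating the report.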
Track domain visibility in Google's AI-generated search summaries.
- Use Case: Monitor brand presence in AI-powered search
- Tracking: Citation index and URL detection
- Output: Coverage report with share-of-voice metrics
- Metrics: AI trigger rate and citation frequency
- Python 3.11 or higher
- HasData API key
- Clone the repository
git clone https://github.com/yourusername/seo-research-toolkit.git
cd seo-research-toolkit
- Install dependencies
pip install -r requirements.txt
- Configure API key (choose one method):
Option A: Environment variable
export HASDATA_API_KEY="your_api_key_here"
Option B: Configuration file
echo "your_api_key_here" > .hasdata_config
Option C: Interactive setup
python seo_manager.py
# Select option [9] to configure the API key
⚠️ IMPORTANT: CONFIGURATION REQUIRED BEFORE USE
All tools in this toolkit are managed via the central script seo_manager.py. If you run a tool without configuring it first, the manager will silently use default placeholder values, which are unlikely to match your real intent.
To avoid misleading results, you should always configure:
- [8] Configure Tool Settings - keywords, domains, geo, depth, limits, etc. for each tool
- [9] Configure API Key - your HasData API key
⚠️ No validation error is thrown when defaults are used. Always review and set your parameters before running any tool.
python seo_manager.py
This launches a menu-driven interface where you can select and run any tool.
# Run specific tool by number
python seo_manager.py 1 # Google Suggest Harvester
python seo_manager.py 2 # Trends Analyzer
# ... etc
Each tool can also be run independently:
python google_suggest_harvester.py
python trends_breakout_analyzer.py
python paa_tree_builder.py
python serp_intent_classifier.py
python serp_similarity_matrix.py
python content_gap_analyzer.py
python ai_overview_monitor.py
Each tool has configurable parameters at the top of the script:
Google Suggest Harvester
BASE_KEYWORD = "coffee"
MAX_DEPTH = 2 # 1 = a-z, 2 = aa-zz
MAX_WORKERS = 15 # Concurrent requests (check your plan limits)
Trends Breakout Analyzer
SEED_TOPIC = "Coffee"
date = "now 7-d" # Time range: now 1-d, now 7-d, today 12-m, etc.
geo = "US" # Country code
PAA Tree Builder
ROOT_KEYWORD = "coffee"
MAX_DEPTH = 2 # Recursion levels
SERP Intent Classifier
KEYWORD = "instant coffee"
deviceType = "desktop" # or "mobile"
SERP Similarity Matrix
KEYWORDS = ["keyword1", "keyword2", ...] # List of related terms
Content Gap Analyzer
TARGET_KEYWORD = "health benefits of decaf coffee"
MY_URL = "https://example.com/your-article"
TOP_N_COMPETITORS = 10
AI Overview Monitor
TARGET_DOMAIN = "webmd.com"
KEYWORDS = ["keyword1", "keyword2", ...]
Finished in 45.23 seconds. Average Speed: 12.34 req/s
Done. Collected 1847 unique keywords.
Saved to long_tail_keywords_hasdata.csv
--- Rising Queries (The Opportunity) ---
[Growth: Breakout] mushroom coffee benefits
[Growth: +450%] decaf coffee health
...
- coffee
- What are the health benefits of coffee?
- Is coffee good for your heart?
- Does coffee help with weight loss?
Dominant Type: Informational (Blog) (60.0%)
Action: Create a long-form Guide or Blog Post.
More output examples in our article: Python for SEO
"No trend data found"
- Topic may be too niche or misspelled
- Try broader keywords or different geo-targeting
"AI Overview not triggered"
- AI Overviews are region-specific (US has highest coverage)
- Try different device types: desktop vs mobile
Content extraction returns empty text
- Enable JS rendering:
jsRendering: True
1. Trends Breakout Analyzer → Find rising topics
2. Google Suggest Harvester → Extract long-tail variations
3. PAA Tree Builder → Map content structure
4. SERP Intent Classifier → Determine content type
1. SERP Similarity Matrix → Group related keywords
2. Content Gap Analyzer → Identify missing topics
3. AI Overview Monitor → Track visibility changes
1. SERP Intent Classifier → Analyze competitor strategies
2. Content Gap Analyzer → Reverse-engineer top pages
3. SERP Similarity Matrix → Find unique positioning opportunities
- HasData API for providing reliable SERP and proxy infrastructure
- Trafilatura for content extraction
- scikit-learn for NLP capabilities
- Documentation: HasData API Docs
- Full Article: Python for SEO
Made by SEO Professionals, for SEO Professionals

