Last updated: 2025-12-20 Maintained by Webscout
Webscout's GitAPI module provides a powerful, lightweight GitHub data extraction toolkit within the Webscout Python package. It offers comprehensive tools for retrieving GitHub repository, user, organization, gist, and trending data without requiring authentication for public data access.
- Core Components
- Features
- Installation
- Quick Examples
- Available Classes
- Error Handling
- Integration Guide
The main GitHub API module that provides comprehensive GitHub data extraction capabilities.
from webscout.Extra.GitToolkit.gitapi import Repository, User, GitSearch, Gist, Organization, Trending
# Initialize and use GitHub data extraction
repo = Repository("OE-LUCIFER", "Webscout")
info = repo.get_info()
print(f"Stars: {info['stargazers_count']}")Key Features:
- Repository operations (metadata, commits, issues, releases)
- User operations (profile, repositories, followers)
- Search operations (repositories, users, topics)
- Gist operations (public gists, specific gist retrieval)
- Organization operations (details, repositories)
- Trending data (repositories, developers)
- Comprehensive error handling
- Repository metadata, README, license, topics
- Commit history and comparison
- Pull requests and issues
- Releases, branches, and tags
- Contributors, stargazers, watchers, forks
- GitHub Actions workflows
- Repository events and statistics
- Profile information and social accounts
- Repository listing and starred repos
- Follower/Following data
- Organizations and packages
- Events, gists, and SSH/GPG keys
- Search repositories by query, language, stars
- Search users by name, location, followers
- Search topics, commits, issues, and labels
- Get gist by ID with full content
- List public gists
- Gist commits, forks, and revisions
- Gist comments
- Organization details and metadata
- Organization repositories
- Public members and events
- Trending repositories by language/time
- Trending developers
- Rate limit detection
- Resource not found handling
- Request retry mechanism
- Custom error types
Install as part of the Webscout package:
pip install webscoutOr install with UV package manager:
uv add webscoutfrom webscout.Extra.GitToolkit.gitapi import Repository
repo = Repository("OE-LUCIFER", "Webscout")
# Get basic info
info = repo.get_info()
print(f"Stars: {info['stargazers_count']}")
# Get README
readme = repo.get_readme()
print(f"README: {readme['name']}")
# Get topics
topics = repo.get_topics()
print(f"Topics: {topics['names']}")
# Compare branches
diff = repo.compare("main", "dev")
print(f"Commits ahead: {diff['ahead_by']}")from webscout.Extra.GitToolkit.gitapi import User
user = User("OE-LUCIFER")
# Get profile
profile = user.get_profile()
print(f"Followers: {profile['followers']}")
# Get social accounts
socials = user.get_social_accounts()
for account in socials:
print(f"{account['provider']}: {account['url']}")from webscout.Extra.GitToolkit.gitapi import GitSearch
search = GitSearch()
# Search repositories
repos = search.search_repositories("webscout language:python stars:>100")
print(f"Found {repos['total_count']} repos")
# Search users
users = search.search_users("location:india followers:>1000")
print(f"Found {users['total_count']} users")
# Search topics
topics = search.search_topics("machine-learning")
for topic in topics['items'][:5]:
print(f"Topic: {topic['name']}")from webscout.Extra.GitToolkit.gitapi import Gist
gist = Gist()
# List public gists
public = gist.list_public(per_page=5)
for g in public:
print(f"Gist: {g['id']} - {g['description']}")
# Get specific gist
data = gist.get("gist_id_here")
print(f"Files: {list(data['files'].keys())}")from webscout.Extra.GitToolkit.gitapi import Organization
org = Organization("microsoft")
# Get org info
info = org.get_info()
print(f"Organization: {info['name']}")
print(f"Public repos: {info['public_repos']}")
# Get org repos
repos = org.get_repos(per_page=10)
for repo in repos:
print(f"Repo: {repo['name']}")from webscout.Extra.GitToolkit.gitapi import Trending
trending = Trending()
# Get trending repos
repos = trending.get_repositories(language="python", since="weekly")
for repo in repos[:5]:
print(f"{repo['full_name']} - ⭐ {repo['stars']}")
# Get trending developers
devs = trending.get_developers(language="python")
for dev in devs[:5]:
print(f"{dev['username']} - {dev.get('name', 'N/A')}")The Repository class provides comprehensive access to GitHub repository data.
Methods:
get_info()- Get basic repository informationget_readme()- Get repository README contentget_license()- Get repository license informationget_topics()- Get repository topicsget_commits(sha="main", per_page=30)- Get commit historyget_commit(sha)- Get specific commit detailscompare(base, head)- Compare branchesget_pull_requests(state="open", per_page=30)- Get pull requestsget_issues(state="open", per_page=30)- Get issuesget_labels(per_page=30)- Get labelsget_milestones(state="open", per_page=30)- Get milestonesget_releases(per_page=30)- Get releasesget_branches(per_page=30)- Get branchesget_tags(per_page=30)- Get tagsget_contributors(per_page=30)- Get contributorsget_stargazers(per_page=30)- Get stargazersget_watchers(per_page=30)- Get watchersget_forks(per_page=30)- Get forksget_contents(path="", ref="main")- Get repository contentsget_languages()- Get repository languagesget_events(per_page=30)- Get repository eventsget_workflows(per_page=30)- Get GitHub Actions workflowsget_workflow_runs(workflow_id, per_page=30)- Get workflow runsget_deployments(per_page=30)- Get deploymentsget_community_profile()- Get community profileget_code_frequency()- Get code frequency statisticsget_commit_activity()- Get commit activity statistics
The User class provides access to GitHub user data.
Methods:
get_profile()- Get user profile informationget_repositories(type="owner", per_page=30)- Get user repositoriesget_starred(per_page=30)- Get starred repositoriesget_followers(per_page=30)- Get followersget_following(per_page=30)- Get followingget_gists(per_page=30)- Get user gistsget_organizations(per_page=30)- Get organizationsget_public_events(per_page=30)- Get public eventsget_received_events(per_page=30)- Get received eventsget_keys(per_page=30)- Get SSH keysget_gpg_keys(per_page=30)- Get GPG keysget_social_accounts()- Get social accountsget_packages(per_page=30)- Get packages
The GitSearch class provides GitHub search functionality.
Methods:
search_repositories(query, per_page=30)- Search repositories by querysearch_users(query, per_page=30)- Search userssearch_topics(query, per_page=30)- Search topicssearch_commits(query, per_page=30)- Search commitssearch_issues(query, per_page=30)- Search issues/PRssearch_labels(query, per_page=30)- Search labels
The Gist class provides access to GitHub gists.
Methods:
get(gist_id)- Get gist by IDlist_public(per_page=30)- List public gistslist_for_user(username, per_page=30)- List user's gistsget_commits(gist_id, per_page=30)- Get gist commitsget_forks(gist_id, per_page=30)- Get gist forksget_revision(gist_id, sha)- Get gist revisionget_comments(gist_id, per_page=30)- Get gist comments
The Organization class provides access to GitHub organization data.
Methods:
get_info()- Get organization detailsget_repos(type="all", per_page=30)- Get organization repositoriesget_public_members(per_page=30)- Get public membersget_events(per_page=30)- Get organization events
The Trending class provides access to GitHub trending data.
Methods:
get_repositories(language=None, since="daily", spoken_language_code=None)- Get trending repositoriesget_developers(language=None, since="daily")- Get trending developers
GitAPI includes comprehensive error handling for GitHub API interactions.
from webscout.Extra.GitToolkit.gitapi import Repository, NotFoundError, RateLimitError
try:
repo = Repository("nonexistent", "repo")
info = repo.get_info()
except NotFoundError:
print("Repository not found")
except RateLimitError:
print("Rate limit exceeded, try again later")
except Exception as e:
print(f"An error occurred: {e}")| Exception | Description |
|---|---|
GitError |
Base exception for all GitHub API errors |
RateLimitError |
Raised when hitting API rate limits |
NotFoundError |
Raised when resource is not found |
RequestError |
Raised for general request errors |
AuthenticationError |
Raised for authentication failures |
ValidationError |
Raised for invalid parameters |
from webscout.Extra.GitToolkit.gitapi import Repository
from webscout.Provider import ChatGPT
# Get repository data
repo = Repository("OE-LUCIFER", "Webscout")
repo_info = repo.get_info()
# Use with AI provider
chatgpt = ChatGPT()
response = chatgpt.ask(f"Analyze this GitHub repository: {repo_info}")
print(response)from fastapi import FastAPI, HTTPException
from webscout.Extra.GitToolkit.gitapi import Repository, NotFoundError
app = FastAPI()
@app.get("/repo/{owner}/{repo}")
async def get_repo_info(owner: str, repo: str):
"""Get GitHub repository information"""
try:
repository = Repository(owner, repo)
info = repository.get_info()
return {
"name": info["name"],
"full_name": info["full_name"],
"description": info["description"],
"stars": info["stargazers_count"],
"forks": info["forks_count"],
"open_issues": info["open_issues_count"]
}
except NotFoundError:
raise HTTPException(status_code=404, detail="Repository not found")
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))import asyncio
from webscout.Extra.GitToolkit.gitapi import Repository
async def get_multiple_repos():
"""Get information from multiple repositories asynchronously"""
repos = [
("OE-LUCIFER", "Webscout"),
("python", "cpython"),
("django", "django")
]
results = []
for owner, repo_name in repos:
try:
repo = Repository(owner, repo_name)
info = repo.get_info()
results.append({
"repo": f"{owner}/{repo_name}",
"stars": info["stargazers_count"],
"forks": info["forks_count"]
})
except Exception as e:
results.append({
"repo": f"{owner}/{repo_name}",
"error": str(e)
})
return results
# Run the async function
repo_data = asyncio.run(get_multiple_repos())
for data in repo_data:
print(f"{data['repo']}: {data.get('stars', 'Error')}")import pandas as pd
from webscout.Extra.GitToolkit.gitapi import Repository, GitSearch
# Get repository statistics
repo = Repository("OE-LUCIFER", "Webscout")
contributors = repo.get_contributors()
# Convert to DataFrame
contributors_df = pd.DataFrame([
{
"login": c["login"],
"contributions": c["contributions"],
"avatar_url": c["avatar_url"]
}
for c in contributors
])
print(f"Top contributors:")
print(contributors_df.sort_values("contributions", ascending=False).head(5))
# Search and analyze repositories
search = GitSearch()
results = search.search_repositories("webscout language:python")
repos_df = pd.DataFrame([
{
"name": r["name"],
"stars": r["stargazers_count"],
"forks": r["forks_count"],
"updated": r["updated_at"]
}
for r in results["items"]
])
print(f"\nWebscout-related repositories:")
print(repos_df.sort_values("stars", ascending=False).head(10))import click
from webscout.Extra.GitToolkit.gitapi import Repository, User, GitSearch
@click.group()
def cli():
"""GitHub Data CLI"""
pass
@cli.command()
@click.argument('owner')
@click.argument('repo')
def repo_info(owner, repo):
"""Get repository information"""
try:
repository = Repository(owner, repo)
info = repository.get_info()
click.echo(f"Repository: {info['full_name']}")
click.echo(f"Description: {info['description']}")
click.echo(f"Stars: {info['stargazers_count']}")
click.echo(f"Forks: {info['forks_count']}")
click.echo(f"Open Issues: {info['open_issues_count']}")
click.echo(f"License: {info['license']['name'] if info['license'] else 'None'}")
except Exception as e:
click.echo(f"Error: {e}", err=True)
@cli.command()
@click.argument('username')
def user_info(username):
"""Get user information"""
try:
user = User(username)
profile = user.get_profile()
click.echo(f"User: {profile['login']}")
click.echo(f"Name: {profile['name']}")
click.echo(f"Followers: {profile['followers']}")
click.echo(f"Following: {profile['following']}")
click.echo(f"Public Repos: {profile['public_repos']}")
click.echo(f"Bio: {profile['bio']}")
except Exception as e:
click.echo(f"Error: {e}", err=True)
@cli.command()
@click.argument('query')
def search_repos(query):
"""Search repositories"""
try:
search = GitSearch()
results = search.search_repositories(query)
click.echo(f"Found {results['total_count']} repositories:")
for repo in results['items'][:10]:
click.echo(f" {repo['full_name']} - ⭐ {repo['stargazers_count']}")
except Exception as e:
click.echo(f"Error: {e}", err=True)
if __name__ == '__main__':
cli()Tip
- Rate Limits: GitHub API has rate limits for unauthenticated requests (60 requests per hour per IP)
- Authentication: For higher limits, consider adding authentication with a GitHub token
- Error Handling: Always wrap GitHub API calls in try-catch blocks to handle rate limits and network issues
- Pagination: Use
per_pageparameter to control the number of results returned - Caching: Consider caching results for frequently accessed data to reduce API calls
- Batch Operations: For multiple requests, consider using async patterns or batching
# For production use with authentication
from webscout.Extra.GitToolkit.gitapi import Repository
# Initialize with authentication (if needed)
repo = Repository("owner", "repo", token="your_github_token")
# Use with retry logic for resilience
import time
from webscout.Extra.GitToolkit.gitapi import RateLimitError
max_retries = 3
retry_delay = 60 # seconds
for attempt in range(max_retries):
try:
info = repo.get_info()
break
except RateLimitError:
if attempt < max_retries - 1:
print(f"Rate limited, retrying in {retry_delay} seconds...")
time.sleep(retry_delay)
else:
raisefrom webscout.Extra.GitToolkit.gitapi import Repository
repo = Repository("OE-LUCIFER", "Webscout")
# Get commits with custom parameters
commits = repo.get_commits(
sha="main",
per_page=50,
since="2023-01-01",
until="2023-12-31"
)
# Get issues with filters
issues = repo.get_issues(
state="open",
labels="bug,enhancement",
sort="created",
direction="desc"
)from webscout.Extra.GitToolkit.gitapi import Repository
import json
repo = Repository("OE-LUCIFER", "Webscout")
# Get repository data and transform it
def transform_repo_data(repo_data):
"""Transform repository data into a custom format"""
return {
"name": repo_data["name"],
"full_name": repo_data["full_name"],
"description": repo_data["description"],
"metrics": {
"stars": repo_data["stargazers_count"],
"forks": repo_data["forks_count"],
"watchers": repo_data["watchers_count"],
"issues": repo_data["open_issues_count"]
},
"dates": {
"created": repo_data["created_at"],
"updated": repo_data["updated_at"],
"pushed": repo_data["pushed_at"]
},
"owner": {
"login": repo_data["owner"]["login"],
"type": repo_data["owner"]["type"]
}
}
info = repo.get_info()
transformed = transform_repo_data(info)
print(json.dumps(transformed, indent=2))from webscout.Extra.GitToolkit.gitapi import Repository
from webscout.sanitize import sanitize_stream
# Get repository README and process it
repo = Repository("OE-LUCIFER", "Webscout")
readme = repo.get_readme()
# Clean and process the README content
clean_content = list(sanitize_stream(
readme["content"],
intro_value="",
to_json=False,
strip_chars="\n\r\t"
))
print("Cleaned README content:")
for line in clean_content:
print(line)This documentation covers the comprehensive functionality of the webscout.Extra.GitToolkit.gitapi module. For the most up-to-date information, refer to the source code and inline documentation.