🚀 GitAPI: GitHub Data Extraction Module

Last updated: 2025-12-20 Maintained by Webscout

Webscout's GitAPI module provides a powerful, lightweight GitHub data extraction toolkit within the Webscout Python package. It offers comprehensive tools for retrieving GitHub repository, user, organization, gist, and trending data without requiring authentication for public data access.

Core Components
Features
Installation
Quick Examples
Available Classes
Error Handling
Integration Guide

Core Components

`gitapi.py`

The main GitHub API module that provides comprehensive GitHub data extraction capabilities.

from webscout.Extra.GitToolkit.gitapi import Repository, User, GitSearch, Gist, Organization, Trending

# Initialize and use GitHub data extraction
repo = Repository("OE-LUCIFER", "Webscout")
info = repo.get_info()
print(f"Stars: {info['stargazers_count']}")

Key Features:

Repository operations (metadata, commits, issues, releases)
User operations (profile, repositories, followers)
Search operations (repositories, users, topics)
Gist operations (public gists, specific gist retrieval)
Organization operations (details, repositories)
Trending data (repositories, developers)
Comprehensive error handling

Features

Repository Operations

Repository metadata, README, license, topics
Commit history and comparison
Pull requests and issues
Releases, branches, and tags
Contributors, stargazers, watchers, forks
GitHub Actions workflows
Repository events and statistics

User Operations

Profile information and social accounts
Repository listing and starred repos
Follower/Following data
Organizations and packages
Events, gists, and SSH/GPG keys

Search Operations

Search repositories by query, language, stars
Search users by name, location, followers
Search topics, commits, issues, and labels

Gist Operations

Get gist by ID with full content
List public gists
Gist commits, forks, and revisions
Gist comments

Organization Operations

Organization details and metadata
Organization repositories
Public members and events

Error Handling

Rate limit detection
Resource not found handling
Request retry mechanism
Custom error types

Installation

Install as part of the Webscout package:

pip install webscout

Or install with UV package manager:

uv add webscout

Quick Examples

Repository Operations

from webscout.Extra.GitToolkit.gitapi import Repository

repo = Repository("OE-LUCIFER", "Webscout")

# Get basic info
info = repo.get_info()
print(f"Stars: {info['stargazers_count']}")

# Get README
readme = repo.get_readme()
print(f"README: {readme['name']}")

# Get topics
topics = repo.get_topics()
print(f"Topics: {topics['names']}")

# Compare branches
diff = repo.compare("main", "dev")
print(f"Commits ahead: {diff['ahead_by']}")

User Operations

from webscout.Extra.GitToolkit.gitapi import User

user = User("OE-LUCIFER")

# Get profile
profile = user.get_profile()
print(f"Followers: {profile['followers']}")

# Get social accounts
socials = user.get_social_accounts()
for account in socials:
    print(f"{account['provider']}: {account['url']}")

Search Operations

from webscout.Extra.GitToolkit.gitapi import GitSearch

search = GitSearch()

# Search repositories
repos = search.search_repositories("webscout language:python stars:>100")
print(f"Found {repos['total_count']} repos")

# Search users
users = search.search_users("location:india followers:>1000")
print(f"Found {users['total_count']} users")

# Search topics
topics = search.search_topics("machine-learning")
for topic in topics['items'][:5]:
    print(f"Topic: {topic['name']}")

Gist Operations

from webscout.Extra.GitToolkit.gitapi import Gist

gist = Gist()

# List public gists
public = gist.list_public(per_page=5)
for g in public:
    print(f"Gist: {g['id']} - {g['description']}")

# Get specific gist
data = gist.get("gist_id_here")
print(f"Files: {list(data['files'].keys())}")

Organization Operations

from webscout.Extra.GitToolkit.gitapi import Organization

org = Organization("microsoft")

# Get org info
info = org.get_info()
print(f"Organization: {info['name']}")
print(f"Public repos: {info['public_repos']}")

# Get org repos
repos = org.get_repos(per_page=10)
for repo in repos:
    print(f"Repo: {repo['name']}")

from webscout.Extra.GitToolkit.gitapi import Trending

trending = Trending()

# Get trending repos
repos = trending.get_repositories(language="python", since="weekly")
for repo in repos[:5]:
    print(f"{repo['full_name']} - ⭐ {repo['stars']}")

# Get trending developers
devs = trending.get_developers(language="python")
for dev in devs[:5]:
    print(f"{dev['username']} - {dev.get('name', 'N/A')}")

Available Classes

Repository Class

The Repository class provides comprehensive access to GitHub repository data.

Methods:

get_info() - Get basic repository information
get_readme() - Get repository README content
get_license() - Get repository license information
get_topics() - Get repository topics
get_commits(sha="main", per_page=30) - Get commit history
get_commit(sha) - Get specific commit details
compare(base, head) - Compare branches
get_pull_requests(state="open", per_page=30) - Get pull requests
get_issues(state="open", per_page=30) - Get issues
get_labels(per_page=30) - Get labels
get_milestones(state="open", per_page=30) - Get milestones
get_releases(per_page=30) - Get releases
get_branches(per_page=30) - Get branches
get_tags(per_page=30) - Get tags
get_contributors(per_page=30) - Get contributors
get_stargazers(per_page=30) - Get stargazers
get_watchers(per_page=30) - Get watchers
get_forks(per_page=30) - Get forks
get_contents(path="", ref="main") - Get repository contents
get_languages() - Get repository languages
get_events(per_page=30) - Get repository events
get_workflows(per_page=30) - Get GitHub Actions workflows
get_workflow_runs(workflow_id, per_page=30) - Get workflow runs
get_deployments(per_page=30) - Get deployments
get_community_profile() - Get community profile
get_code_frequency() - Get code frequency statistics
get_commit_activity() - Get commit activity statistics

User Class

The User class provides access to GitHub user data.

Methods:

get_profile() - Get user profile information
get_repositories(type="owner", per_page=30) - Get user repositories
get_starred(per_page=30) - Get starred repositories
get_followers(per_page=30) - Get followers
get_following(per_page=30) - Get following
get_gists(per_page=30) - Get user gists
get_organizations(per_page=30) - Get organizations
get_public_events(per_page=30) - Get public events
get_received_events(per_page=30) - Get received events
get_keys(per_page=30) - Get SSH keys
get_gpg_keys(per_page=30) - Get GPG keys
get_social_accounts() - Get social accounts
get_packages(per_page=30) - Get packages

GitSearch Class

The GitSearch class provides GitHub search functionality.

Methods:

search_repositories(query, per_page=30) - Search repositories by query
search_users(query, per_page=30) - Search users
search_topics(query, per_page=30) - Search topics
search_commits(query, per_page=30) - Search commits
search_issues(query, per_page=30) - Search issues/PRs
search_labels(query, per_page=30) - Search labels

Gist Class

The Gist class provides access to GitHub gists.

Methods:

get(gist_id) - Get gist by ID
list_public(per_page=30) - List public gists
list_for_user(username, per_page=30) - List user's gists
get_commits(gist_id, per_page=30) - Get gist commits
get_forks(gist_id, per_page=30) - Get gist forks
get_revision(gist_id, sha) - Get gist revision
get_comments(gist_id, per_page=30) - Get gist comments

Organization Class

The Organization class provides access to GitHub organization data.

Methods:

get_info() - Get organization details
get_repos(type="all", per_page=30) - Get organization repositories
get_public_members(per_page=30) - Get public members
get_events(per_page=30) - Get organization events

Trending Class

The Trending class provides access to GitHub trending data.

Methods:

get_repositories(language=None, since="daily", spoken_language_code=None) - Get trending repositories
get_developers(language=None, since="daily") - Get trending developers

Error Handling

GitAPI includes comprehensive error handling for GitHub API interactions.

from webscout.Extra.GitToolkit.gitapi import Repository, NotFoundError, RateLimitError

try:
    repo = Repository("nonexistent", "repo")
    info = repo.get_info()
except NotFoundError:
    print("Repository not found")
except RateLimitError:
    print("Rate limit exceeded, try again later")
except Exception as e:
    print(f"An error occurred: {e}")

Exception Types

Exception	Description
`GitError`	Base exception for all GitHub API errors
`RateLimitError`	Raised when hitting API rate limits
`NotFoundError`	Raised when resource is not found
`RequestError`	Raised for general request errors
`AuthenticationError`	Raised for authentication failures
`ValidationError`	Raised for invalid parameters

Integration Guide

With Webscout Providers

from webscout.Extra.GitToolkit.gitapi import Repository
from webscout.Provider import ChatGPT

# Get repository data
repo = Repository("OE-LUCIFER", "Webscout")
repo_info = repo.get_info()

# Use with AI provider
chatgpt = ChatGPT()
response = chatgpt.ask(f"Analyze this GitHub repository: {repo_info}")
print(response)

With FastAPI

from fastapi import FastAPI, HTTPException
from webscout.Extra.GitToolkit.gitapi import Repository, NotFoundError

app = FastAPI()

@app.get("/repo/{owner}/{repo}")
async def get_repo_info(owner: str, repo: str):
    """Get GitHub repository information"""
    try:
        repository = Repository(owner, repo)
        info = repository.get_info()
        return {
            "name": info["name"],
            "full_name": info["full_name"],
            "description": info["description"],
            "stars": info["stargazers_count"],
            "forks": info["forks_count"],
            "open_issues": info["open_issues_count"]
        }
    except NotFoundError:
        raise HTTPException(status_code=404, detail="Repository not found")
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

With Async Applications

import asyncio
from webscout.Extra.GitToolkit.gitapi import Repository

async def get_multiple_repos():
    """Get information from multiple repositories asynchronously"""
    repos = [
        ("OE-LUCIFER", "Webscout"),
        ("python", "cpython"),
        ("django", "django")
    ]
    
    results = []
    for owner, repo_name in repos:
        try:
            repo = Repository(owner, repo_name)
            info = repo.get_info()
            results.append({
                "repo": f"{owner}/{repo_name}",
                "stars": info["stargazers_count"],
                "forks": info["forks_count"]
            })
        except Exception as e:
            results.append({
                "repo": f"{owner}/{repo_name}",
                "error": str(e)
            })
    
    return results

# Run the async function
repo_data = asyncio.run(get_multiple_repos())
for data in repo_data:
    print(f"{data['repo']}: {data.get('stars', 'Error')}")

With Data Analysis

import pandas as pd
from webscout.Extra.GitToolkit.gitapi import Repository, GitSearch

# Get repository statistics
repo = Repository("OE-LUCIFER", "Webscout")
contributors = repo.get_contributors()

# Convert to DataFrame
contributors_df = pd.DataFrame([
    {
        "login": c["login"],
        "contributions": c["contributions"],
        "avatar_url": c["avatar_url"]
    }
    for c in contributors
])

print(f"Top contributors:")
print(contributors_df.sort_values("contributions", ascending=False).head(5))

# Search and analyze repositories
search = GitSearch()
results = search.search_repositories("webscout language:python")

repos_df = pd.DataFrame([
    {
        "name": r["name"],
        "stars": r["stargazers_count"],
        "forks": r["forks_count"],
        "updated": r["updated_at"]
    }
    for r in results["items"]
])

print(f"\nWebscout-related repositories:")
print(repos_df.sort_values("stars", ascending=False).head(10))

With CLI Applications

import click
from webscout.Extra.GitToolkit.gitapi import Repository, User, GitSearch

@click.group()
def cli():
    """GitHub Data CLI"""
    pass

@cli.command()
@click.argument('owner')
@click.argument('repo')
def repo_info(owner, repo):
    """Get repository information"""
    try:
        repository = Repository(owner, repo)
        info = repository.get_info()
        
        click.echo(f"Repository: {info['full_name']}")
        click.echo(f"Description: {info['description']}")
        click.echo(f"Stars: {info['stargazers_count']}")
        click.echo(f"Forks: {info['forks_count']}")
        click.echo(f"Open Issues: {info['open_issues_count']}")
        click.echo(f"License: {info['license']['name'] if info['license'] else 'None'}")
        
    except Exception as e:
        click.echo(f"Error: {e}", err=True)

@cli.command()
@click.argument('username')
def user_info(username):
    """Get user information"""
    try:
        user = User(username)
        profile = user.get_profile()
        
        click.echo(f"User: {profile['login']}")
        click.echo(f"Name: {profile['name']}")
        click.echo(f"Followers: {profile['followers']}")
        click.echo(f"Following: {profile['following']}")
        click.echo(f"Public Repos: {profile['public_repos']}")
        click.echo(f"Bio: {profile['bio']}")
        
    except Exception as e:
        click.echo(f"Error: {e}", err=True)

@cli.command()
@click.argument('query')
def search_repos(query):
    """Search repositories"""
    try:
        search = GitSearch()
        results = search.search_repositories(query)
        
        click.echo(f"Found {results['total_count']} repositories:")
        for repo in results['items'][:10]:
            click.echo(f"  {repo['full_name']} - ⭐ {repo['stargazers_count']}")
            
    except Exception as e:
        click.echo(f"Error: {e}", err=True)

if __name__ == '__main__':
    cli()

Usage Tips

Tip

Rate Limits: GitHub API has rate limits for unauthenticated requests (60 requests per hour per IP)
Authentication: For higher limits, consider adding authentication with a GitHub token
Error Handling: Always wrap GitHub API calls in try-catch blocks to handle rate limits and network issues
Pagination: Use per_page parameter to control the number of results returned
Caching: Consider caching results for frequently accessed data to reduce API calls
Batch Operations: For multiple requests, consider using async patterns or batching

Configuration Best Practices

# For production use with authentication
from webscout.Extra.GitToolkit.gitapi import Repository

# Initialize with authentication (if needed)
repo = Repository("owner", "repo", token="your_github_token")

# Use with retry logic for resilience
import time
from webscout.Extra.GitToolkit.gitapi import RateLimitError

max_retries = 3
retry_delay = 60  # seconds

for attempt in range(max_retries):
    try:
        info = repo.get_info()
        break
    except RateLimitError:
        if attempt < max_retries - 1:
            print(f"Rate limited, retrying in {retry_delay} seconds...")
            time.sleep(retry_delay)
        else:
            raise

Advanced Features

Custom Query Parameters

from webscout.Extra.GitToolkit.gitapi import Repository

repo = Repository("OE-LUCIFER", "Webscout")

# Get commits with custom parameters
commits = repo.get_commits(
    sha="main",
    per_page=50,
    since="2023-01-01",
    until="2023-12-31"
)

# Get issues with filters
issues = repo.get_issues(
    state="open",
    labels="bug,enhancement",
    sort="created",
    direction="desc"
)

Data Transformation

from webscout.Extra.GitToolkit.gitapi import Repository
import json

repo = Repository("OE-LUCIFER", "Webscout")

# Get repository data and transform it
def transform_repo_data(repo_data):
    """Transform repository data into a custom format"""
    return {
        "name": repo_data["name"],
        "full_name": repo_data["full_name"],
        "description": repo_data["description"],
        "metrics": {
            "stars": repo_data["stargazers_count"],
            "forks": repo_data["forks_count"],
            "watchers": repo_data["watchers_count"],
            "issues": repo_data["open_issues_count"]
        },
        "dates": {
            "created": repo_data["created_at"],
            "updated": repo_data["updated_at"],
            "pushed": repo_data["pushed_at"]
        },
        "owner": {
            "login": repo_data["owner"]["login"],
            "type": repo_data["owner"]["type"]
        }
    }

info = repo.get_info()
transformed = transform_repo_data(info)
print(json.dumps(transformed, indent=2))

Integration with Other Tools

from webscout.Extra.GitToolkit.gitapi import Repository
from webscout.sanitize import sanitize_stream

# Get repository README and process it
repo = Repository("OE-LUCIFER", "Webscout")
readme = repo.get_readme()

# Clean and process the README content
clean_content = list(sanitize_stream(
    readme["content"],
    intro_value="",
    to_json=False,
    strip_chars="\n\r\t"
))

print("Cleaned README content:")
for line in clean_content:
    print(line)

This documentation covers the comprehensive functionality of the webscout.Extra.GitToolkit.gitapi module. For the most up-to-date information, refer to the source code and inline documentation.

Uh oh!

FilesExpand file tree

gitapi.md

Latest commit

History

gitapi.md

File metadata and controls

🚀 GitAPI: GitHub Data Extraction Module

Table of Contents

Core Components

gitapi.py

Features

Repository Operations

User Operations

Search Operations

Gist Operations

Organization Operations

Trending

Error Handling

Installation

Quick Examples

Repository Operations

User Operations

Search Operations

Gist Operations

Organization Operations

Trending

Available Classes

Repository Class

User Class

GitSearch Class

Gist Class

Organization Class

Trending Class

Error Handling

Exception Types

Integration Guide

With Webscout Providers

With FastAPI

With Async Applications

With Data Analysis

With CLI Applications

Usage Tips

Configuration Best Practices

Advanced Features

Custom Query Parameters

Data Transformation

Integration with Other Tools

`gitapi.py`