From 1791608d0a565561c69fb2e891bb051af362ada3 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 16:57:08 -0500 Subject: [PATCH 01/11] feat: Added comprehensive unit testing and github action to run tests on new pull requests --- .github/workflows/tests.yml | 11 + TESTING.md | 621 +++++++++++ pytest.ini | 49 + test_main.py | 2106 +++++++++++++++++++++++++++++++++++ 4 files changed, 2787 insertions(+) create mode 100644 TESTING.md create mode 100644 pytest.ini create mode 100644 test_main.py diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index 8118a66..c7b9d39 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -19,3 +19,14 @@ jobs: pip install -e ".[dev]" - name: Run all tests run: pytest + + integration-test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Run integration test with docker-compose + run: | + docker-compose up --build --abort-on-container-exit --exit-code-from github-etl + - name: Cleanup + if: always() + run: docker-compose down -v diff --git a/TESTING.md b/TESTING.md new file mode 100644 index 0000000..c0bb5dd --- /dev/null +++ b/TESTING.md @@ -0,0 +1,621 @@ +# Testing Guide for GitHub ETL + +This document describes comprehensive testing for the GitHub ETL pipeline, including +unit tests, integration tests, Docker testing, linting, and CI/CD workflows. + +## Table of Contents + +1. [Unit Testing](#unit-testing) +2. [Test Organization](#test-organization) +3. [Running Tests](#running-tests) +4. [Code Coverage](#code-coverage) +5. [Linting and Code Quality](#linting-and-code-quality) +6. [CI/CD Integration](#cicd-integration) +7. [Docker Testing](#docker-testing) +8. [Adding New Tests](#adding-new-tests) + +--- + +## Unit Testing + +The test suite in `test_main.py` provides comprehensive coverage for all functions in `main.py`. +We have **95 unit tests** covering 9 functions with 80%+ code coverage requirement. + +### Test Structure + +Tests are organized into 10 test classes: + +1. **TestSetupLogging** (1 test) - Logging configuration +2. **TestSleepForRateLimit** (4 tests) - Rate limit handling +3. **TestExtractPullRequests** (14 tests) - PR extraction with pagination and enrichment +4. **TestExtractCommits** (9 tests) - Commit and file extraction +5. **TestExtractReviewers** (6 tests) - Reviewer extraction +6. **TestExtractComments** (7 tests) - Comment extraction (uses /issues endpoint) +7. **TestTransformData** (26 tests) - Data transformation for all 4 BigQuery tables +8. **TestLoadData** (8 tests) - BigQuery data loading +9. **TestMain** (17 tests) - Main ETL orchestration +10. 
**TestIntegration** (3 tests) - End-to-end integration tests (marked with `@pytest.mark.integration`) + +### Fixtures + +Reusable fixtures are defined at the top of `test_main.py`: + +- `mock_session` - Mocked `requests.Session` +- `mock_bigquery_client` - Mocked BigQuery client +- `mock_pr_response` - Realistic pull request response +- `mock_commit_response` - Realistic commit with files +- `mock_reviewer_response` - Realistic reviewer response +- `mock_comment_response` - Realistic comment response + +## Test Organization + +### Function Coverage + +| Function | Tests | Coverage Target | Key Test Areas | +|----------|-------|-----------------|----------------| +| `setup_logging()` | 1 | 100% | Logger configuration | +| `sleep_for_rate_limit()` | 4 | 100% | Rate limit sleep logic, edge cases | +| `extract_pull_requests()` | 14 | 90%+ | Pagination, rate limits, enrichment, error handling | +| `extract_commits()` | 9 | 85%+ | Commit/file fetching, rate limits, errors | +| `extract_reviewers()` | 6 | 85%+ | Reviewer states, rate limits, errors | +| `extract_comments()` | 7 | 85%+ | Comment fetching (via /issues), rate limits | +| `transform_data()` | 26 | 95%+ | Bug ID extraction, 4 tables, field mapping | +| `load_data()` | 8 | 90%+ | BigQuery insertion, snapshot dates, errors | +| `main()` | 17 | 85%+ | Env vars, orchestration, chunking | + +**Overall Target: 85-90% coverage** (80% minimum enforced in CI) + +### Critical Test Cases + +#### Bug ID Extraction +Tests verify the regex pattern matches: +- `Bug 1234567 - Fix` → 1234567 +- `bug 1234567` → 1234567 +- `b=1234567` → 1234567 +- `Bug #1234567` → 1234567 +- Filters out IDs >= 100000000 + +#### Data Transformation +Tests ensure correct transformation for all 4 BigQuery tables: +- **pull_requests**: PR metadata, bug IDs, labels, date_approved +- **commits**: Flattened files (one row per file), commit metadata +- **reviewers**: Review states, date_approved calculation +- **comments**: Character count, status mapping from reviews + +#### Rate Limiting +Tests verify rate limit handling at all API levels: +- Pull requests pagination +- Commit fetching +- Reviewer fetching +- Comment fetching + +## Running Tests + +### All Tests with Coverage + +```bash +pytest +``` + +This runs all tests with coverage reporting (configured in `pytest.ini`). + +### Fast Unit Tests Only (Skip Integration) + +```bash +pytest -m "not integration and not slow" +``` + +Use this for fast feedback during development. 
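The `-m` filter above works because integration tests are tagged with the `integration` marker registered in `pytest.ini`. As a minimal sketch of how such a test is tagged (the test name below is illustrative, not taken from `test_main.py`):

```python
import pytest


@pytest.mark.integration
def test_full_etl_flow_end_to_end(mock_session, mock_bigquery_client):
    """Exercise extract -> transform -> load against mocked services."""
    ...
```

Because `--strict-markers` is enabled, any marker used this way must also be listed under `markers` in `pytest.ini`.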
+ +### Specific Test Class + +```bash +pytest test_main.py::TestTransformData +``` + +### Specific Test Function + +```bash +pytest test_main.py::TestTransformData::test_bug_id_extraction_basic -v +``` + +### With Verbose Output + +```bash +pytest -v +``` + +### With Coverage Report + +```bash +# Terminal report +pytest --cov=main --cov-report=term-missing + +# HTML report +pytest --cov=main --cov-report=html +open htmlcov/index.html +``` + +### Integration Tests Only + +```bash +pytest -m integration +``` + +## Code Coverage + +### Coverage Requirements + +- **Minimum**: 80% (enforced in CI via `--cov-fail-under=80`) +- **Target**: 85-90% +- **Current**: Run `pytest --cov=main` to see current coverage + +### Coverage Configuration + +Coverage settings are in `pytest.ini`: + +```ini +[pytest] +addopts = + --cov=main + --cov-report=term-missing + --cov-report=html + --cov-branch + --cov-fail-under=80 +``` + +### Viewing Coverage + +```bash +# Generate HTML coverage report +pytest --cov=main --cov-report=html + +# Open in browser +xdg-open htmlcov/index.html # Linux +open htmlcov/index.html # macOS +``` + +The HTML report shows: +- Line-by-line coverage +- Branch coverage +- Missing lines highlighted +- Per-file coverage percentages + +## Linting and Code Quality + +### Available Linters + +The project uses these linting tools (defined in `requirements.txt`): + +- **black** - Code formatting +- **isort** - Import sorting +- **flake8** - Style and syntax checking +- **mypy** - Static type checking + +### Running Linters + +```bash +# Run black (auto-format) +black main.py test_main.py + +# Check formatting without changes +black --check main.py test_main.py + +# Sort imports +isort main.py test_main.py + +# Check import sorting +isort --check-only main.py test_main.py + +# Run flake8 +flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 + +# Run mypy +mypy main.py --no-strict-optional --ignore-missing-imports +``` + +### All Linting Checks + +```bash +# Run all linters in sequence +black --check main.py test_main.py && \ +isort --check-only main.py test_main.py && \ +flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 && \ +mypy main.py --no-strict-optional --ignore-missing-imports +``` + +## CI/CD Integration + +### GitHub Actions Workflow + +The `.github/workflows/tests.yml` workflow runs on every push and pull request: + +**Lint Job:** +1. Runs black (format check) +2. Runs isort (import check) +3. Runs flake8 (style check) +4. Runs mypy (type check) + +**Test Job:** +1. Runs fast unit tests with 80% coverage threshold +2. Runs all tests (including integration) +3. Uploads coverage reports as artifacts + +### Workflow Triggers + +- Push to `main` or `unit-tests` branch +- Pull requests to `main` branch + +### Viewing Results + +- Check the Actions tab in GitHub +- Coverage artifacts are uploaded for each run +- Failed linting or tests will block merges + +## Docker Testing + +## Overview + +The `docker-compose.yml` configuration provides a complete local testing environment with: + +1. **Mock GitHub API** - A Flask-based mock service that simulates the GitHub Pull Requests API +2. **BigQuery Emulator** - A local BigQuery instance for testing data loads +3. 
**ETL Service** - The main GitHub ETL application configured to use the mock services + +## Quick Start + +### Start all services + +```bash +docker-compose up --build +``` + +This will: + +- Build and start the mock GitHub API (port 5000) +- Start the BigQuery emulator (ports 9050, 9060) +- Build and run the ETL service + +The ETL service will automatically: + +- Fetch 250 mock pull requests from the mock GitHub API +- Transform the data +- Load it into the BigQuery emulator + +### View logs + +```bash +# All services +docker-compose logs -f + +# Specific service +docker-compose logs -f github-etl +docker-compose logs -f bigquery-emulator +docker-compose logs -f mock-github-api +``` + +### Stop services + +```bash +docker-compose down +``` + +## Architecture + +### Mock GitHub API Service + +- **Port**: 5000 +- **Endpoint**: `http://localhost:5000/repos/{owner}/{repo}/pulls` +- **Mock data**: Generates 250 sample pull requests with realistic data +- **Features**: + - Pagination support (per_page, page parameters) + - Realistic PR data (numbers, titles, states, timestamps, users, etc.) + - Mock rate limit headers + - No authentication required + +### BigQuery Emulator Service + +- **Ports**: + - 9050 (BigQuery API) + - 9060 (Discovery/Admin API) +- **Configuration**: Uses `data.yml` to define the schema +- **Project**: test-project +- **Dataset**: test_dataset +- **Table**: pull_requests + +### ETL Service + +The ETL service is configured via environment variables in `docker-compose.yml`: + +```yaml +environment: + GITHUB_REPOS: "mozilla/firefox" + GITHUB_API_URL: "http://mock-github-api:5000" # Points to mock API + BIGQUERY_PROJECT: "test" + BIGQUERY_DATASET: "github_etl" + BIGQUERY_EMULATOR_HOST: "http://bigquery-emulator:9050" +``` + +## Customization + +### Using Real GitHub API + +To test with the real GitHub API instead of the mock: + +1. Set `GITHUB_TOKEN` environment variable +2. Remove or comment out `GITHUB_API_URL` in docker-compose.yml +3. Update `depends_on` to not require mock-github-api + +```bash +export GITHUB_TOKEN="your_github_token" +docker-compose up github-etl bigquery-emulator +``` + +### Adjusting Mock Data + +Edit `mock_github_api.py` to customize: + +- Total number of PRs (default: 250) +- PR field values +- Pagination behavior + +### Modifying BigQuery Schema + +Edit `data.yml` to change the table schema. The schema matches the fields +extracted in `main.py`'s `transform_data()` function. + +## Querying the BigQuery Emulator + +You can query the BigQuery emulator using the BigQuery Python client: + +```python +from google.cloud import bigquery +from google.api_core.client_options import ClientOptions + +client = bigquery.Client( + project="test-project", + client_options=ClientOptions(api_endpoint="http://localhost:9050") +) + +query = """ +SELECT pr_number, title, state, user_login +FROM `test-project.test_dataset.pull_requests` +LIMIT 10 +""" + +for row in client.query(query): + print(f"PR #{row.pr_number}: {row.title} - {row.state}") +``` + +Or use the `bq` command-line tool with the emulator endpoint. 
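For example, a sketch of a `bq` invocation pointed at the emulator (flag support may vary by `bq` version; `--api` is assumed here to redirect the client to the local endpoint):

```bash
bq --api http://localhost:9050 --project_id=test-project query --use_legacy_sql=false \
  'SELECT COUNT(*) AS pr_count FROM `test-project.test_dataset.pull_requests`'
```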
+ +## Troubleshooting + +### Services not starting + +Check if ports are already in use: + +```bash +lsof -i :5000 # Mock GitHub API +lsof -i :9050 # BigQuery emulator +``` + +### ETL fails to connect + +Ensure services are healthy: + +```bash +docker-compose ps +``` + +Check service logs: + +```bash +docker-compose logs bigquery-emulator +docker-compose logs mock-github-api +``` + +### Schema mismatch errors + +Verify `data.yml` schema matches fields in `main.py:transform_data()`. + +## Development Workflow + +1. Make changes to `main.py` +2. Restart the ETL service: `docker-compose restart github-etl` +3. View logs: `docker-compose logs -f github-etl` + +The `main.py` file is mounted as a volume, so changes are reflected without rebuilding. + +## Cleanup + +Remove all containers and volumes: + +```bash +docker-compose down -v +``` + +Remove built images: + +```bash +docker-compose down --rmi all +``` + +--- + +## Adding New Tests + +### Testing Patterns + +#### 1. Mock External Dependencies + +Always mock external API calls and BigQuery operations: + +```python +@patch("requests.Session") +def test_api_call(mock_session_class): + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"id": 1}] + + mock_session.get.return_value = mock_response + # Test code here +``` + +#### 2. Use Fixtures + +Leverage existing fixtures for common test data: + +```python +def test_with_fixtures(mock_session, mock_pr_response): + # Use mock_session and mock_pr_response + pass +``` + +#### 3. Test Edge Cases + +Always test: +- Empty inputs +- None values +- Missing fields +- Rate limits +- API errors (404, 500, etc.) +- Boundary conditions + +#### 4. Verify Call Arguments + +Check that functions are called with correct parameters: + +```python +mock_extract.assert_called_once_with( + session=mock_session, + repo="mozilla/firefox", + github_api_url="https://api.github.com" +) +``` + +### Example: Adding a New Test + +```python +class TestNewFunction: + """Tests for new_function.""" + + def test_basic_functionality(self, mock_session): + """Test basic happy path.""" + # Arrange + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = {"result": "success"} + mock_session.get.return_value = mock_response + + # Act + result = main.new_function(mock_session, "arg1") + + # Assert + assert result == {"result": "success"} + mock_session.get.assert_called_once() + + def test_error_handling(self, mock_session): + """Test error handling.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Error" + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + main.new_function(mock_session, "arg1") + + assert "500" in str(exc_info.value) +``` + +### Test Organization Guidelines + +1. **Group related tests** in test classes +2. **Use descriptive names** like `test_handles_rate_limit_on_commits` +3. **One assertion concept per test** - Test one thing at a time +4. **Arrange-Act-Assert pattern** - Structure tests clearly +5. 
**Add docstrings** to explain what each test verifies + +### Mocking Patterns + +#### Mocking Time + +```python +@patch("time.time") +@patch("time.sleep") +def test_with_time(mock_sleep, mock_time): + mock_time.return_value = 1000 + # Test code +``` + +#### Mocking Environment Variables + +```python +with patch.dict(os.environ, {"VAR_NAME": "value"}, clear=True): + # Test code +``` + +#### Mocking Generators + +```python +mock_extract.return_value = iter([[{"id": 1}], [{"id": 2}]]) +``` + +### Running Tests During Development + +```bash +# Auto-run tests on file changes (requires pytest-watch) +pip install pytest-watch +ptw -- --cov=main -m "not integration" +``` + +### Debugging Tests + +```bash +# Drop into debugger on failures +pytest --pdb + +# Show print statements +pytest -s + +# Verbose with full diff +pytest -vv +``` + +### Coverage Tips + +If coverage is below 80%: + +1. Run `pytest --cov=main --cov-report=term-missing` to see missing lines +2. Look for untested branches (if/else paths) +3. Check error handling paths +4. Verify edge cases are covered + +## Resources + +- [pytest documentation](https://docs.pytest.org/) +- [pytest-cov documentation](https://pytest-cov.readthedocs.io/) +- [unittest.mock documentation](https://docs.python.org/3/library/unittest.mock.html) + +## Troubleshooting + +### Tests Pass Locally But Fail in CI + +- Check Python version (must be 3.11) +- Verify all dependencies are in `requirements.txt` +- Look for environment-specific issues + +### Coverage Dropped Below 80% + +- Run locally: `pytest --cov=main --cov-report=html` +- Open `htmlcov/index.html` to see uncovered lines +- Add tests for missing coverage + +### Import Errors + +- Ensure `PYTHONPATH` includes project root +- Check that `__init__.py` files exist if needed +- Verify module names match file names diff --git a/pytest.ini b/pytest.ini new file mode 100644 index 0000000..d4a601a --- /dev/null +++ b/pytest.ini @@ -0,0 +1,49 @@ +[pytest] +# Pytest configuration for GitHub ETL project + +# Test discovery patterns +python_files = test_*.py +python_classes = Test* +python_functions = test_* + +# Output options +addopts = + -v + --strict-markers + --tb=short + --cov=main + --cov-report=term-missing + --cov-report=html + --cov-branch + +# Minimum coverage threshold (can adjust as needed) +--cov-fail-under=80 + +# Test paths +testpaths = . + +# Markers for organizing tests +markers = + unit: Unit tests for individual functions + integration: Integration tests that test multiple components + slow: Tests that take longer to run + +# Logging +log_cli = false +log_cli_level = INFO +log_cli_format = %(asctime)s [%(levelname)8s] %(message)s +log_cli_date_format = %Y-%m-%d %H:%M:%S + +# Coverage options +[coverage:run] +source = . +omit = + test_*.py + .venv/* + venv/* + */site-packages/* + +[coverage:report] +precision = 2 +show_missing = true +skip_covered = false diff --git a/test_main.py b/test_main.py new file mode 100644 index 0000000..7165677 --- /dev/null +++ b/test_main.py @@ -0,0 +1,2106 @@ +#!/usr/bin/env python3 +""" +Comprehensive test suite for GitHub ETL main.py + +This test suite provides complete coverage for all functions in main.py, +including extraction, transformation, loading, and orchestration logic. 
+""" + +import logging +import os +import sys +import time +from datetime import datetime, timezone +from unittest.mock import Mock, MagicMock, patch, call +import pytest +import requests +from google.cloud import bigquery + +import main + + +# ============================================================================= +# FIXTURES +# ============================================================================= + + +@pytest.fixture +def mock_session(): + """Provide a mocked requests.Session for testing.""" + session = Mock(spec=requests.Session) + session.headers = {} + return session + + +@pytest.fixture +def mock_bigquery_client(): + """Provide a mocked BigQuery client for testing.""" + client = Mock(spec=bigquery.Client) + client.project = "test-project" + client.insert_rows_json = Mock(return_value=[]) + return client + + +@pytest.fixture +def mock_pr_response(): + """Provide a realistic pull request response for testing.""" + return { + "number": 123, + "title": "Bug 1234567 - Fix login issue", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T10:00:00Z", + "user": {"login": "testuser"}, + "head": {"ref": "fix-branch"}, + "base": {"ref": "main"}, + "labels": [{"name": "bug"}, {"name": "priority-high"}], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + + +@pytest.fixture +def mock_commit_response(): + """Provide a realistic commit response with files.""" + return { + "sha": "abc123def456", + "commit": { + "author": { + "name": "Test Author", + "email": "test@example.com", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/login.py", + "additions": 10, + "deletions": 5, + "changes": 15, + }, + { + "filename": "tests/test_login.py", + "additions": 20, + "deletions": 2, + "changes": 22, + }, + ], + } + + +@pytest.fixture +def mock_reviewer_response(): + """Provide a realistic reviewer response.""" + return { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + "body": "LGTM", + } + + +@pytest.fixture +def mock_comment_response(): + """Provide a realistic comment response.""" + return { + "id": 456, + "user": {"login": "commenter1"}, + "created_at": "2024-01-01T14:00:00Z", + "body": "This looks good to me", + "pull_request_review_id": None, + } + + +# ============================================================================= +# TEST CLASSES +# ============================================================================= + + +class TestSetupLogging: + """Tests for setup_logging function.""" + + def test_setup_logging_configures_logger(self): + """Test that setup_logging configures the root logger correctly.""" + main.setup_logging() + + root_logger = logging.getLogger() + assert root_logger.level == logging.INFO + assert len(root_logger.handlers) > 0 + + # Check that at least one handler is a StreamHandler + has_stream_handler = any( + isinstance(handler, logging.StreamHandler) + for handler in root_logger.handlers + ) + assert has_stream_handler + + +class TestSleepForRateLimit: + """Tests for sleep_for_rate_limit function.""" + + @patch("time.time") + @patch("time.sleep") + def test_sleep_for_rate_limit_when_remaining_is_zero( + self, mock_sleep, mock_time + ): + """Test that sleep_for_rate_limit sleeps until reset time.""" + mock_time.return_value = 1000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1120", # 120 seconds from now + } 
+ + main.sleep_for_rate_limit(mock_response) + + mock_sleep.assert_called_once_with(120) + + @patch("time.time") + @patch("time.sleep") + def test_sleep_for_rate_limit_when_reset_already_passed( + self, mock_sleep, mock_time + ): + """Test that sleep_for_rate_limit doesn't sleep negative time.""" + mock_time.return_value = 2000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1500", # Already passed + } + + main.sleep_for_rate_limit(mock_response) + + # Should sleep for 0 seconds (max of 0 and negative value) + mock_sleep.assert_called_once_with(0) + + @patch("time.sleep") + def test_sleep_for_rate_limit_when_remaining_not_zero(self, mock_sleep): + """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "5", + "X-RateLimit-Reset": "1500", + } + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when remaining > 0 + mock_sleep.assert_not_called() + + @patch("time.sleep") + def test_sleep_for_rate_limit_with_missing_headers(self, mock_sleep): + """Test sleep_for_rate_limit with missing rate limit headers.""" + mock_response = Mock() + mock_response.headers = {} + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when headers are missing (defaults to remaining=1) + mock_sleep.assert_not_called() + + +class TestExtractPullRequests: + """Tests for extract_pull_requests function.""" + + def test_extract_single_page(self, mock_session): + """Test extracting data from a single page of results.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + # Mock the extract functions + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert len(result) == 1 + assert len(result[0]) == 2 + assert result[0][0]["number"] == 1 + assert result[0][1]["number"] == 2 + + def test_extract_multiple_pages(self, mock_session): + """Test extracting data across multiple pages with pagination.""" + # First page response + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" + } + } + + # Second page response + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert len(result) == 2 + assert len(result[0]) == 2 + assert len(result[1]) == 1 + assert result[0][0]["number"] == 1 + assert result[1][0]["number"] == 3 + + def test_enriches_prs_with_commit_data(self, mock_session): + """Test that PRs are enriched with commit data.""" + mock_response = Mock() + mock_response.status_code = 200 + 
mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_commits = [{"sha": "abc123"}] + + with patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, patch( + "main.extract_reviewers", return_value=[] + ), patch( + "main.extract_comments", return_value=[] + ): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert result[0][0]["commit_data"] == mock_commits + mock_extract_commits.assert_called_once() + + def test_enriches_prs_with_reviewer_data(self, mock_session): + """Test that PRs are enriched with reviewer data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_reviewers = [{"id": 789, "state": "APPROVED"}] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, patch( + "main.extract_comments", return_value=[] + ): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert result[0][0]["reviewer_data"] == mock_reviewers + mock_extract_reviewers.assert_called_once() + + def test_enriches_prs_with_comment_data(self, mock_session): + """Test that PRs are enriched with comment data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_comments = [{"id": 456, "body": "Great work!"}] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments: + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert result[0][0]["comment_data"] == mock_comments + mock_extract_comments.assert_called_once() + + @patch("main.sleep_for_rate_limit") + def test_handles_rate_limit(self, mock_sleep, mock_session): + """Test that extract_pull_requests handles rate limiting correctly.""" + # Rate limit response + mock_response_rate_limit = Mock() + mock_response_rate_limit.status_code = 403 + mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} + + # Successful response after rate limit + mock_response_success = Mock() + mock_response_success.status_code = 200 + mock_response_success.json.return_value = [ + {"number": 1, "title": "PR 1"} + ] + mock_response_success.links = {} + + mock_session.get.side_effect = [ + mock_response_rate_limit, + mock_response_success, + ] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + mock_sleep.assert_called_once_with(mock_response_rate_limit) + assert len(result) == 1 + + def test_handles_api_error_404(self, mock_session): + """Test that extract_pull_requests raises SystemExit on 404.""" + mock_response = Mock() + mock_response.status_code = 404 + mock_response.text = "Not Found" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) + + assert "GitHub API error 404" 
in str(exc_info.value) + + def test_handles_api_error_500(self, mock_session): + """Test that extract_pull_requests raises SystemExit on 500.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Server Error" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert "GitHub API error 500" in str(exc_info.value) + + def test_stops_on_empty_batch(self, mock_session): + """Test that extraction stops when an empty batch is returned.""" + # First page with data + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" + } + } + + # Second page empty + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + # Should only have 1 chunk from first page + assert len(result) == 1 + assert len(result[0]) == 1 + + def test_invalid_page_number_handling(self, mock_session): + """Test handling of invalid page number in pagination.""" + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" + } + } + + mock_session.get.return_value = mock_response_1 + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + # Should stop pagination on invalid page number + assert len(result) == 1 + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL.""" + custom_url = "https://mock-github.example.com" + + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + list( + main.extract_pull_requests( + mock_session, "mozilla/firefox", github_api_url=custom_url + ) + ) + + # Verify custom URL was used + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + def test_skips_prs_without_number_field(self, mock_session): + """Test that PRs without 'number' field are skipped.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"title": "PR without number"}, # Missing number field + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with patch("main.extract_commits", return_value=[]) as mock_commits, patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, 
"mozilla/firefox") + ) + + # extract_commits should only be called for PRs with number field + assert mock_commits.call_count == 2 + + +class TestExtractCommits: + """Tests for extract_commits function.""" + + def test_fetch_commits_with_files(self, mock_session): + """Test fetching commits with files for a PR.""" + # Mock commits list response + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {"sha": "def456"}, + ] + + # Mock individual commit responses + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = { + "sha": "abc123", + "files": [{"filename": "file1.py", "additions": 10}], + } + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = { + "sha": "def456", + "files": [{"filename": "file2.py", "deletions": 5}], + } + + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["sha"] == "abc123" + assert result[0]["files"][0]["filename"] == "file1.py" + assert result[1]["sha"] == "def456" + assert result[1]["files"][0]["filename"] == "file2.py" + + def test_multiple_files_per_commit(self, mock_session): + """Test handling multiple files in a single commit.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_detail = Mock() + commit_detail.status_code = 200 + commit_detail.json.return_value = { + "sha": "abc123", + "files": [ + {"filename": "file1.py", "additions": 10}, + {"filename": "file2.py", "additions": 20}, + {"filename": "file3.py", "deletions": 5}, + ], + } + + mock_session.get.side_effect = [commits_response, commit_detail] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 1 + assert len(result[0]["files"]) == 3 + + @patch("main.sleep_for_rate_limit") + def test_rate_limit_on_commits_list(self, mock_sleep, mock_session): + """Test rate limit handling when fetching commits list.""" + # Rate limit response + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + # Success response + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + def test_api_error_on_commits_list(self, mock_session): + """Test API error handling when fetching commits list.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + def test_api_error_on_individual_commit(self, mock_session): + """Test API error when fetching individual commit details.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_error = Mock() + commit_error.status_code = 404 + commit_error.text = "Commit not found" + + mock_session.get.side_effect = [commits_response, commit_error] + + with 
pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + def test_commit_without_sha_field(self, mock_session): + """Test handling commits without sha field.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {}, # Missing sha field + ] + + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = {"files": []} + + mock_session.get.side_effect = [commits_response, commit_detail_1, commit_detail_2] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + # Should handle the commit without sha gracefully + assert len(result) == 2 + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL for commits.""" + custom_url = "https://mock-github.example.com" + + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + main.extract_commits( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + def test_empty_commits_list(self, mock_session): + """Test handling PR with no commits.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert result == [] + + +class TestExtractReviewers: + """Tests for extract_reviewers function.""" + + def test_fetch_reviewers(self, mock_session): + """Test fetching reviewers for a PR.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 790, + "user": {"login": "reviewer2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["state"] == "APPROVED" + assert result[1]["state"] == "CHANGES_REQUESTED" + + def test_multiple_review_states(self, mock_session): + """Test handling multiple different review states.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, + {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, + {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, + {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 4 + states = [r["state"] for r in result] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + def test_empty_reviewers_list(self, mock_session): + """Test handling PR with no reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + 
mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert result == [] + + @patch("main.sleep_for_rate_limit") + def test_rate_limit_handling(self, mock_sleep, mock_session): + """Test rate limit handling when fetching reviewers.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + def test_api_error(self, mock_session): + """Test API error handling when fetching reviewers.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL for reviewers.""" + custom_url = "https://mock-github.example.com" + + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + mock_session.get.return_value = reviewers_response + + main.extract_reviewers( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +class TestExtractComments: + """Tests for extract_comments function.""" + + def test_fetch_comments(self, mock_session): + """Test fetching comments for a PR.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks good", + "created_at": "2024-01-01T14:00:00Z", + }, + { + "id": 457, + "user": {"login": "commenter2"}, + "body": "I have concerns", + "created_at": "2024-01-01T15:00:00Z", + }, + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["id"] == 456 + assert result[1]["id"] == 457 + + def test_uses_issues_endpoint(self, mock_session): + """Test that comments use /issues endpoint not /pulls.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments(mock_session, "mozilla/firefox", 123) + + call_args = mock_session.get.call_args + url = call_args[0][0] + assert "/issues/123/comments" in url + assert "/pulls/123/comments" not in url + + def test_multiple_comments(self, mock_session): + """Test handling multiple comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} + for i in range(1, 11) + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 10 + + def test_empty_comments_list(self, mock_session): + """Test handling PR with no comments.""" + comments_response = Mock() + comments_response.status_code 
= 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert result == [] + + @patch("main.sleep_for_rate_limit") + def test_rate_limit_handling(self, mock_sleep, mock_session): + """Test rate limit handling when fetching comments.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + def test_api_error(self, mock_session): + """Test API error handling when fetching comments.""" + error_response = Mock() + error_response.status_code = 404 + error_response.text = "Not Found" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL for comments.""" + custom_url = "https://mock-github.example.com" + + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +class TestTransformData: + """Tests for transform_data function.""" + + def test_basic_pr_transformation(self): + """Test basic pull request field mapping.""" + raw_data = [ + { + "number": 123, + "title": "Fix login bug", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T12:00:00Z", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + pr = result["pull_requests"][0] + assert pr["pull_request_id"] == 123 + assert pr["current_status"] == "closed" + assert pr["date_created"] == "2024-01-01T10:00:00Z" + assert pr["date_modified"] == "2024-01-02T10:00:00Z" + assert pr["date_landed"] == "2024-01-02T12:00:00Z" + assert pr["target_repository"] == "mozilla/firefox" + + def test_bug_id_extraction_basic(self): + """Test bug ID extraction from PR title.""" + test_cases = [ + ("Bug 1234567 - Fix issue", 1234567), + ("bug 1234567: Update code", 1234567), + ("Fix for bug 7654321", 7654321), + ("b=9876543 - Change behavior", 9876543), + ] + + for title, expected_bug_id in test_cases: + raw_data = [ + { + "number": 1, + "title": title, + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == expected_bug_id + + def test_bug_id_extraction_with_hash(self): + """Test bug ID extraction with # symbol.""" + raw_data = [ + { + "number": 1, + "title": "Bug #1234567 - Fix issue", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, 
"mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == 1234567 + + def test_bug_id_filter_large_numbers(self): + """Test that bug IDs >= 100000000 are filtered out.""" + raw_data = [ + { + "number": 1, + "title": "Bug 999999999 - Invalid bug ID", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + def test_bug_id_no_match(self): + """Test PR title with no bug ID.""" + raw_data = [ + { + "number": 1, + "title": "Update documentation", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + def test_labels_extraction(self): + """Test labels array extraction.""" + raw_data = [ + { + "number": 1, + "title": "PR with labels", + "state": "open", + "labels": [ + {"name": "bug"}, + {"name": "priority-high"}, + {"name": "needs-review"}, + ], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + labels = result["pull_requests"][0]["labels"] + assert len(labels) == 3 + assert "bug" in labels + assert "priority-high" in labels + assert "needs-review" in labels + + def test_labels_empty_list(self): + """Test handling empty labels list.""" + raw_data = [ + { + "number": 1, + "title": "PR without labels", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["labels"] == [] + + def test_commit_transformation(self): + """Test commit fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": { + "author": { + "name": "Test Author", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/main.py", + "additions": 10, + "deletions": 5, + } + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["commits"]) == 1 + commit = result["commits"][0] + assert commit["pull_request_id"] == 123 + assert commit["target_repository"] == "mozilla/firefox" + assert commit["commit_sha"] == "abc123" + assert commit["date_created"] == "2024-01-01T12:00:00Z" + assert commit["author_username"] == "Test Author" + assert commit["filename"] == "src/main.py" + assert commit["lines_added"] == 10 + assert commit["lines_removed"] == 5 + + def test_commit_file_flattening(self): + """Test that each file becomes a separate row.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple files", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 5}, + {"filename": "file2.py", "additions": 20, "deletions": 2}, + {"filename": "file3.py", "additions": 5, "deletions": 15}, + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows in commits table (one per file) + assert len(result["commits"]) == 3 + filenames = [c["filename"] for c in result["commits"]] + assert 
"file1.py" in filenames + assert "file2.py" in filenames + assert "file3.py" in filenames + + def test_multiple_commits_with_files(self): + """Test multiple commits with multiple files per PR.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "commit1", + "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 0} + ], + }, + { + "sha": "commit2", + "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, + "files": [ + {"filename": "file2.py", "additions": 5, "deletions": 2}, + {"filename": "file3.py", "additions": 8, "deletions": 3}, + ], + }, + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows total (1 file from commit1, 2 files from commit2) + assert len(result["commits"]) == 3 + assert result["commits"][0]["commit_sha"] == "commit1" + assert result["commits"][1]["commit_sha"] == "commit2" + assert result["commits"][2]["commit_sha"] == "commit2" + + def test_reviewer_transformation(self): + """Test reviewer fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with reviewers", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + } + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 1 + reviewer = result["reviewers"][0] + assert reviewer["pull_request_id"] == 123 + assert reviewer["target_repository"] == "mozilla/firefox" + assert reviewer["reviewer_username"] == "reviewer1" + assert reviewer["status"] == "APPROVED" + assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + + def test_multiple_review_states(self): + """Test handling multiple review states.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple reviews", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "COMMENTED", + "submitted_at": "2024-01-01T17:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 3 + states = [r["status"] for r in result["reviewers"]] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + def test_date_approved_from_earliest_approval(self): + """Test that date_approved is set to earliest APPROVED review.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple approvals", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-02T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T14:00:00Z", # Earliest + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "APPROVED", + "submitted_at": "2024-01-03T16:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + pr = 
result["pull_requests"][0] + assert pr["date_approved"] == "2024-01-01T14:00:00Z" + + def test_comment_transformation(self): + """Test comment fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with comments", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks great!", + "created_at": "2024-01-01T14:00:00Z", + "pull_request_review_id": None, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["comments"]) == 1 + comment = result["comments"][0] + assert comment["pull_request_id"] == 123 + assert comment["target_repository"] == "mozilla/firefox" + assert comment["comment_id"] == 456 + assert comment["author_username"] == "commenter1" + assert comment["date_created"] == "2024-01-01T14:00:00Z" + assert comment["character_count"] == 17 + + def test_comment_character_count(self): + """Test character count calculation for comments.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": "Short", + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "This is a much longer comment with more text", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 5 + assert result["comments"][1]["character_count"] == 44 + + def test_comment_status_from_review(self): + """Test that comment status is mapped from review_id_statuses.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter"}, + "body": "LGTM", + "created_at": "2024-01-01", + "pull_request_review_id": 789, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Comment should have status from the review + assert result["comments"][0]["status"] == "APPROVED" + + def test_comment_empty_body(self): + """Test handling comments with empty or None body.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": None, + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 0 + assert result["comments"][1]["character_count"] == 0 + + def test_empty_raw_data(self): + """Test handling empty input list.""" + result = main.transform_data([], "mozilla/firefox") + + assert result["pull_requests"] == [] + assert result["commits"] == [] + assert result["reviewers"] == [] + assert result["comments"] == [] + + def test_pr_without_commits_reviewers_comments(self): + """Test PR with no commits, reviewers, or comments.""" + raw_data = [ + { + "number": 123, + "title": "Minimal PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert 
len(result["pull_requests"]) == 1 + assert len(result["commits"]) == 0 + assert len(result["reviewers"]) == 0 + assert len(result["comments"]) == 0 + + def test_return_structure(self): + """Test that transform_data returns dict with 4 keys.""" + raw_data = [ + { + "number": 1, + "title": "Test", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert isinstance(result, dict) + assert "pull_requests" in result + assert "commits" in result + assert "reviewers" in result + assert "comments" in result + + def test_all_tables_have_target_repository(self): + """Test that all tables include target_repository field.""" + raw_data = [ + { + "number": 123, + "title": "Test PR", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], + } + ], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 2, + "user": {"login": "commenter"}, + "body": "Test", + "created_at": "2024-01-01", + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" + assert result["commits"][0]["target_repository"] == "mozilla/firefox" + assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" + assert result["comments"][0]["target_repository"] == "mozilla/firefox" + + +class TestLoadData: + """Tests for load_data function.""" + + @patch("main.datetime") + def test_load_all_tables(self, mock_datetime, mock_bigquery_client): + """Test loading all 4 tables to BigQuery.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [{"commit_sha": "abc"}], + "reviewers": [{"reviewer_username": "user1"}], + "comments": [{"comment_id": 123}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should call insert_rows_json 4 times (once per table) + assert mock_bigquery_client.insert_rows_json.call_count == 4 + + @patch("main.datetime") + def test_adds_snapshot_date(self, mock_datetime, mock_bigquery_client): + """Test that snapshot_date is added to all rows.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + rows = call_args[0][1] + assert all(row["snapshot_date"] == "2024-01-15" for row in rows) + + def test_constructs_correct_table_ref(self, mock_bigquery_client): + """Test that table_ref is constructed correctly.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "my_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref = call_args[0][0] + assert table_ref == "test-project.my_dataset.pull_requests" + + def test_empty_transformed_data_skipped(self, mock_bigquery_client): + """Test that empty transformed_data dict is skipped.""" + 
transformed_data = {} + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + mock_bigquery_client.insert_rows_json.assert_not_called() + + def test_skips_empty_tables_individually(self, mock_bigquery_client): + """Test that empty tables are skipped individually.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], # Empty, should be skipped + "reviewers": [], # Empty, should be skipped + "comments": [{"comment_id": 456}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should only call insert_rows_json twice (for PRs and comments) + assert mock_bigquery_client.insert_rows_json.call_count == 2 + + def test_only_pull_requests_table(self, mock_bigquery_client): + """Test loading only pull_requests table.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert mock_bigquery_client.insert_rows_json.call_count == 1 + + def test_raises_exception_on_insert_errors(self, mock_bigquery_client): + """Test that Exception is raised on BigQuery insert errors.""" + mock_bigquery_client.insert_rows_json.return_value = [ + {"index": 0, "errors": ["Insert failed"]} + ] + + transformed_data = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with pytest.raises(Exception) as exc_info: + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert "BigQuery insert errors" in str(exc_info.value) + + def test_verifies_client_insert_called_correctly(self, mock_bigquery_client): + """Test that client.insert_rows_json is called with correct arguments.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref, rows = call_args[0] + + assert "pull_requests" in table_ref + assert len(rows) == 2 + + +class TestMain: + """Tests for main function.""" + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_requires_github_repos( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_REPOS is required.""" + with patch.dict( + os.environ, + {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "GITHUB_REPOS" in str(exc_info.value) + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_requires_bigquery_project( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that BIGQUERY_PROJECT is required.""" + with patch.dict( + os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, clear=True + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "BIGQUERY_PROJECT" in str(exc_info.value) + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_requires_bigquery_dataset( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that BIGQUERY_DATASET is required.""" + with patch.dict( + os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, clear=True + ): + with pytest.raises(SystemExit) as 
exc_info: + main.main() + + assert "BIGQUERY_DATASET" in str(exc_info.value) + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_github_token_optional_with_warning( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_TOKEN is optional but warns if missing.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + # Should not raise, but should log warning + result = main.main() + assert result == 0 + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_splits_github_repos_by_comma( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_REPOS is split by comma.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + main.main() + + # Should be called twice (once per repo) + assert mock_extract.call_count == 2 + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_honors_github_api_url( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_API_URL is honored.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + main.main() + + call_kwargs = mock_extract.call_args[1] + assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_honors_bigquery_emulator_host( + self, mock_session_class, mock_bq_client_class, mock_setup_logging + ): + """Test that BIGQUERY_EMULATOR_HOST is honored.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + main.main() + + # Verify BigQuery client was created with emulator settings + mock_bq_client_class.assert_called_once() + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_creates_session_with_headers( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that session is created with Accept and User-Agent headers.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + main.main() + + # Verify session headers were set + assert mock_session.headers.update.called + call_args = mock_session.headers.update.call_args[0][0] + assert "Accept" in call_args + assert "User-Agent" in call_args + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + 
@patch("requests.Session") + def test_sets_authorization_header_with_token( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that Authorization header is set when token provided.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + main.main() + + # Verify Authorization header was set + assert mock_session.headers.__setitem__.called + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + @patch("main.extract_pull_requests") + @patch("main.transform_data") + @patch("main.load_data") + def test_single_repo_successful_etl( + self, + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, + ): + """Test successful ETL for single repository.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_extract.assert_called_once() + mock_transform.assert_called_once() + mock_load.assert_called_once() + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + @patch("main.extract_pull_requests") + @patch("main.transform_data") + @patch("main.load_data") + def test_multiple_repos_processing( + self, + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, + ): + """Test processing multiple repositories.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Should process 3 repositories + assert mock_extract.call_count == 3 + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + @patch("main.extract_pull_requests") + @patch("main.transform_data") + @patch("main.load_data") + def test_processes_chunks_iteratively( + self, + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, + ): + """Test that chunks are processed iteratively from generator.""" + # Return 3 chunks + mock_extract.return_value = iter([ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], + ]) + mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Transform and load should be called 3 times (once per chunk) + assert mock_transform.call_count == 3 + assert mock_load.call_count == 3 + + 
@patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_returns_zero_on_success( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that main returns 0 on success.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + result = main.main() + + assert result == 0 + + +@pytest.mark.integration +class TestIntegration: + """Integration tests that test multiple components together.""" + + @patch("main.setup_logging") + @patch("main.load_data") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_end_to_end_with_mocked_github( + self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ): + """Test end-to-end flow with mocked GitHub responses.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # Mock PR response + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} + ] + pr_response.links = {} + + # Mock commits, reviewers, comments responses + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_load.assert_called_once() + + # Verify transformed data structure + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + assert "pull_requests" in transformed_data + assert len(transformed_data["pull_requests"]) == 1 + + @patch("main.setup_logging") + @patch("main.load_data") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_bug_id_extraction_through_pipeline( + self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ): + """Test bug ID extraction through full pipeline.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 9876543 - Fix critical issue", "state": "closed"} + ] + pr_response.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + pr = transformed_data["pull_requests"][0] + assert pr["bug_id"] == 9876543 + + @patch("main.setup_logging") + @patch("main.load_data") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_pagination_through_full_flow( + self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ): + """Test pagination through full ETL flow.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # First page + pr_response_1 = Mock() + 
pr_response_1.status_code = 200 + pr_response_1.json.return_value = [ + {"number": 1, "title": "PR 1", "state": "open"} + ] + pr_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page + pr_response_2 = Mock() + pr_response_2.status_code = 200 + pr_response_2.json.return_value = [ + {"number": 2, "title": "PR 2", "state": "open"} + ] + pr_response_2.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response_1, + empty_response, + empty_response, + empty_response, + pr_response_2, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + # Should be called twice (once per chunk/page) + assert mock_load.call_count == 2 From d6cb74c01067c5696fb3db9307bdeb71416bea0a Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 18:47:57 -0500 Subject: [PATCH 02/11] Copilot suggested fixes --- TESTING.md | 3 +- pytest.ini | 4 +- test_main.py | 374 +++++++++++++++++++++++++++++---------------------- 3 files changed, 213 insertions(+), 168 deletions(-) diff --git a/TESTING.md b/TESTING.md index c0bb5dd..104d401 100644 --- a/TESTING.md +++ b/TESTING.md @@ -228,7 +228,7 @@ mypy main.py --no-strict-optional --ignore-missing-imports ### GitHub Actions Workflow -The `.github/workflows/tests.yml` workflow runs on every push and pull request: +The `.github/workflows/tests.yml` workflow runs on every pull request: **Lint Job:** 1. Runs black (format check) @@ -243,7 +243,6 @@ The `.github/workflows/tests.yml` workflow runs on every push and pull request: ### Workflow Triggers -- Push to `main` or `unit-tests` branch - Pull requests to `main` branch ### Viewing Results diff --git a/pytest.ini b/pytest.ini index d4a601a..33ef84b 100644 --- a/pytest.ini +++ b/pytest.ini @@ -15,9 +15,7 @@ addopts = --cov-report=term-missing --cov-report=html --cov-branch - -# Minimum coverage threshold (can adjust as needed) ---cov-fail-under=80 + --cov-fail-under=80 # Test paths testpaths = . 
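
Note on the pytest.ini hunk above: in the earlier layout the bare `--cov-fail-under=80` line sat outside `addopts`, so pytest parsed it as an unknown ini option and the 80% threshold was never actually applied; folding it into `addopts`, as this change does, turns it into a real pytest-cov command-line flag. A minimal sketch of the resulting section, abbreviated for illustration (the full `addopts` list also carries `-v`, `--strict-markers`, `--cov=main`, and the report options shown in the diff):

```ini
[pytest]
# The coverage threshold must live inside addopts so it reaches pytest-cov.
addopts =
    --cov=main
    --cov-branch
    --cov-fail-under=80
```
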
diff --git a/test_main.py b/test_main.py index 7165677..400c6d3 100644 --- a/test_main.py +++ b/test_main.py @@ -8,10 +8,9 @@ import logging import os -import sys import time -from datetime import datetime, timezone -from unittest.mock import Mock, MagicMock, patch, call +from datetime import datetime +from unittest.mock import Mock, MagicMock, patch import pytest import requests from google.cloud import bigquery @@ -143,9 +142,7 @@ class TestSleepForRateLimit: @patch("time.time") @patch("time.sleep") - def test_sleep_for_rate_limit_when_remaining_is_zero( - self, mock_sleep, mock_time - ): + def test_sleep_for_rate_limit_when_remaining_is_zero(self, mock_sleep, mock_time): """Test that sleep_for_rate_limit sleeps until reset time.""" mock_time.return_value = 1000 @@ -220,12 +217,12 @@ def test_extract_single_page(self, mock_session): mock_session.get.return_value = mock_response # Mock the extract functions - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert len(result) == 1 assert len(result[0]) == 2 @@ -242,9 +239,7 @@ def test_extract_multiple_pages(self, mock_session): {"number": 2, "title": "PR 2"}, ] mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" - } + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} } # Second page response @@ -255,12 +250,12 @@ def test_extract_multiple_pages(self, mock_session): mock_session.get.side_effect = [mock_response_1, mock_response_2] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert len(result) == 2 assert len(result[0]) == 2 @@ -279,16 +274,14 @@ def test_enriches_prs_with_commit_data(self, mock_session): mock_commits = [{"sha": "abc123"}] - with patch( - "main.extract_commits", return_value=mock_commits - ) as mock_extract_commits, patch( - "main.extract_reviewers", return_value=[] - ), patch( - "main.extract_comments", return_value=[] + with ( + patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), ): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert result[0][0]["commit_data"] == mock_commits mock_extract_commits.assert_called_once() @@ -304,14 +297,14 @@ def test_enriches_prs_with_reviewer_data(self, mock_session): mock_reviewers = [{"id": 789, "state": "APPROVED"}] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=mock_reviewers - ) as mock_extract_reviewers, patch( - 
"main.extract_comments", return_value=[] + with ( + patch("main.extract_commits", return_value=[]), + patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, + patch("main.extract_comments", return_value=[]), ): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert result[0][0]["reviewer_data"] == mock_reviewers mock_extract_reviewers.assert_called_once() @@ -327,14 +320,14 @@ def test_enriches_prs_with_comment_data(self, mock_session): mock_comments = [{"id": 456, "body": "Great work!"}] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch( - "main.extract_comments", return_value=mock_comments - ) as mock_extract_comments: - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments, + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert result[0][0]["comment_data"] == mock_comments mock_extract_comments.assert_called_once() @@ -350,9 +343,7 @@ def test_handles_rate_limit(self, mock_sleep, mock_session): # Successful response after rate limit mock_response_success = Mock() mock_response_success.status_code = 200 - mock_response_success.json.return_value = [ - {"number": 1, "title": "PR 1"} - ] + mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] mock_response_success.links = {} mock_session.get.side_effect = [ @@ -360,12 +351,12 @@ def test_handles_rate_limit(self, mock_sleep, mock_session): mock_response_success, ] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) mock_sleep.assert_called_once_with(mock_response_rate_limit) assert len(result) == 1 @@ -403,9 +394,7 @@ def test_stops_on_empty_batch(self, mock_session): mock_response_1.status_code = 200 mock_response_1.json.return_value = [{"number": 1}] mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" - } + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} } # Second page empty @@ -416,12 +405,12 @@ def test_stops_on_empty_batch(self, mock_session): mock_session.get.side_effect = [mock_response_1, mock_response_2] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) # Should only have 1 chunk from first page assert len(result) == 1 @@ -440,12 +429,12 @@ def test_invalid_page_number_handling(self, mock_session): 
mock_session.get.return_value = mock_response_1 - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) # Should stop pagination on invalid page number assert len(result) == 1 @@ -461,9 +450,11 @@ def test_custom_github_api_url(self, mock_session): mock_session.get.return_value = mock_response - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): list( main.extract_pull_requests( mock_session, "mozilla/firefox", github_api_url=custom_url @@ -487,12 +478,12 @@ def test_skips_prs_without_number_field(self, mock_session): mock_session.get.return_value = mock_response - with patch("main.extract_commits", return_value=[]) as mock_commits, patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]) as mock_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) # extract_commits should only be called for PRs with number field assert mock_commits.call_count == 2 @@ -631,7 +622,11 @@ def test_commit_without_sha_field(self, mock_session): commit_detail_2.status_code = 200 commit_detail_2.json.return_value = {"files": []} - mock_session.get.side_effect = [commits_response, commit_detail_1, commit_detail_2] + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] result = main.extract_commits(mock_session, "mozilla/firefox", 123) @@ -1470,7 +1465,9 @@ def test_all_tables_have_target_repository(self): { "sha": "abc", "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], + "files": [ + {"filename": "test.py", "additions": 1, "deletions": 0} + ], } ], "reviewer_data": [ @@ -1594,7 +1591,12 @@ def test_raises_exception_on_insert_errors(self, mock_bigquery_client): {"index": 0, "errors": ["Insert failed"]} ] - transformed_data = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with pytest.raises(Exception) as exc_info: main.load_data(mock_bigquery_client, "test_dataset", transformed_data) @@ -1647,7 +1649,9 @@ def test_requires_bigquery_project( ): """Test that BIGQUERY_PROJECT is required.""" with patch.dict( - os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, clear=True + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, + clear=True, ): with pytest.raises(SystemExit) as exc_info: main.main() @@ -1662,7 +1666,9 @@ def test_requires_bigquery_dataset( ): """Test that 
BIGQUERY_DATASET is required.""" with patch.dict( - os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, clear=True + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, + clear=True, ): with pytest.raises(SystemExit) as exc_info: main.main() @@ -1676,15 +1682,18 @@ def test_github_token_optional_with_warning( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that GITHUB_TOKEN is optional but warns if missing.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): # Should not raise, but should log warning result = main.main() assert result == 0 @@ -1696,16 +1705,19 @@ def test_splits_github_repos_by_comma( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that GITHUB_REPOS is split by comma.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): main.main() # Should be called twice (once per repo) @@ -1718,17 +1730,20 @@ def test_honors_github_api_url( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that GITHUB_API_URL is honored.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "GITHUB_API_URL": "https://custom-api.example.com", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): main.main() call_kwargs = mock_extract.call_args[1] @@ -1741,17 +1756,20 @@ def test_honors_bigquery_emulator_host( self, mock_session_class, mock_bq_client_class, mock_setup_logging ): """Test that BIGQUERY_EMULATOR_HOST is honored.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): main.main() # Verify BigQuery client was created with emulator settings @@ -1767,16 
+1785,19 @@ def test_creates_session_with_headers( mock_session = MagicMock() mock_session_class.return_value = mock_session - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): main.main() # Verify session headers were set @@ -1795,16 +1816,19 @@ def test_sets_authorization_header_with_token( mock_session = MagicMock() mock_session_class.return_value = mock_session - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "test-token-123", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): main.main() # Verify Authorization header was set @@ -1827,7 +1851,12 @@ def test_single_repo_successful_etl( ): """Test successful ETL for single repository.""" mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with patch.dict( os.environ, @@ -1863,7 +1892,12 @@ def test_multiple_repos_processing( ): """Test processing multiple repositories.""" mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with patch.dict( os.environ, @@ -1898,12 +1932,19 @@ def test_processes_chunks_iteratively( ): """Test that chunks are processed iteratively from generator.""" # Return 3 chunks - mock_extract.return_value = iter([ - [{"number": 1}], - [{"number": 2}], - [{"number": 3}], - ]) - mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + mock_extract.return_value = iter( + [ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], + ] + ) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with patch.dict( os.environ, @@ -1929,16 +1970,19 @@ def test_returns_zero_on_success( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that main returns 0 on success.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + 
patch("main.extract_pull_requests", return_value=iter([])), + ): result = main.main() assert result == 0 @@ -2014,7 +2058,11 @@ def test_bug_id_extraction_through_pipeline( pr_response = Mock() pr_response.status_code = 200 pr_response.json.return_value = [ - {"number": 1, "title": "Bug 9876543 - Fix critical issue", "state": "closed"} + { + "number": 1, + "title": "Bug 9876543 - Fix critical issue", + "state": "closed", + } ] pr_response.links = {} From 5836a842064a92c0e725cfc4f6c7e7e6a54e6245 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 18:53:32 -0500 Subject: [PATCH 03/11] Black formatted --- test_main.py | 1 - 1 file changed, 1 deletion(-) diff --git a/test_main.py b/test_main.py index 400c6d3..210029c 100644 --- a/test_main.py +++ b/test_main.py @@ -17,7 +17,6 @@ import main - # ============================================================================= # FIXTURES # ============================================================================= From 76f54f3bc2137788d41c6ea90d8bb0cb98051e71 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 18:55:44 -0500 Subject: [PATCH 04/11] Used isort to fix sorting order --- test_main.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/test_main.py b/test_main.py index 210029c..0850eae 100644 --- a/test_main.py +++ b/test_main.py @@ -10,7 +10,8 @@ import os import time from datetime import datetime -from unittest.mock import Mock, MagicMock, patch +from unittest.mock import MagicMock, Mock, patch + import pytest import requests from google.cloud import bigquery From 9c288cc6fe1b92bdc81f00fe52c9123a7ca3d10c Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 20:59:03 -0500 Subject: [PATCH 05/11] Mypy test fixes --- test_main.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/test_main.py b/test_main.py index 0850eae..0e60118 100644 --- a/test_main.py +++ b/test_main.py @@ -8,8 +8,6 @@ import logging import os -import time -from datetime import datetime from unittest.mock import MagicMock, Mock, patch import pytest From b95c05fbce21a49890015ff5232b94e417f07818 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Thu, 22 Jan 2026 16:35:31 -0500 Subject: [PATCH 06/11] Copilot fixes --- TESTING.md | 53 +++++++++++++++++++++++++++-------------------------- pytest.ini | 2 +- 2 files changed, 28 insertions(+), 27 deletions(-) diff --git a/TESTING.md b/TESTING.md index 104d401..c6a541c 100644 --- a/TESTING.md +++ b/TESTING.md @@ -19,22 +19,22 @@ unit tests, integration tests, Docker testing, linting, and CI/CD workflows. ## Unit Testing The test suite in `test_main.py` provides comprehensive coverage for all functions in `main.py`. -We have **95 unit tests** covering 9 functions with 80%+ code coverage requirement. +We have unit tests covering 9 functions with 80%+ code coverage requirement. ### Test Structure Tests are organized into 10 test classes: -1. **TestSetupLogging** (1 test) - Logging configuration -2. **TestSleepForRateLimit** (4 tests) - Rate limit handling -3. **TestExtractPullRequests** (14 tests) - PR extraction with pagination and enrichment -4. **TestExtractCommits** (9 tests) - Commit and file extraction -5. **TestExtractReviewers** (6 tests) - Reviewer extraction -6. **TestExtractComments** (7 tests) - Comment extraction (uses /issues endpoint) -7. **TestTransformData** (26 tests) - Data transformation for all 4 BigQuery tables -8. **TestLoadData** (8 tests) - BigQuery data loading -9. **TestMain** (17 tests) - Main ETL orchestration -10. 
**TestIntegration** (3 tests) - End-to-end integration tests (marked with `@pytest.mark.integration`) +1. **TestSetupLogging** - Logging configuration +2. **TestSleepForRateLimit** - Rate limit handling +3. **TestExtractPullRequests** - PR extraction with pagination and enrichment +4. **TestExtractCommits** - Commit and file extraction +5. **TestExtractReviewers** - Reviewer extraction +6. **TestExtractComments** - Comment extraction (uses /issues endpoint) +7. **TestTransformData** - Data transformation for all 4 BigQuery tables +8. **TestLoadData** - BigQuery data loading +9. **TestMain** - Main ETL orchestration +10. **TestIntegration** - End-to-end integration tests (marked with `@pytest.mark.integration`) ### Fixtures @@ -51,17 +51,17 @@ Reusable fixtures are defined at the top of `test_main.py`: ### Function Coverage -| Function | Tests | Coverage Target | Key Test Areas | -|----------|-------|-----------------|----------------| -| `setup_logging()` | 1 | 100% | Logger configuration | -| `sleep_for_rate_limit()` | 4 | 100% | Rate limit sleep logic, edge cases | -| `extract_pull_requests()` | 14 | 90%+ | Pagination, rate limits, enrichment, error handling | -| `extract_commits()` | 9 | 85%+ | Commit/file fetching, rate limits, errors | -| `extract_reviewers()` | 6 | 85%+ | Reviewer states, rate limits, errors | -| `extract_comments()` | 7 | 85%+ | Comment fetching (via /issues), rate limits | -| `transform_data()` | 26 | 95%+ | Bug ID extraction, 4 tables, field mapping | -| `load_data()` | 8 | 90%+ | BigQuery insertion, snapshot dates, errors | -| `main()` | 17 | 85%+ | Env vars, orchestration, chunking | +| Function | Coverage Target | Key Test Areas | +|----------|------------------|----------------| +| `setup_logging()` | 100% | Logger configuration | +| `sleep_for_rate_limit()` | 100% | Rate limit sleep logic, edge cases | +| `extract_pull_requests()` | 90%+ | Pagination, rate limits, enrichment, error handling | +| `extract_commits()` | 85%+ | Commit/file fetching, rate limits, errors | +| `extract_reviewers()` | 85%+ | Reviewer states, rate limits, errors | +| `extract_comments()` | 85%+ | Comment fetching (via /issues), rate limits | +| `transform_data()` | 95%+ | Bug ID extraction, 4 tables, field mapping | +| `load_data()` | 90%+ | BigQuery insertion, snapshot dates, errors | +| `main()` | 85%+ | Env vars, orchestration, chunking | **Overall Target: 85-90% coverage** (80% minimum enforced in CI) @@ -318,8 +318,8 @@ docker-compose down - 9050 (BigQuery API) - 9060 (Discovery/Admin API) - **Configuration**: Uses `data.yml` to define the schema -- **Project**: test-project -- **Dataset**: test_dataset +- **Project**: test +- **Dataset**: github_etl - **Table**: pull_requests ### ETL Service @@ -328,8 +328,9 @@ The ETL service is configured via environment variables in `docker-compose.yml`: ```yaml environment: - GITHUB_REPOS: "mozilla/firefox" - GITHUB_API_URL: "http://mock-github-api:5000" # Points to mock API + GITHUB_REPOS: "mozilla-firefox/firefox" + GITHUB_TOKEN: "" # Not needed for mock API + GITHUB_API_URL: "http://mock-github-api:5000" BIGQUERY_PROJECT: "test" BIGQUERY_DATASET: "github_etl" BIGQUERY_EMULATOR_HOST: "http://bigquery-emulator:9050" diff --git a/pytest.ini b/pytest.ini index 33ef84b..d553b45 100644 --- a/pytest.ini +++ b/pytest.ini @@ -34,7 +34,7 @@ log_cli_date_format = %Y-%m-%d %H:%M:%S # Coverage options [coverage:run] -source = . 
+source = main omit = test_*.py .venv/* From 8b7eb487cb3209939070b036ab9528f04d05d6ae Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 23 Jan 2026 18:16:04 -0500 Subject: [PATCH 07/11] Fixed review comments --- Dockerfile | 4 +- Dockerfile.mock | 2 +- README.md | 2 +- TESTING.md | 2 +- pyproject.toml | 1 + pytest.ini | 47 - requirements.txt | 2 +- test_formatting.py | 16 + test_main.py | 3456 ++++++++++++++++++++++---------------------- 9 files changed, 1744 insertions(+), 1788 deletions(-) delete mode 100644 pytest.ini create mode 100644 test_formatting.py diff --git a/Dockerfile b/Dockerfile index 5608295..bec1ed8 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,5 +1,5 @@ # Use the latest stable Python image -FROM python:3.11-slim +FROM python:3.14.2-slim # Set environment variables ENV PYTHONDONTWRITEBYTECODE=1 \ @@ -34,4 +34,4 @@ RUN chown -R app:app /app USER app # Set the default command -CMD ["python", "main.py"] \ No newline at end of file +CMD ["python", "main.py"] diff --git a/Dockerfile.mock b/Dockerfile.mock index 1098382..cf46078 100644 --- a/Dockerfile.mock +++ b/Dockerfile.mock @@ -1,5 +1,5 @@ # Dockerfile for mock GitHub API service -FROM python:3.11-slim +FROM python:3.14.2-slim WORKDIR /app diff --git a/README.md b/README.md index 80a3afe..570bacb 100644 --- a/README.md +++ b/README.md @@ -66,7 +66,7 @@ docker run --rm \ ### Container Specifications -- **Base Image**: `python:3.11-slim` (latest stable Python) +- **Base Image**: `python:3.14.2-slim` (latest stable Python) - **User**: `app` (uid: 1000, gid: 1000) - **Working Directory**: `/app` - **Ownership**: All files in `/app` are owned by the `app` user diff --git a/TESTING.md b/TESTING.md index c6a541c..6901d2f 100644 --- a/TESTING.md +++ b/TESTING.md @@ -604,7 +604,7 @@ If coverage is below 80%: ### Tests Pass Locally But Fail in CI -- Check Python version (must be 3.11) +- Check Python version (must be 3.14) - Verify all dependencies are in `requirements.txt` - Look for environment-specific issues diff --git a/pyproject.toml b/pyproject.toml index f4aac49..ed3b2a4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -25,6 +25,7 @@ dependencies = [ [project.optional-dependencies] dev = [ "pytest>=7.0.0", + "pytest-mock>=3.10.0", "ruff>=0.14.14", "black>=24.0.0", ] diff --git a/pytest.ini b/pytest.ini deleted file mode 100644 index d553b45..0000000 --- a/pytest.ini +++ /dev/null @@ -1,47 +0,0 @@ -[pytest] -# Pytest configuration for GitHub ETL project - -# Test discovery patterns -python_files = test_*.py -python_classes = Test* -python_functions = test_* - -# Output options -addopts = - -v - --strict-markers - --tb=short - --cov=main - --cov-report=term-missing - --cov-report=html - --cov-branch - --cov-fail-under=80 - -# Test paths -testpaths = . 
- -# Markers for organizing tests -markers = - unit: Unit tests for individual functions - integration: Integration tests that test multiple components - slow: Tests that take longer to run - -# Logging -log_cli = false -log_cli_level = INFO -log_cli_format = %(asctime)s [%(levelname)8s] %(message)s -log_cli_date_format = %Y-%m-%d %H:%M:%S - -# Coverage options -[coverage:run] -source = main -omit = - test_*.py - .venv/* - venv/* - */site-packages/* - -[coverage:report] -precision = 2 -show_missing = true -skip_covered = false diff --git a/requirements.txt b/requirements.txt index fd521f6..d487f50 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,5 @@ # -# This file is autogenerated by pip-compile with Python 3.14 +# This file is autogenerated by pip-compile with Python 3.10 # by the following command: # # pip-compile --generate-hashes pyproject.toml diff --git a/test_formatting.py b/test_formatting.py new file mode 100644 index 0000000..c92e534 --- /dev/null +++ b/test_formatting.py @@ -0,0 +1,16 @@ +""" +Code Style Tests. +""" + +import subprocess + + +def test_black(): + cmd = ("black", "--diff", "main.py") + output = subprocess.check_output(cmd) + assert not output, "The python code does not adhere to the project style." + + +def test_ruff(): + passed = subprocess.call(("ruff", "check", "main.py", "--target-version", "py314")) + assert not passed, "ruff did not run cleanly." diff --git a/test_main.py b/test_main.py index 0e60118..0d38ac3 100644 --- a/test_main.py +++ b/test_main.py @@ -116,1325 +116,839 @@ def mock_comment_response(): # ============================================================================= -class TestSetupLogging: - """Tests for setup_logging function.""" - def test_setup_logging_configures_logger(self): - """Test that setup_logging configures the root logger correctly.""" - main.setup_logging() - - root_logger = logging.getLogger() - assert root_logger.level == logging.INFO - assert len(root_logger.handlers) > 0 - - # Check that at least one handler is a StreamHandler - has_stream_handler = any( - isinstance(handler, logging.StreamHandler) - for handler in root_logger.handlers - ) - assert has_stream_handler - - -class TestSleepForRateLimit: - """Tests for sleep_for_rate_limit function.""" - - @patch("time.time") - @patch("time.sleep") - def test_sleep_for_rate_limit_when_remaining_is_zero(self, mock_sleep, mock_time): - """Test that sleep_for_rate_limit sleeps until reset time.""" - mock_time.return_value = 1000 - - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1120", # 120 seconds from now - } +# ============================================================================= +# TESTS FOR SETUP_LOGGING +# ============================================================================= - main.sleep_for_rate_limit(mock_response) - mock_sleep.assert_called_once_with(120) +def test_setup_logging(): + """Test that setup_logging configures logging correctly.""" + main.setup_logging() - @patch("time.time") - @patch("time.sleep") - def test_sleep_for_rate_limit_when_reset_already_passed( - self, mock_sleep, mock_time - ): - """Test that sleep_for_rate_limit doesn't sleep negative time.""" - mock_time.return_value = 2000 + root_logger = logging.getLogger() + assert root_logger.level == logging.INFO + assert len(root_logger.handlers) > 0 - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1500", # Already passed - } + # Check that at least one 
handler is a StreamHandler + has_stream_handler = any( + isinstance(handler, logging.StreamHandler) + for handler in root_logger.handlers + ) + assert has_stream_handler - main.sleep_for_rate_limit(mock_response) - # Should sleep for 0 seconds (max of 0 and negative value) - mock_sleep.assert_called_once_with(0) - @patch("time.sleep") - def test_sleep_for_rate_limit_when_remaining_not_zero(self, mock_sleep): - """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "5", - "X-RateLimit-Reset": "1500", - } +# ============================================================================= +# TESTS FOR SLEEP_FOR_RATE_LIMIT +# ============================================================================= - main.sleep_for_rate_limit(mock_response) - # Should not sleep when remaining > 0 - mock_sleep.assert_not_called() +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_calculates_wait_time(mock_sleep, mock_time): + """Test that sleep_for_rate_limit calculates correct wait time.""" + mock_time.return_value = 1000 - @patch("time.sleep") - def test_sleep_for_rate_limit_with_missing_headers(self, mock_sleep): - """Test sleep_for_rate_limit with missing rate limit headers.""" - mock_response = Mock() - mock_response.headers = {} + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1120", # 120 seconds from now + } - main.sleep_for_rate_limit(mock_response) + main.sleep_for_rate_limit(mock_response) - # Should not sleep when headers are missing (defaults to remaining=1) - mock_sleep.assert_not_called() + mock_sleep.assert_called_once_with(120) -class TestExtractPullRequests: - """Tests for extract_pull_requests function.""" +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_when_reset_already_passed(mock_sleep, mock_time): + """Test that sleep_for_rate_limit doesn't sleep negative time.""" + mock_time.return_value = 2000 - def test_extract_single_page(self, mock_session): - """Test extracting data from a single page of results.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - # Mock the extract functions - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 1 - assert len(result[0]) == 2 - assert result[0][0]["number"] == 1 - assert result[0][1]["number"] == 2 - - def test_extract_multiple_pages(self, mock_session): - """Test extracting data across multiple pages with pagination.""" - # First page response - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1500", # Already passed + } - # Second page response - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] - 
mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 2 - assert len(result[0]) == 2 - assert len(result[1]) == 1 - assert result[0][0]["number"] == 1 - assert result[1][0]["number"] == 3 - - def test_enriches_prs_with_commit_data(self, mock_session): - """Test that PRs are enriched with commit data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_commits = [{"sha": "abc123"}] - - with ( - patch( - "main.extract_commits", return_value=mock_commits - ) as mock_extract_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["commit_data"] == mock_commits - mock_extract_commits.assert_called_once() - - def test_enriches_prs_with_reviewer_data(self, mock_session): - """Test that PRs are enriched with reviewer data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_reviewers = [{"id": 789, "state": "APPROVED"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch( - "main.extract_reviewers", return_value=mock_reviewers - ) as mock_extract_reviewers, - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["reviewer_data"] == mock_reviewers - mock_extract_reviewers.assert_called_once() - - def test_enriches_prs_with_comment_data(self, mock_session): - """Test that PRs are enriched with comment data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_comments = [{"id": 456, "body": "Great work!"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch( - "main.extract_comments", return_value=mock_comments - ) as mock_extract_comments, - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["comment_data"] == mock_comments - mock_extract_comments.assert_called_once() - - @patch("main.sleep_for_rate_limit") - def test_handles_rate_limit(self, mock_sleep, mock_session): - """Test that extract_pull_requests handles rate limiting correctly.""" - # Rate limit response - mock_response_rate_limit = Mock() - mock_response_rate_limit.status_code = 403 - mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} - - # Successful response after rate limit - mock_response_success = Mock() - mock_response_success.status_code = 200 - mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response_success.links = {} - - mock_session.get.side_effect = [ - mock_response_rate_limit, - mock_response_success, - ] + main.sleep_for_rate_limit(mock_response) - with ( - patch("main.extract_commits", 
return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + # Should sleep for 0 seconds (max of 0 and negative value) + mock_sleep.assert_called_once_with(0) - mock_sleep.assert_called_once_with(mock_response_rate_limit) - assert len(result) == 1 - def test_handles_api_error_404(self, mock_session): - """Test that extract_pull_requests raises SystemExit on 404.""" - mock_response = Mock() - mock_response.status_code = 404 - mock_response.text = "Not Found" +@patch("time.sleep") +def test_sleep_for_rate_limit_when_remaining_not_zero(mock_sleep): + """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "5", + "X-RateLimit-Reset": "1500", + } - mock_session.get.return_value = mock_response + main.sleep_for_rate_limit(mock_response) - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) + # Should not sleep when remaining > 0 + mock_sleep.assert_not_called() - assert "GitHub API error 404" in str(exc_info.value) - def test_handles_api_error_500(self, mock_session): - """Test that extract_pull_requests raises SystemExit on 500.""" - mock_response = Mock() - mock_response.status_code = 500 - mock_response.text = "Internal Server Error" +@patch("time.sleep") +def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): + """Test sleep_for_rate_limit with missing rate limit headers.""" + mock_response = Mock() + mock_response.headers = {} - mock_session.get.return_value = mock_response + main.sleep_for_rate_limit(mock_response) - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert "GitHub API error 500" in str(exc_info.value) - - def test_stops_on_empty_batch(self, mock_session): - """Test that extraction stops when an empty batch is returned.""" - # First page with data - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } + # Should not sleep when headers are missing (defaults to remaining=1) + mock_sleep.assert_not_called() - # Second page empty - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [] - mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should only have 1 chunk from first page - assert len(result) == 1 - assert len(result[0]) == 1 - - def test_invalid_page_number_handling(self, mock_session): - """Test handling of invalid page number in pagination.""" - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" - } - } - mock_session.get.return_value = mock_response_1 - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), 
- ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should stop pagination on invalid page number - assert len(result) == 1 - - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL.""" - custom_url = "https://mock-github.example.com" - - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list( - main.extract_pull_requests( - mock_session, "mozilla/firefox", github_api_url=custom_url - ) - ) - # Verify custom URL was used - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - def test_skips_prs_without_number_field(self, mock_session): - """Test that PRs without 'number' field are skipped.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"title": "PR without number"}, # Missing number field - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} +# ============================================================================= +# TESTS FOR EXTRACT_PULL_REQUESTS +# ============================================================================= - mock_session.get.return_value = mock_response - with ( - patch("main.extract_commits", return_value=[]) as mock_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) +def test_extract_pull_requests_basic(mock_session): + """Test basic extraction of pull requests.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} - # extract_commits should only be called for PRs with number field - assert mock_commits.call_count == 2 + mock_session.get.return_value = mock_response + # Mock the extract functions + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 1 + assert len(result[0]) == 2 + assert result[0][0]["number"] == 1 + assert result[0][1]["number"] == 2 + +def test_extract_multiple_pages(mock_session): + """Test extracting data across multiple pages with pagination.""" + # First page response + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } -class TestExtractCommits: - """Tests for extract_commits function.""" + # Second page response + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] + mock_response_2.links = {} - def test_fetch_commits_with_files(self, mock_session): - """Test fetching commits with files for a PR.""" - # Mock commits list response - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": 
"abc123"}, - {"sha": "def456"}, - ] + mock_session.get.side_effect = [mock_response_1, mock_response_2] - # Mock individual commit responses - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = { - "sha": "abc123", - "files": [{"filename": "file1.py", "additions": 10}], - } + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 2 + assert len(result[0]) == 2 + assert len(result[1]) == 1 + assert result[0][0]["number"] == 1 + assert result[1][0]["number"] == 3 + +def test_enriches_prs_with_commit_data(mock_session): + """Test that PRs are enriched with commit data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_commits = [{"sha": "abc123"}] + + with ( + patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = { - "sha": "def456", - "files": [{"filename": "file2.py", "deletions": 5}], - } + assert result[0][0]["commit_data"] == mock_commits + mock_extract_commits.assert_called_once() - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] +def test_enriches_prs_with_reviewer_data(mock_session): + """Test that PRs are enriched with reviewer data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["sha"] == "abc123" - assert result[0]["files"][0]["filename"] == "file1.py" - assert result[1]["sha"] == "def456" - assert result[1]["files"][0]["filename"] == "file2.py" - - def test_multiple_files_per_commit(self, mock_session): - """Test handling multiple files in a single commit.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] - - commit_detail = Mock() - commit_detail.status_code = 200 - commit_detail.json.return_value = { - "sha": "abc123", - "files": [ - {"filename": "file1.py", "additions": 10}, - {"filename": "file2.py", "additions": 20}, - {"filename": "file3.py", "deletions": 5}, - ], - } + mock_session.get.return_value = mock_response - mock_session.get.side_effect = [commits_response, commit_detail] + mock_reviewers = [{"id": 789, "state": "APPROVED"}] - result = main.extract_commits(mock_session, "mozilla/firefox", 123) + with ( + patch("main.extract_commits", return_value=[]), + patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - assert len(result) == 1 - assert len(result[0]["files"]) == 3 + assert result[0][0]["reviewer_data"] == mock_reviewers + mock_extract_reviewers.assert_called_once() - @patch("main.sleep_for_rate_limit") - def 
test_rate_limit_on_commits_list(self, mock_sleep, mock_session): - """Test rate limit handling when fetching commits list.""" - # Rate limit response - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} +def test_enriches_prs_with_comment_data(mock_session): + """Test that PRs are enriched with comment data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} - # Success response - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] + mock_session.get.return_value = mock_response - mock_session.get.side_effect = [rate_limit_response, success_response] + mock_comments = [{"id": 456, "body": "Great work!"}] - result = main.extract_commits(mock_session, "mozilla/firefox", 123) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments, + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["comment_data"] == mock_comments + mock_extract_comments.assert_called_once() + +@patch("main.sleep_for_rate_limit") +def test_handles_rate_limit(mock_sleep, mock_session): + """Test that extract_pull_requests handles rate limiting correctly.""" + # Rate limit response + mock_response_rate_limit = Mock() + mock_response_rate_limit.status_code = 403 + mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} + + # Successful response after rate limit + mock_response_success = Mock() + mock_response_success.status_code = 200 + mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response_success.links = {} + + mock_session.get.side_effect = [ + mock_response_rate_limit, + mock_response_success, + ] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - mock_sleep.assert_called_once() - assert result == [] + mock_sleep.assert_called_once_with(mock_response_rate_limit) + assert len(result) == 1 - def test_api_error_on_commits_list(self, mock_session): - """Test API error handling when fetching commits list.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" +def test_handles_api_error_404(mock_session): + """Test that extract_pull_requests raises SystemExit on 404.""" + mock_response = Mock() + mock_response.status_code = 404 + mock_response.text = "Not Found" - mock_session.get.return_value = error_response + mock_session.get.return_value = mock_response - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) - assert "GitHub API error 500" in str(exc_info.value) + assert "GitHub API error 404" in str(exc_info.value) - def test_api_error_on_individual_commit(self, mock_session): - """Test API error when fetching individual commit details.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] +def test_handles_api_error_500(mock_session): + """Test that 
extract_pull_requests raises SystemExit on 500.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Server Error" - commit_error = Mock() - commit_error.status_code = 404 - commit_error.text = "Commit not found" + mock_session.get.return_value = mock_response - mock_session.get.side_effect = [commits_response, commit_error] + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) + assert "GitHub API error 500" in str(exc_info.value) - assert "GitHub API error 404" in str(exc_info.value) +def test_stops_on_empty_batch(mock_session): + """Test that extraction stops when an empty batch is returned.""" + # First page with data + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } - def test_commit_without_sha_field(self, mock_session): - """Test handling commits without sha field.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": "abc123"}, - {}, # Missing sha field - ] + # Second page empty + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [] + mock_response_2.links = {} - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + mock_session.get.side_effect = [mock_response_1, mock_response_2] - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = {"files": []} + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # Should only have 1 chunk from first page + assert len(result) == 1 + assert len(result[0]) == 1 + +def test_invalid_page_number_handling(mock_session): + """Test handling of invalid page number in pagination.""" + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" + } + } - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] + mock_session.get.return_value = mock_response_1 - result = main.extract_commits(mock_session, "mozilla/firefox", 123) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - # Should handle the commit without sha gracefully - assert len(result) == 2 + # Should stop pagination on invalid page number + assert len(result) == 1 - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL for commits.""" - custom_url = "https://mock-github.example.com" +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL.""" + custom_url = "https://mock-github.example.com" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] + mock_response = Mock() + 
mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1}] + mock_response.links = {} - mock_session.get.return_value = commits_response + mock_session.get.return_value = mock_response - main.extract_commits( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list( + main.extract_pull_requests( + mock_session, "mozilla/firefox", github_api_url=custom_url + ) ) - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - def test_empty_commits_list(self, mock_session): - """Test handling PR with no commits.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] - - mock_session.get.return_value = commits_response - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -class TestExtractReviewers: - """Tests for extract_reviewers function.""" + # Verify custom URL was used + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + +def test_skips_prs_without_number_field(mock_session): + """Test that PRs without 'number' field are skipped.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"title": "PR without number"}, # Missing number field + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with ( + patch("main.extract_commits", return_value=[]) as mock_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - def test_fetch_reviewers(self, mock_session): - """Test fetching reviewers for a PR.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 790, - "user": {"login": "reviewer2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - ] + # extract_commits should only be called for PRs with number field + assert mock_commits.call_count == 2 - mock_session.get.return_value = reviewers_response - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - assert len(result) == 2 - assert result[0]["state"] == "APPROVED" - assert result[1]["state"] == "CHANGES_REQUESTED" +# ============================================================================= +# TESTS FOR EXTRACT_COMMITS +# ============================================================================= - def test_multiple_review_states(self, mock_session): - """Test handling multiple different review states.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [ - {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, - {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, - {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, - {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, - ] + # Mock commits list response + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {"sha": "def456"}, + ] + + # Mock individual 
commit responses + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = { + "sha": "abc123", + "files": [{"filename": "file1.py", "additions": 10}], + } - mock_session.get.return_value = reviewers_response + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = { + "sha": "def456", + "files": [{"filename": "file2.py", "deletions": 5}], + } - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["sha"] == "abc123" + assert result[0]["files"][0]["filename"] == "file1.py" + assert result[1]["sha"] == "def456" + assert result[1]["files"][0]["filename"] == "file2.py" + +def test_multiple_files_per_commit(mock_session): + """Test handling multiple files in a single commit.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_detail = Mock() + commit_detail.status_code = 200 + commit_detail.json.return_value = { + "sha": "abc123", + "files": [ + {"filename": "file1.py", "additions": 10}, + {"filename": "file2.py", "additions": 20}, + {"filename": "file3.py", "deletions": 5}, + ], + } - assert len(result) == 4 - states = [r["state"] for r in result] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states + mock_session.get.side_effect = [commits_response, commit_detail] - def test_empty_reviewers_list(self, mock_session): - """Test handling PR with no reviewers.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = reviewers_response + assert len(result) == 1 + assert len(result[0]["files"]) == 3 - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) +@patch("main.sleep_for_rate_limit") +def test_rate_limit_on_commits_list(mock_sleep, mock_session): + """Test rate limit handling when fetching commits list.""" + # Rate limit response + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - assert result == [] + # Success response + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] - @patch("main.sleep_for_rate_limit") - def test_rate_limit_handling(self, mock_sleep, mock_session): - """Test rate limit handling when fetching reviewers.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + mock_session.get.side_effect = [rate_limit_response, success_response] - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.side_effect = [rate_limit_response, success_response] + mock_sleep.assert_called_once() + assert result == [] - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) +def test_api_error_on_commits_list(mock_session): + """Test API error handling when fetching commits list.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" - 
mock_sleep.assert_called_once() - assert result == [] + mock_session.get.return_value = error_response - def test_api_error(self, mock_session): - """Test API error handling when fetching reviewers.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = error_response + assert "GitHub API error 500" in str(exc_info.value) - with pytest.raises(SystemExit) as exc_info: - main.extract_reviewers(mock_session, "mozilla/firefox", 123) +def test_api_error_on_individual_commit(mock_session): + """Test API error when fetching individual commit details.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] - assert "GitHub API error 500" in str(exc_info.value) + commit_error = Mock() + commit_error.status_code = 404 + commit_error.text = "Commit not found" - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL for reviewers.""" - custom_url = "https://mock-github.example.com" + mock_session.get.side_effect = [commits_response, commit_error] - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = reviewers_response + assert "GitHub API error 404" in str(exc_info.value) - main.extract_reviewers( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) +def test_commit_without_sha_field(mock_session): + """Test handling commits without sha field.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {}, # Missing sha field + ] - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = {"files": []} -class TestExtractComments: - """Tests for extract_comments function.""" + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] - def test_fetch_comments(self, mock_session): - """Test fetching comments for a PR.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks good", - "created_at": "2024-01-01T14:00:00Z", - }, - { - "id": 457, - "user": {"login": "commenter2"}, - "body": "I have concerns", - "created_at": "2024-01-01T15:00:00Z", - }, - ] + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = comments_response + # Should handle the commit without sha gracefully + assert len(result) == 2 - result = main.extract_comments(mock_session, "mozilla/firefox", 123) +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL for commits.""" + custom_url = "https://mock-github.example.com" - assert len(result) == 2 - assert result[0]["id"] == 456 - assert result[1]["id"] == 457 + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] - def test_uses_issues_endpoint(self, mock_session): 
- """Test that comments use /issues endpoint not /pulls.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] + mock_session.get.return_value = commits_response - mock_session.get.return_value = comments_response + main.extract_commits( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) - main.extract_comments(mock_session, "mozilla/firefox", 123) + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] - call_args = mock_session.get.call_args - url = call_args[0][0] - assert "/issues/123/comments" in url - assert "/pulls/123/comments" not in url - - def test_multiple_comments(self, mock_session): - """Test handling multiple comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [ - {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} - for i in range(1, 11) - ] +def test_empty_commits_list(mock_session): + """Test handling PR with no commits.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] - mock_session.get.return_value = comments_response + mock_session.get.return_value = commits_response - result = main.extract_comments(mock_session, "mozilla/firefox", 123) + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - assert len(result) == 10 + assert result == [] - def test_empty_comments_list(self, mock_session): - """Test handling PR with no comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - mock_session.get.return_value = comments_response - result = main.extract_comments(mock_session, "mozilla/firefox", 123) +# ============================================================================= +# TESTS FOR EXTRACT_REVIEWERS +# ============================================================================= - assert result == [] + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 790, + "user": {"login": "reviewer2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + ] - @patch("main.sleep_for_rate_limit") - def test_rate_limit_handling(self, mock_sleep, mock_session): - """Test rate limit handling when fetching comments.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + mock_session.get.return_value = reviewers_response - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - mock_session.get.side_effect = [rate_limit_response, success_response] + assert len(result) == 2 + assert result[0]["state"] == "APPROVED" + assert result[1]["state"] == "CHANGES_REQUESTED" - result = main.extract_comments(mock_session, "mozilla/firefox", 123) +def test_multiple_review_states(mock_session): + """Test handling multiple different review states.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, + {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, + {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, + 
{"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, + ] - mock_sleep.assert_called_once() - assert result == [] + mock_session.get.return_value = reviewers_response - def test_api_error(self, mock_session): - """Test API error handling when fetching comments.""" - error_response = Mock() - error_response.status_code = 404 - error_response.text = "Not Found" + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = error_response + assert len(result) == 4 + states = [r["state"] for r in result] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states - with pytest.raises(SystemExit) as exc_info: - main.extract_comments(mock_session, "mozilla/firefox", 123) +def test_empty_reviewers_list(mock_session): + """Test handling PR with no reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] - assert "GitHub API error 404" in str(exc_info.value) + mock_session.get.return_value = reviewers_response - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL for comments.""" - custom_url = "https://mock-github.example.com" + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] + assert result == [] - mock_session.get.return_value = comments_response +@patch("main.sleep_for_rate_limit") +def test_rate_limit_handling(mock_sleep, mock_session): + """Test rate limit handling when fetching reviewers.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - main.extract_comments( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] + mock_session.get.side_effect = [rate_limit_response, success_response] + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) -class TestTransformData: - """Tests for transform_data function.""" + mock_sleep.assert_called_once() + assert result == [] - def test_basic_pr_transformation(self): - """Test basic pull request field mapping.""" - raw_data = [ - { - "number": 123, - "title": "Fix login bug", - "state": "closed", - "created_at": "2024-01-01T10:00:00Z", - "updated_at": "2024-01-02T10:00:00Z", - "merged_at": "2024-01-02T12:00:00Z", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] +def test_api_error(mock_session): + """Test API error handling when fetching reviewers.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.return_value = error_response - assert len(result["pull_requests"]) == 1 - pr = result["pull_requests"][0] - assert pr["pull_request_id"] == 123 - assert pr["current_status"] == "closed" - assert pr["date_created"] == "2024-01-01T10:00:00Z" - assert pr["date_modified"] == "2024-01-02T10:00:00Z" - assert pr["date_landed"] == "2024-01-02T12:00:00Z" - assert pr["target_repository"] == "mozilla/firefox" - - def test_bug_id_extraction_basic(self): - """Test bug ID extraction from PR title.""" - test_cases = [ - ("Bug 1234567 - Fix issue", 1234567), - ("bug 
1234567: Update code", 1234567), - ("Fix for bug 7654321", 7654321), - ("b=9876543 - Change behavior", 9876543), - ] + with pytest.raises(SystemExit) as exc_info: + main.extract_reviewers(mock_session, "mozilla/firefox", 123) - for title, expected_bug_id in test_cases: - raw_data = [ - { - "number": 1, - "title": title, - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == expected_bug_id - - def test_bug_id_extraction_with_hash(self): - """Test bug ID extraction with # symbol.""" - raw_data = [ - { - "number": 1, - "title": "Bug #1234567 - Fix issue", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] + assert "GitHub API error 500" in str(exc_info.value) - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == 1234567 +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL for reviewers.""" + custom_url = "https://mock-github.example.com" - def test_bug_id_filter_large_numbers(self): - """Test that bug IDs >= 100000000 are filtered out.""" - raw_data = [ - { - "number": 1, - "title": "Bug 999999999 - Invalid bug ID", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None + mock_session.get.return_value = reviewers_response - def test_bug_id_no_match(self): - """Test PR title with no bug ID.""" - raw_data = [ - { - "number": 1, - "title": "Update documentation", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] + main.extract_reviewers( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] - def test_labels_extraction(self): - """Test labels array extraction.""" - raw_data = [ - { - "number": 1, - "title": "PR with labels", - "state": "open", - "labels": [ - {"name": "bug"}, - {"name": "priority-high"}, - {"name": "needs-review"}, - ], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - result = main.transform_data(raw_data, "mozilla/firefox") - labels = result["pull_requests"][0]["labels"] - assert len(labels) == 3 - assert "bug" in labels - assert "priority-high" in labels - assert "needs-review" in labels - - def test_labels_empty_list(self): - """Test handling empty labels list.""" - raw_data = [ - { - "number": 1, - "title": "PR without labels", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["labels"] == [] +# ============================================================================= +# TESTS FOR EXTRACT_COMMENTS +# ============================================================================= - def test_commit_transformation(self): - """Test commit fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with commits", - "state": "open", - "labels": 
[], - "commit_data": [ - { - "sha": "abc123", - "commit": { - "author": { - "name": "Test Author", - "date": "2024-01-01T12:00:00Z", - } - }, - "files": [ - { - "filename": "src/main.py", - "additions": 10, - "deletions": 5, - } - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks good", + "created_at": "2024-01-01T14:00:00Z", + }, + { + "id": 457, + "user": {"login": "commenter2"}, + "body": "I have concerns", + "created_at": "2024-01-01T15:00:00Z", + }, + ] - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.return_value = comments_response - assert len(result["commits"]) == 1 - commit = result["commits"][0] - assert commit["pull_request_id"] == 123 - assert commit["target_repository"] == "mozilla/firefox" - assert commit["commit_sha"] == "abc123" - assert commit["date_created"] == "2024-01-01T12:00:00Z" - assert commit["author_username"] == "Test Author" - assert commit["filename"] == "src/main.py" - assert commit["lines_added"] == 10 - assert commit["lines_removed"] == 5 - - def test_commit_file_flattening(self): - """Test that each file becomes a separate row.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple files", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc123", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 5}, - {"filename": "file2.py", "additions": 20, "deletions": 2}, - {"filename": "file3.py", "additions": 5, "deletions": 15}, - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - result = main.transform_data(raw_data, "mozilla/firefox") + assert len(result) == 2 + assert result[0]["id"] == 456 + assert result[1]["id"] == 457 - # Should have 3 rows in commits table (one per file) - assert len(result["commits"]) == 3 - filenames = [c["filename"] for c in result["commits"]] - assert "file1.py" in filenames - assert "file2.py" in filenames - assert "file3.py" in filenames +def test_uses_issues_endpoint(mock_session): + """Test that comments use /issues endpoint not /pulls.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] - def test_multiple_commits_with_files(self): - """Test multiple commits with multiple files per PR.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple commits", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "commit1", - "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 0} - ], - }, - { - "sha": "commit2", - "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, - "files": [ - {"filename": "file2.py", "additions": 5, "deletions": 2}, - {"filename": "file3.py", "additions": 8, "deletions": 3}, - ], - }, - ], - "reviewer_data": [], - "comment_data": [], - } - ] + mock_session.get.return_value = comments_response - result = main.transform_data(raw_data, "mozilla/firefox") + main.extract_comments(mock_session, "mozilla/firefox", 123) - # Should have 3 rows total (1 file from commit1, 2 files from commit2) - assert len(result["commits"]) == 3 - assert result["commits"][0]["commit_sha"] == "commit1" - assert 
result["commits"][1]["commit_sha"] == "commit2" - assert result["commits"][2]["commit_sha"] == "commit2" + call_args = mock_session.get.call_args + url = call_args[0][0] + assert "/issues/123/comments" in url + assert "/pulls/123/comments" not in url - def test_reviewer_transformation(self): - """Test reviewer fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with reviewers", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - } - ], - "comment_data": [], - } - ] +def test_multiple_comments(mock_session): + """Test handling multiple comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} + for i in range(1, 11) + ] - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.return_value = comments_response - assert len(result["reviewers"]) == 1 - reviewer = result["reviewers"][0] - assert reviewer["pull_request_id"] == 123 - assert reviewer["target_repository"] == "mozilla/firefox" - assert reviewer["reviewer_username"] == "reviewer1" - assert reviewer["status"] == "APPROVED" - assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - def test_multiple_review_states(self): - """Test handling multiple review states.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple reviews", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "COMMENTED", - "submitted_at": "2024-01-01T17:00:00Z", - }, - ], - "comment_data": [], - } - ] + assert len(result) == 10 - result = main.transform_data(raw_data, "mozilla/firefox") +def test_empty_comments_list(mock_session): + """Test handling PR with no comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] - assert len(result["reviewers"]) == 3 - states = [r["status"] for r in result["reviewers"]] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states + mock_session.get.return_value = comments_response - def test_date_approved_from_earliest_approval(self): - """Test that date_approved is set to earliest APPROVED review.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple approvals", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-02T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T14:00:00Z", # Earliest - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "APPROVED", - "submitted_at": "2024-01-03T16:00:00Z", - }, - ], - "comment_data": [], - } - ] + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - result = main.transform_data(raw_data, "mozilla/firefox") + assert result == [] - pr = result["pull_requests"][0] - assert pr["date_approved"] == "2024-01-01T14:00:00Z" +@patch("main.sleep_for_rate_limit") +def 
test_rate_limit_handling(mock_sleep, mock_session): + """Test rate limit handling when fetching comments.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - def test_comment_transformation(self): - """Test comment fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with comments", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks great!", - "created_at": "2024-01-01T14:00:00Z", - "pull_request_review_id": None, - } - ], - } - ] + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.side_effect = [rate_limit_response, success_response] - assert len(result["comments"]) == 1 - comment = result["comments"][0] - assert comment["pull_request_id"] == 123 - assert comment["target_repository"] == "mozilla/firefox" - assert comment["comment_id"] == 456 - assert comment["author_username"] == "commenter1" - assert comment["date_created"] == "2024-01-01T14:00:00Z" - assert comment["character_count"] == 17 - - def test_comment_character_count(self): - """Test character count calculation for comments.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": "Short", - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "This is a much longer comment with more text", - "created_at": "2024-01-01", - }, - ], - } - ] + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - result = main.transform_data(raw_data, "mozilla/firefox") + mock_sleep.assert_called_once() + assert result == [] - assert result["comments"][0]["character_count"] == 5 - assert result["comments"][1]["character_count"] == 44 +def test_api_error(mock_session): + """Test API error handling when fetching comments.""" + error_response = Mock() + error_response.status_code = 404 + error_response.text = "Not Found" - def test_comment_status_from_review(self): - """Test that comment status is mapped from review_id_statuses.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter"}, - "body": "LGTM", - "created_at": "2024-01-01", - "pull_request_review_id": 789, - } - ], - } - ] + mock_session.get.return_value = error_response - result = main.transform_data(raw_data, "mozilla/firefox") + with pytest.raises(SystemExit) as exc_info: + main.extract_comments(mock_session, "mozilla/firefox", 123) - # Comment should have status from the review - assert result["comments"][0]["status"] == "APPROVED" + assert "GitHub API error 404" in str(exc_info.value) - def test_comment_empty_body(self): - """Test handling comments with empty or None body.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": None, - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "", - 
"created_at": "2024-01-01", - }, - ], - } - ] +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL for comments.""" + custom_url = "https://mock-github.example.com" - result = main.transform_data(raw_data, "mozilla/firefox") + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] - assert result["comments"][0]["character_count"] == 0 - assert result["comments"][1]["character_count"] == 0 + mock_session.get.return_value = comments_response - def test_empty_raw_data(self): - """Test handling empty input list.""" - result = main.transform_data([], "mozilla/firefox") + main.extract_comments( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) - assert result["pull_requests"] == [] - assert result["commits"] == [] - assert result["reviewers"] == [] - assert result["comments"] == [] + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] - def test_pr_without_commits_reviewers_comments(self): - """Test PR with no commits, reviewers, or comments.""" - raw_data = [ - { - "number": 123, - "title": "Minimal PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - result = main.transform_data(raw_data, "mozilla/firefox") - assert len(result["pull_requests"]) == 1 - assert len(result["commits"]) == 0 - assert len(result["reviewers"]) == 0 - assert len(result["comments"]) == 0 +# ============================================================================= +# TESTS FOR TRANSFORM_DATA +# ============================================================================= - def test_return_structure(self): - """Test that transform_data returns dict with 4 keys.""" + raw_data = [ + { + "number": 123, + "title": "Fix login bug", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T12:00:00Z", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + pr = result["pull_requests"][0] + assert pr["pull_request_id"] == 123 + assert pr["current_status"] == "closed" + assert pr["date_created"] == "2024-01-01T10:00:00Z" + assert pr["date_modified"] == "2024-01-02T10:00:00Z" + assert pr["date_landed"] == "2024-01-02T12:00:00Z" + assert pr["target_repository"] == "mozilla/firefox" + +def test_bug_id_extraction_basic(): + """Test bug ID extraction from PR title.""" + test_cases = [ + ("Bug 1234567 - Fix issue", 1234567), + ("bug 1234567: Update code", 1234567), + ("Fix for bug 7654321", 7654321), + ("b=9876543 - Change behavior", 9876543), + ] + + for title, expected_bug_id in test_cases: raw_data = [ { "number": 1, - "title": "Test", + "title": title, "state": "open", "labels": [], "commit_data": [], @@ -1444,638 +958,835 @@ def test_return_structure(self): ] result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == expected_bug_id + +def test_bug_id_extraction_with_hash(): + """Test bug ID extraction with # symbol.""" + raw_data = [ + { + "number": 1, + "title": "Bug #1234567 - Fix issue", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == 1234567 + +def test_bug_id_filter_large_numbers(): + """Test that bug IDs 
>= 100000000 are filtered out.""" + raw_data = [ + { + "number": 1, + "title": "Bug 999999999 - Invalid bug ID", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + +def test_bug_id_no_match(): + """Test PR title with no bug ID.""" + raw_data = [ + { + "number": 1, + "title": "Update documentation", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + +def test_labels_extraction(): + """Test labels array extraction.""" + raw_data = [ + { + "number": 1, + "title": "PR with labels", + "state": "open", + "labels": [ + {"name": "bug"}, + {"name": "priority-high"}, + {"name": "needs-review"}, + ], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + labels = result["pull_requests"][0]["labels"] + assert len(labels) == 3 + assert "bug" in labels + assert "priority-high" in labels + assert "needs-review" in labels + +def test_labels_empty_list(): + """Test handling empty labels list.""" + raw_data = [ + { + "number": 1, + "title": "PR without labels", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["labels"] == [] + +def test_commit_transformation(): + """Test commit fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": { + "author": { + "name": "Test Author", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/main.py", + "additions": 10, + "deletions": 5, + } + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["commits"]) == 1 + commit = result["commits"][0] + assert commit["pull_request_id"] == 123 + assert commit["target_repository"] == "mozilla/firefox" + assert commit["commit_sha"] == "abc123" + assert commit["date_created"] == "2024-01-01T12:00:00Z" + assert commit["author_username"] == "Test Author" + assert commit["filename"] == "src/main.py" + assert commit["lines_added"] == 10 + assert commit["lines_removed"] == 5 + +def test_commit_file_flattening(): + """Test that each file becomes a separate row.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple files", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 5}, + {"filename": "file2.py", "additions": 20, "deletions": 2}, + {"filename": "file3.py", "additions": 5, "deletions": 15}, + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows in commits table (one per file) + assert len(result["commits"]) == 3 + filenames = [c["filename"] for c in result["commits"]] + assert "file1.py" in filenames + assert "file2.py" in filenames + assert "file3.py" in filenames + +def test_multiple_commits_with_files(): + """Test multiple commits with multiple 
files per PR.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "commit1", + "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 0} + ], + }, + { + "sha": "commit2", + "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, + "files": [ + {"filename": "file2.py", "additions": 5, "deletions": 2}, + {"filename": "file3.py", "additions": 8, "deletions": 3}, + ], + }, + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows total (1 file from commit1, 2 files from commit2) + assert len(result["commits"]) == 3 + assert result["commits"][0]["commit_sha"] == "commit1" + assert result["commits"][1]["commit_sha"] == "commit2" + assert result["commits"][2]["commit_sha"] == "commit2" + +def test_reviewer_transformation(): + """Test reviewer fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with reviewers", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + } + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 1 + reviewer = result["reviewers"][0] + assert reviewer["pull_request_id"] == 123 + assert reviewer["target_repository"] == "mozilla/firefox" + assert reviewer["reviewer_username"] == "reviewer1" + assert reviewer["status"] == "APPROVED" + assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + +def test_multiple_review_states(): + """Test handling multiple review states.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple reviews", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "COMMENTED", + "submitted_at": "2024-01-01T17:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 3 + states = [r["status"] for r in result["reviewers"]] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + +def test_date_approved_from_earliest_approval(): + """Test that date_approved is set to earliest APPROVED review.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple approvals", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-02T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T14:00:00Z", # Earliest + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "APPROVED", + "submitted_at": "2024-01-03T16:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + pr = result["pull_requests"][0] + assert pr["date_approved"] == "2024-01-01T14:00:00Z" + +def test_comment_transformation(): + """Test comment fields mapping.""" + raw_data = [ + { + "number": 123, + 
"title": "PR with comments", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks great!", + "created_at": "2024-01-01T14:00:00Z", + "pull_request_review_id": None, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["comments"]) == 1 + comment = result["comments"][0] + assert comment["pull_request_id"] == 123 + assert comment["target_repository"] == "mozilla/firefox" + assert comment["comment_id"] == 456 + assert comment["author_username"] == "commenter1" + assert comment["date_created"] == "2024-01-01T14:00:00Z" + assert comment["character_count"] == 17 + +def test_comment_character_count(): + """Test character count calculation for comments.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": "Short", + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "This is a much longer comment with more text", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 5 + assert result["comments"][1]["character_count"] == 44 + +def test_comment_status_from_review(): + """Test that comment status is mapped from review_id_statuses.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter"}, + "body": "LGTM", + "created_at": "2024-01-01", + "pull_request_review_id": 789, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Comment should have status from the review + assert result["comments"][0]["status"] == "APPROVED" + +def test_comment_empty_body(): + """Test handling comments with empty or None body.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": None, + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 0 + assert result["comments"][1]["character_count"] == 0 + +def test_empty_raw_data(): + """Test handling empty input list.""" + result = main.transform_data([], "mozilla/firefox") + + assert result["pull_requests"] == [] + assert result["commits"] == [] + assert result["reviewers"] == [] + assert result["comments"] == [] + +def test_pr_without_commits_reviewers_comments(): + """Test PR with no commits, reviewers, or comments.""" + raw_data = [ + { + "number": 123, + "title": "Minimal PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + assert len(result["commits"]) == 0 + assert len(result["reviewers"]) == 0 + assert len(result["comments"]) == 0 + +def test_return_structure(): + """Test that transform_data returns dict 
with 4 keys.""" + raw_data = [ + { + "number": 1, + "title": "Test", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert isinstance(result, dict) + assert "pull_requests" in result + assert "commits" in result + assert "reviewers" in result + assert "comments" in result + +def test_all_tables_have_target_repository(): + """Test that all tables include target_repository field.""" + raw_data = [ + { + "number": 123, + "title": "Test PR", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "test.py", "additions": 1, "deletions": 0} + ], + } + ], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 2, + "user": {"login": "commenter"}, + "body": "Test", + "created_at": "2024-01-01", + } + ], + } + ] - assert isinstance(result, dict) - assert "pull_requests" in result - assert "commits" in result - assert "reviewers" in result - assert "comments" in result - - def test_all_tables_have_target_repository(self): - """Test that all tables include target_repository field.""" - raw_data = [ - { - "number": 123, - "title": "Test PR", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "test.py", "additions": 1, "deletions": 0} - ], - } - ], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 2, - "user": {"login": "commenter"}, - "body": "Test", - "created_at": "2024-01-01", - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") + result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" - assert result["commits"][0]["target_repository"] == "mozilla/firefox" - assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" - assert result["comments"][0]["target_repository"] == "mozilla/firefox" + assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" + assert result["commits"][0]["target_repository"] == "mozilla/firefox" + assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" + assert result["comments"][0]["target_repository"] == "mozilla/firefox" -class TestLoadData: - """Tests for load_data function.""" - @patch("main.datetime") - def test_load_all_tables(self, mock_datetime, mock_bigquery_client): - """Test loading all 4 tables to BigQuery.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" +# ============================================================================= +# TESTS FOR LOAD_DATA +# ============================================================================= - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [{"commit_sha": "abc"}], - "reviewers": [{"reviewer_username": "user1"}], - "comments": [{"comment_id": 123}], - } - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) +@patch("main.datetime") +def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): + """Test that load_data inserts all tables correctly.""" + mock_datetime.now.return_value.strftime.return_value = 
"2024-01-15" - # Should call insert_rows_json 4 times (once per table) - assert mock_bigquery_client.insert_rows_json.call_count == 4 + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [{"commit_sha": "abc"}], + "reviewers": [{"reviewer_username": "user1"}], + "comments": [{"comment_id": 123}], + } - @patch("main.datetime") - def test_adds_snapshot_date(self, mock_datetime, mock_bigquery_client): - """Test that snapshot_date is added to all rows.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } + # Should call insert_rows_json 4 times (once per table) + assert mock_bigquery_client.insert_rows_json.call_count == 4 - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) +@patch("main.datetime") +def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): + """Test that snapshot_date is added to all rows.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" - call_args = mock_bigquery_client.insert_rows_json.call_args - rows = call_args[0][1] - assert all(row["snapshot_date"] == "2024-01-15" for row in rows) - - def test_constructs_correct_table_ref(self, mock_bigquery_client): - """Test that table_ref is constructed correctly.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } - main.load_data(mock_bigquery_client, "my_dataset", transformed_data) + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref = call_args[0][0] - assert table_ref == "test-project.my_dataset.pull_requests" + call_args = mock_bigquery_client.insert_rows_json.call_args + rows = call_args[0][1] + assert all(row["snapshot_date"] == "2024-01-15" for row in rows) - def test_empty_transformed_data_skipped(self, mock_bigquery_client): - """Test that empty transformed_data dict is skipped.""" - transformed_data = {} +def test_constructs_correct_table_ref(mock_bigquery_client): + """Test that table_ref is constructed correctly.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + main.load_data(mock_bigquery_client, "my_dataset", transformed_data) - mock_bigquery_client.insert_rows_json.assert_not_called() + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref = call_args[0][0] + assert table_ref == "test-project.my_dataset.pull_requests" - def test_skips_empty_tables_individually(self, mock_bigquery_client): - """Test that empty tables are skipped individually.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], # Empty, should be skipped - "reviewers": [], # Empty, should be skipped - "comments": [{"comment_id": 456}], - } +def test_empty_transformed_data_skipped(mock_bigquery_client): + """Test that empty transformed_data dict is skipped.""" + transformed_data = {} - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + main.load_data(mock_bigquery_client, "test_dataset", 
transformed_data) - # Should only call insert_rows_json twice (for PRs and comments) - assert mock_bigquery_client.insert_rows_json.call_count == 2 + mock_bigquery_client.insert_rows_json.assert_not_called() - def test_only_pull_requests_table(self, mock_bigquery_client): - """Test loading only pull_requests table.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } +def test_skips_empty_tables_individually(mock_bigquery_client): + """Test that empty tables are skipped individually.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], # Empty, should be skipped + "reviewers": [], # Empty, should be skipped + "comments": [{"comment_id": 456}], + } - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - assert mock_bigquery_client.insert_rows_json.call_count == 1 + # Should only call insert_rows_json twice (for PRs and comments) + assert mock_bigquery_client.insert_rows_json.call_count == 2 - def test_raises_exception_on_insert_errors(self, mock_bigquery_client): - """Test that Exception is raised on BigQuery insert errors.""" - mock_bigquery_client.insert_rows_json.return_value = [ - {"index": 0, "errors": ["Insert failed"]} - ] +def test_only_pull_requests_table(mock_bigquery_client): + """Test loading only pull_requests table.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - with pytest.raises(Exception) as exc_info: - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + assert mock_bigquery_client.insert_rows_json.call_count == 1 - assert "BigQuery insert errors" in str(exc_info.value) +def test_raises_exception_on_insert_errors(mock_bigquery_client): + """Test that Exception is raised on BigQuery insert errors.""" + mock_bigquery_client.insert_rows_json.return_value = [ + {"index": 0, "errors": ["Insert failed"]} + ] - def test_verifies_client_insert_called_correctly(self, mock_bigquery_client): - """Test that client.insert_rows_json is called with correct arguments.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + with pytest.raises(Exception) as exc_info: main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref, rows = call_args[0] + assert "BigQuery insert errors" in str(exc_info.value) - assert "pull_requests" in table_ref - assert len(rows) == 2 +def test_verifies_client_insert_called_correctly(mock_bigquery_client): + """Test that client.insert_rows_json is called with correct arguments.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) -class TestMain: - """Tests for main function.""" + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref, rows = call_args[0] - 
@patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_requires_github_repos( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that GITHUB_REPOS is required.""" - with patch.dict( - os.environ, - {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() + assert "pull_requests" in table_ref + assert len(rows) == 2 - assert "GITHUB_REPOS" in str(exc_info.value) - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_requires_bigquery_project( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that BIGQUERY_PROJECT is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - assert "BIGQUERY_PROJECT" in str(exc_info.value) - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_requires_bigquery_dataset( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that BIGQUERY_DATASET is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() +# ============================================================================= +# TESTS FOR MAIN +# ============================================================================= - assert "BIGQUERY_DATASET" in str(exc_info.value) - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_github_token_optional_with_warning( - self, mock_session_class, mock_bq_client, mock_setup_logging +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_REPOS is required.""" + with patch.dict( + os.environ, + {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, + clear=True, ): - """Test that GITHUB_TOKEN is optional but warns if missing.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - # Should not raise, but should log warning - result = main.main() - assert result == 0 - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_splits_github_repos_by_comma( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that GITHUB_REPOS is split by comma.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): + with pytest.raises(SystemExit) as exc_info: main.main() - # Should be called twice (once per repo) - assert mock_extract.call_count == 2 + assert "GITHUB_REPOS" in str(exc_info.value) - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_honors_github_api_url( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that GITHUB_API_URL is honored.""" - with ( - 
patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "GITHUB_API_URL": "https://custom-api.example.com", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): - main.main() - - call_kwargs = mock_extract.call_args[1] - assert call_kwargs["github_api_url"] == "https://custom-api.example.com" - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_honors_bigquery_emulator_host( - self, mock_session_class, mock_bq_client_class, mock_setup_logging +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_project(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that BIGQUERY_PROJECT is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, + clear=True, ): - """Test that BIGQUERY_EMULATOR_HOST is honored.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): + with pytest.raises(SystemExit) as exc_info: main.main() - # Verify BigQuery client was created with emulator settings - mock_bq_client_class.assert_called_once() - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_creates_session_with_headers( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that session is created with Accept and User-Agent headers.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session + assert "BIGQUERY_PROJECT" in str(exc_info.value) - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - # Verify session headers were set - assert mock_session.headers.update.called - call_args = mock_session.headers.update.call_args[0][0] - assert "Accept" in call_args - assert "User-Agent" in call_args - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_sets_authorization_header_with_token( - self, mock_session_class, mock_bq_client, mock_setup_logging +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_dataset(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that BIGQUERY_DATASET is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, + clear=True, ): - """Test that Authorization header is set when token provided.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "test-token-123", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): + with pytest.raises(SystemExit) as exc_info: main.main() - # Verify Authorization header was set - assert mock_session.headers.__setitem__.called - - @patch("main.setup_logging") - 
@patch("main.bigquery.Client") - @patch("requests.Session") - @patch("main.extract_pull_requests") - @patch("main.transform_data") - @patch("main.load_data") - def test_single_repo_successful_etl( - self, - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, - ): - """Test successful ETL for single repository.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + assert "BIGQUERY_DATASET" in str(exc_info.value) - with patch.dict( +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_github_token_optional_with_warning(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_TOKEN is optional but warns if missing.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", }, clear=True, - ): - result = main.main() - - assert result == 0 - mock_extract.assert_called_once() - mock_transform.assert_called_once() - mock_load.assert_called_once() - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - @patch("main.extract_pull_requests") - @patch("main.transform_data") - @patch("main.load_data") - def test_multiple_repos_processing( - self, - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, + ), + patch("main.extract_pull_requests", return_value=iter([])), ): - """Test processing multiple repositories.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + # Should not raise, but should log warning + result = main.main() + assert result == 0 - with patch.dict( +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_splits_github_repos_by_comma(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_REPOS is split by comma.""" + with ( + patch.dict( os.environ, { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", "GITHUB_TOKEN": "token", }, clear=True, - ): - result = main.main() - - assert result == 0 - # Should process 3 repositories - assert mock_extract.call_count == 3 - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - @patch("main.extract_pull_requests") - @patch("main.transform_data") - @patch("main.load_data") - def test_processes_chunks_iteratively( - self, - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, ): - """Test that chunks are processed iteratively from generator.""" - # Return 3 chunks - mock_extract.return_value = iter( - [ - [{"number": 1}], - [{"number": 2}], - [{"number": 3}], - ] - ) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( + main.main() + + # Should be called twice (once per repo) + assert mock_extract.call_count == 2 + +@patch("main.setup_logging") +@patch("main.bigquery.Client") 
+@patch("requests.Session") +def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_API_URL is honored.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", }, clear=True, - ): - result = main.main() - - assert result == 0 - # Transform and load should be called 3 times (once per chunk) - assert mock_transform.call_count == 3 - assert mock_load.call_count == 3 - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_returns_zero_on_success( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that main returns 0 on success.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - result = main.main() - - assert result == 0 - - -@pytest.mark.integration -class TestIntegration: - """Integration tests that test multiple components together.""" - - @patch("main.setup_logging") - @patch("main.load_data") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_end_to_end_with_mocked_github( - self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, ): - """Test end-to-end flow with mocked GitHub responses.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # Mock PR response - pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} - ] - pr_response.links = {} - - # Mock commits, reviewers, comments responses - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( + main.main() + + call_kwargs = mock_extract.call_args[1] + assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_honors_bigquery_emulator_host(mock_session_class, mock_bq_client_class, mock_setup_logging): + """Test that BIGQUERY_EMULATOR_HOST is honored.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", }, clear=True, - ): - result = main.main() - - assert result == 0 - mock_load.assert_called_once() - - # Verify transformed data structure - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - assert "pull_requests" in transformed_data - assert len(transformed_data["pull_requests"]) == 1 - - @patch("main.setup_logging") - @patch("main.load_data") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_bug_id_extraction_through_pipeline( - self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ), + patch("main.extract_pull_requests", return_value=iter([])), ): - """Test bug ID extraction through full pipeline.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session + main.main() - 
pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - { - "number": 1, - "title": "Bug 9876543 - Fix critical issue", - "state": "closed", - } - ] - pr_response.links = {} + # Verify BigQuery client was created with emulator settings + mock_bq_client_class.assert_called_once() - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_creates_session_with_headers(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that session is created with Accept and User-Agent headers.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", @@ -2084,59 +1795,176 @@ def test_bug_id_extraction_through_pipeline( "GITHUB_TOKEN": "token", }, clear=True, - ): - main.main() + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify session headers were set + assert mock_session.headers.update.called + call_args = mock_session.headers.update.call_args[0][0] + assert "Accept" in call_args + assert "User-Agent" in call_args + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_sets_authorization_header_with_token(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that Authorization header is set when token provided.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify Authorization header was set + assert mock_session.headers.__setitem__.called + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_single_repo_successful_etl( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test successful ETL for single repository.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - pr = transformed_data["pull_requests"][0] - assert pr["bug_id"] == 9876543 - - @patch("main.setup_logging") - @patch("main.load_data") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_pagination_through_full_flow( - self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, ): - """Test pagination through full ETL flow.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # First page - pr_response_1 = Mock() - pr_response_1.status_code = 200 - pr_response_1.json.return_value = [ - {"number": 1, "title": "PR 1", "state": "open"} - ] - pr_response_1.links = { 
- "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } + result = main.main() + + assert result == 0 + mock_extract.assert_called_once() + mock_transform.assert_called_once() + mock_load.assert_called_once() + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_multiple_repos_processing( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test processing multiple repositories.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - # Second page - pr_response_2 = Mock() - pr_response_2.status_code = 200 - pr_response_2.json.return_value = [ - {"number": 2, "title": "PR 2", "state": "open"} - ] - pr_response_2.links = {} - - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response_1, - empty_response, - empty_response, - empty_response, - pr_response_2, - empty_response, - empty_response, - empty_response, + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Should process 3 repositories + assert mock_extract.call_count == 3 + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_processes_chunks_iteratively( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test that chunks are processed iteratively from generator.""" + # Return 3 chunks + mock_extract.return_value = iter( + [ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], ] + ) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - with patch.dict( + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Transform and load should be called 3 times (once per chunk) + assert mock_transform.call_count == 3 + assert mock_load.call_count == 3 + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_returns_zero_on_success(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that main returns 0 on success.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", @@ -2145,8 +1973,166 @@ def test_pagination_through_full_flow( "GITHUB_TOKEN": "token", }, clear=True, - ): - main.main() + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + result = main.main() + + assert result == 0 + + +@pytest.mark.integration +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_full_etl_flow_transforms_data_correctly(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): + """Test full ETL flow with mocked GitHub responses.""" + 
mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # Mock PR response + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} + ] + pr_response.links = {} + + # Mock commits, reviewers, comments responses + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_load.assert_called_once() + + # Verify transformed data structure + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + assert "pull_requests" in transformed_data + assert len(transformed_data["pull_requests"]) == 1 + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_bug_id_extraction_through_pipeline(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): + """Test bug ID extraction through full pipeline.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + { + "number": 1, + "title": "Bug 9876543 - Fix critical issue", + "state": "closed", + } + ] + pr_response.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + pr = transformed_data["pull_requests"][0] + assert pr["bug_id"] == 9876543 + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): + """Test pagination through full ETL flow.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # First page + pr_response_1 = Mock() + pr_response_1.status_code = 200 + pr_response_1.json.return_value = [ + {"number": 1, "title": "PR 1", "state": "open"} + ] + pr_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page + pr_response_2 = Mock() + pr_response_2.status_code = 200 + pr_response_2.json.return_value = [ + {"number": 2, "title": "PR 2", "state": "open"} + ] + pr_response_2.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response_1, + empty_response, + empty_response, + empty_response, + pr_response_2, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() - # Should be called twice (once per chunk/page) - assert mock_load.call_count == 
2
+
+    # Should be called twice (once per chunk/page)
+    assert mock_load.call_count == 2

From 4bb878e5ac26c9583ffc98b3990fe938eb19494f Mon Sep 17 00:00:00 2001
From: David Lawrence
Date: Wed, 4 Feb 2026 18:11:38 -0500
Subject: [PATCH 08/11] Added conftest.py and moved tests to tests/ directory

---
 tests/conftest.py                  | 284 +++++++++++++++++++++++++++++
 test_main.py => tests/test_main.py | 264 +++++++++++++--------------
 2 files changed, 408 insertions(+), 140 deletions(-)
 create mode 100644 tests/conftest.py
 rename test_main.py => tests/test_main.py (93%)

diff --git a/tests/conftest.py b/tests/conftest.py
new file mode 100644
index 0000000..0656e29
--- /dev/null
+++ b/tests/conftest.py
@@ -0,0 +1,284 @@
+"""
+Pytest fixtures for GitHub ETL tests.
+
+This module provides reusable test fixtures for mocking external dependencies
+and providing sample data for unit and integration tests.
+"""
+
+from datetime import datetime, timezone
+from typing import Any
+from unittest.mock import MagicMock, Mock
+
+import pytest
+import requests
+from google.cloud import bigquery
+
+
+@pytest.fixture
+def mock_env_vars(monkeypatch) -> dict[str, str]:
+    """
+    Set up common environment variables for tests.
+
+    Returns:
+        Dictionary of environment variables that were set
+    """
+    env_vars = {
+        "GITHUB_TOKEN": "test_token_123",
+        "GITHUB_REPOS": "mozilla/firefox",
+        "BIGQUERY_PROJECT": "test-project",
+        "BIGQUERY_DATASET": "test_dataset",
+    }
+    for key, value in env_vars.items():
+        monkeypatch.setenv(key, value)
+    return env_vars
+
+
+@pytest.fixture
+def sample_github_pr() -> dict[str, Any]:
+    """
+    Sample GitHub pull request data from API response.
+
+    Returns:
+        Dictionary representing a single PR from GitHub API
+    """
+    return {
+        "number": 12345,
+        "state": "closed",
+        "title": "Bug 1234567 - Fix memory leak in parser",
+        "created_at": "2025-01-01T10:00:00Z",
+        "updated_at": "2025-01-02T15:30:00Z",
+        "merged_at": "2025-01-02T15:30:00Z",
+        "labels": [
+            {"name": "bug"},
+            {"name": "priority-high"},
+        ],
+        "user": {
+            "login": "test_user",
+            "id": 123,
+        },
+        "head": {
+            "ref": "feature-branch",
+            "sha": "abc123",
+        },
+        "base": {
+            "ref": "main",
+            "sha": "def456",
+        },
+        "commit_data": [],
+        "reviewer_data": [],
+        "comment_data": [],
+    }
+
+
+@pytest.fixture
+def sample_github_commit() -> dict[str, Any]:
+    """
+    Sample GitHub commit data from API response.
+
+    Returns:
+        Dictionary representing a single commit from GitHub API
+    """
+    return {
+        "sha": "abc123def456",
+        "commit": {
+            "author": {
+                "name": "Test Author",
+                "email": "author@example.com",
+                "date": "2025-01-01T10:00:00Z",
+            },
+            "message": "Fix bug in parser",
+        },
+        "files": [
+            {
+                "filename": "src/parser.py",
+                "additions": 10,
+                "deletions": 5,
+                "changes": 15,
+            }
+        ],
+    }
+
+
+@pytest.fixture
+def sample_github_reviewer() -> dict[str, Any]:
+    """
+    Sample GitHub review data from API response.
+
+    Returns:
+        Dictionary representing a single review from GitHub API
+    """
+    return {
+        "id": 98765,
+        "user": {
+            "login": "reviewer_user",
+            "id": 456,
+        },
+        "state": "APPROVED",
+        "submitted_at": "2025-01-02T12:00:00Z",
+        "body": "LGTM",
+    }
+
+
+@pytest.fixture
+def sample_github_comment() -> dict[str, Any]:
+    """
+    Sample GitHub comment data from API response.
+
+    Returns:
+        Dictionary representing a single comment from GitHub API
+    """
+    return {
+        "id": 111222,
+        "user": {
+            "login": "commenter_user",
+            "id": 789,
+        },
+        "created_at": "2025-01-01T14:00:00Z",
+        "body": "Please check the edge case for null values",
+        "pull_request_review_id": None,
+    }
+
+
+@pytest.fixture
+def sample_transformed_data() -> dict[str, list[dict]]:
+    """
+    Sample transformed data ready for BigQuery insertion.
+
+    Returns:
+        Dictionary with keys for each table and transformed row data
+    """
+    return {
+        "pull_requests": [
+            {
+                "pull_request_id": 12345,
+                "current_status": "closed",
+                "date_created": "2025-01-01T10:00:00Z",
+                "date_modified": "2025-01-02T15:30:00Z",
+                "target_repository": "mozilla/firefox",
+                "bug_id": 1234567,
+                "date_landed": "2025-01-02T15:30:00Z",
+                "date_approved": "2025-01-02T12:00:00Z",
+                "labels": ["bug", "priority-high"],
+            }
+        ],
+        "commits": [
+            {
+                "pull_request_id": 12345,
+                "target_repository": "mozilla/firefox",
+                "commit_sha": "abc123def456",
+                "date_created": "2025-01-01T10:00:00Z",
+                "author_username": "Test Author",
+                "author_email": None,
+                "filename": "src/parser.py",
+                "lines_removed": 5,
+                "lines_added": 10,
+            }
+        ],
+        "reviewers": [
+            {
+                "pull_request_id": 12345,
+                "target_repository": "mozilla/firefox",
+                "date_reviewed": "2025-01-02T12:00:00Z",
+                "reviewer_email": None,
+                "reviewer_username": "reviewer_user",
+                "status": "APPROVED",
+            }
+        ],
+        "comments": [
+            {
+                "pull_request_id": 12345,
+                "target_repository": "mozilla/firefox",
+                "comment_id": 111222,
+                "date_created": "2025-01-01T14:00:00Z",
+                "author_email": None,
+                "author_username": "commenter_user",
+                "character_count": 42,
+                "status": None,
+            }
+        ],
+    }
+
+
+@pytest.fixture
+def mock_session() -> Mock:
+    """
+    Mock requests.Session with configurable responses.
+
+    Returns:
+        Mock session object with get() method
+    """
+    session = Mock(spec=requests.Session)
+    session.headers = {}
+    return session
+
+
+@pytest.fixture
+def mock_github_response() -> Mock:
+    """
+    Mock requests.Response for GitHub API calls.
+
+    Returns:
+        Mock response with status_code, json(), headers, and links
+    """
+    response = Mock(spec=requests.Response)
+    response.status_code = 200
+    response.headers = {
+        "X-RateLimit-Remaining": "5000",
+        "X-RateLimit-Reset": "1609459200",
+    }
+    response.links = {}
+    response.text = ""
+    return response
+
+
+@pytest.fixture
+def mock_rate_limited_response() -> Mock:
+    """
+    Mock requests.Response simulating rate limit exceeded.
+
+    Returns:
+        Mock response with 403 status and rate limit headers
+    """
+    response = Mock(spec=requests.Response)
+    response.status_code = 403
+    response.headers = {
+        "X-RateLimit-Remaining": "0",
+        "X-RateLimit-Reset": str(int(datetime.now(timezone.utc).timestamp()) + 3600),
+    }
+    response.text = "API rate limit exceeded"
+    return response
+
+
+@pytest.fixture
+def mock_bigquery_client() -> Mock:
+    """
+    Mock BigQuery client for testing load operations.
+
+    Returns:
+        Mock BigQuery client with insert_rows_json() method
+    """
+    client = Mock(spec=bigquery.Client)
+    client.project = "test-project"
+    client.insert_rows_json = MagicMock(return_value=[])  # Empty list = no errors
+    return client
+
+
+@pytest.fixture
+def mock_bigquery_client_with_errors() -> Mock:
+    """
+    Mock BigQuery client that returns insertion errors.
+ + Returns: + Mock BigQuery client that simulates insert failures + """ + client = Mock(spec=bigquery.Client) + client.project = "test-project" + client.insert_rows_json = MagicMock( + return_value=[ + { + "index": 0, + "errors": [{"reason": "invalid", "message": "Invalid schema"}], + } + ] + ) + return client diff --git a/test_main.py b/tests/test_main.py similarity index 93% rename from test_main.py rename to tests/test_main.py index 0d38ac3..19ba7a4 100644 --- a/test_main.py +++ b/tests/test_main.py @@ -11,112 +11,9 @@ from unittest.mock import MagicMock, Mock, patch import pytest -import requests -from google.cloud import bigquery import main -# ============================================================================= -# FIXTURES -# ============================================================================= - - -@pytest.fixture -def mock_session(): - """Provide a mocked requests.Session for testing.""" - session = Mock(spec=requests.Session) - session.headers = {} - return session - - -@pytest.fixture -def mock_bigquery_client(): - """Provide a mocked BigQuery client for testing.""" - client = Mock(spec=bigquery.Client) - client.project = "test-project" - client.insert_rows_json = Mock(return_value=[]) - return client - - -@pytest.fixture -def mock_pr_response(): - """Provide a realistic pull request response for testing.""" - return { - "number": 123, - "title": "Bug 1234567 - Fix login issue", - "state": "closed", - "created_at": "2024-01-01T10:00:00Z", - "updated_at": "2024-01-02T10:00:00Z", - "merged_at": "2024-01-02T10:00:00Z", - "user": {"login": "testuser"}, - "head": {"ref": "fix-branch"}, - "base": {"ref": "main"}, - "labels": [{"name": "bug"}, {"name": "priority-high"}], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - - -@pytest.fixture -def mock_commit_response(): - """Provide a realistic commit response with files.""" - return { - "sha": "abc123def456", - "commit": { - "author": { - "name": "Test Author", - "email": "test@example.com", - "date": "2024-01-01T12:00:00Z", - } - }, - "files": [ - { - "filename": "src/login.py", - "additions": 10, - "deletions": 5, - "changes": 15, - }, - { - "filename": "tests/test_login.py", - "additions": 20, - "deletions": 2, - "changes": 22, - }, - ], - } - - -@pytest.fixture -def mock_reviewer_response(): - """Provide a realistic reviewer response.""" - return { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - "body": "LGTM", - } - - -@pytest.fixture -def mock_comment_response(): - """Provide a realistic comment response.""" - return { - "id": 456, - "user": {"login": "commenter1"}, - "created_at": "2024-01-01T14:00:00Z", - "body": "This looks good to me", - "pull_request_review_id": None, - } - - -# ============================================================================= -# TEST CLASSES -# ============================================================================= - - - # ============================================================================= # TESTS FOR SETUP_LOGGING # ============================================================================= @@ -132,13 +29,11 @@ def test_setup_logging(): # Check that at least one handler is a StreamHandler has_stream_handler = any( - isinstance(handler, logging.StreamHandler) - for handler in root_logger.handlers + isinstance(handler, logging.StreamHandler) for handler in root_logger.handlers ) assert has_stream_handler - # 
============================================================================= # TESTS FOR SLEEP_FOR_RATE_LIMIT # ============================================================================= @@ -206,7 +101,6 @@ def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): mock_sleep.assert_not_called() - # ============================================================================= # TESTS FOR EXTRACT_PULL_REQUESTS # ============================================================================= @@ -237,6 +131,7 @@ def test_extract_pull_requests_basic(mock_session): assert result[0][0]["number"] == 1 assert result[0][1]["number"] == 2 + def test_extract_multiple_pages(mock_session): """Test extracting data across multiple pages with pagination.""" # First page response @@ -271,6 +166,7 @@ def test_extract_multiple_pages(mock_session): assert result[0][0]["number"] == 1 assert result[1][0]["number"] == 3 + def test_enriches_prs_with_commit_data(mock_session): """Test that PRs are enriched with commit data.""" mock_response = Mock() @@ -294,6 +190,7 @@ def test_enriches_prs_with_commit_data(mock_session): assert result[0][0]["commit_data"] == mock_commits mock_extract_commits.assert_called_once() + def test_enriches_prs_with_reviewer_data(mock_session): """Test that PRs are enriched with reviewer data.""" mock_response = Mock() @@ -317,6 +214,7 @@ def test_enriches_prs_with_reviewer_data(mock_session): assert result[0][0]["reviewer_data"] == mock_reviewers mock_extract_reviewers.assert_called_once() + def test_enriches_prs_with_comment_data(mock_session): """Test that PRs are enriched with comment data.""" mock_response = Mock() @@ -340,6 +238,7 @@ def test_enriches_prs_with_comment_data(mock_session): assert result[0][0]["comment_data"] == mock_comments mock_extract_comments.assert_called_once() + @patch("main.sleep_for_rate_limit") def test_handles_rate_limit(mock_sleep, mock_session): """Test that extract_pull_requests handles rate limiting correctly.""" @@ -369,6 +268,7 @@ def test_handles_rate_limit(mock_sleep, mock_session): mock_sleep.assert_called_once_with(mock_response_rate_limit) assert len(result) == 1 + def test_handles_api_error_404(mock_session): """Test that extract_pull_requests raises SystemExit on 404.""" mock_response = Mock() @@ -382,6 +282,7 @@ def test_handles_api_error_404(mock_session): assert "GitHub API error 404" in str(exc_info.value) + def test_handles_api_error_500(mock_session): """Test that extract_pull_requests raises SystemExit on 500.""" mock_response = Mock() @@ -395,6 +296,7 @@ def test_handles_api_error_500(mock_session): assert "GitHub API error 500" in str(exc_info.value) + def test_stops_on_empty_batch(mock_session): """Test that extraction stops when an empty batch is returned.""" # First page with data @@ -424,6 +326,7 @@ def test_stops_on_empty_batch(mock_session): assert len(result) == 1 assert len(result[0]) == 1 + def test_invalid_page_number_handling(mock_session): """Test handling of invalid page number in pagination.""" mock_response_1 = Mock() @@ -447,6 +350,7 @@ def test_invalid_page_number_handling(mock_session): # Should stop pagination on invalid page number assert len(result) == 1 + def test_custom_github_api_url(mock_session): """Test using custom GitHub API URL.""" custom_url = "https://mock-github.example.com" @@ -473,6 +377,7 @@ def test_custom_github_api_url(mock_session): call_args = mock_session.get.call_args assert custom_url in call_args[0][0] + def test_skips_prs_without_number_field(mock_session): """Test that PRs without 
'number' field are skipped.""" mock_response = Mock() @@ -497,11 +402,13 @@ def test_skips_prs_without_number_field(mock_session): assert mock_commits.call_count == 2 - # ============================================================================= # TESTS FOR EXTRACT_COMMITS # ============================================================================= + +def test_extract_commits_with_files(mock_session): + """Test extracting commits with file details.""" # Mock commits list response commits_response = Mock() commits_response.status_code = 200 @@ -539,6 +446,7 @@ def test_skips_prs_without_number_field(mock_session): assert result[1]["sha"] == "def456" assert result[1]["files"][0]["filename"] == "file2.py" + def test_multiple_files_per_commit(mock_session): """Test handling multiple files in a single commit.""" commits_response = Mock() @@ -563,6 +471,7 @@ def test_multiple_files_per_commit(mock_session): assert len(result) == 1 assert len(result[0]["files"]) == 3 + @patch("main.sleep_for_rate_limit") def test_rate_limit_on_commits_list(mock_sleep, mock_session): """Test rate limit handling when fetching commits list.""" @@ -583,6 +492,7 @@ def test_rate_limit_on_commits_list(mock_sleep, mock_session): mock_sleep.assert_called_once() assert result == [] + def test_api_error_on_commits_list(mock_session): """Test API error handling when fetching commits list.""" error_response = Mock() @@ -596,6 +506,7 @@ def test_api_error_on_commits_list(mock_session): assert "GitHub API error 500" in str(exc_info.value) + def test_api_error_on_individual_commit(mock_session): """Test API error when fetching individual commit details.""" commits_response = Mock() @@ -613,6 +524,7 @@ def test_api_error_on_individual_commit(mock_session): assert "GitHub API error 404" in str(exc_info.value) + def test_commit_without_sha_field(mock_session): """Test handling commits without sha field.""" commits_response = Mock() @@ -641,7 +553,8 @@ def test_commit_without_sha_field(mock_session): # Should handle the commit without sha gracefully assert len(result) == 2 -def test_custom_github_api_url(mock_session): + +def test_custom_github_api_url_commits(mock_session): """Test using custom GitHub API URL for commits.""" custom_url = "https://mock-github.example.com" @@ -658,6 +571,7 @@ def test_custom_github_api_url(mock_session): call_args = mock_session.get.call_args assert custom_url in call_args[0][0] + def test_empty_commits_list(mock_session): """Test handling PR with no commits.""" commits_response = Mock() @@ -671,11 +585,13 @@ def test_empty_commits_list(mock_session): assert result == [] - # ============================================================================= # TESTS FOR EXTRACT_REVIEWERS # ============================================================================= + +def test_extract_reviewers_basic(mock_session): + """Test basic extraction of reviewers.""" reviewers_response = Mock() reviewers_response.status_code = 200 reviewers_response.json.return_value = [ @@ -701,6 +617,7 @@ def test_empty_commits_list(mock_session): assert result[0]["state"] == "APPROVED" assert result[1]["state"] == "CHANGES_REQUESTED" + def test_multiple_review_states(mock_session): """Test handling multiple different review states.""" reviewers_response = Mock() @@ -722,6 +639,7 @@ def test_multiple_review_states(mock_session): assert "CHANGES_REQUESTED" in states assert "COMMENTED" in states + def test_empty_reviewers_list(mock_session): """Test handling PR with no reviewers.""" reviewers_response = Mock() @@ -734,6 
+652,7 @@ def test_empty_reviewers_list(mock_session): assert result == [] + @patch("main.sleep_for_rate_limit") def test_rate_limit_handling(mock_sleep, mock_session): """Test rate limit handling when fetching reviewers.""" @@ -752,6 +671,7 @@ def test_rate_limit_handling(mock_sleep, mock_session): mock_sleep.assert_called_once() assert result == [] + def test_api_error(mock_session): """Test API error handling when fetching reviewers.""" error_response = Mock() @@ -765,7 +685,8 @@ def test_api_error(mock_session): assert "GitHub API error 500" in str(exc_info.value) -def test_custom_github_api_url(mock_session): + +def test_custom_github_api_url_reviewers(mock_session): """Test using custom GitHub API URL for reviewers.""" custom_url = "https://mock-github.example.com" @@ -783,11 +704,13 @@ def test_custom_github_api_url(mock_session): assert custom_url in call_args[0][0] - # ============================================================================= # TESTS FOR EXTRACT_COMMENTS # ============================================================================= + +def test_extract_comments_basic(mock_session): + """Test basic extraction of comments.""" comments_response = Mock() comments_response.status_code = 200 comments_response.json.return_value = [ @@ -813,6 +736,7 @@ def test_custom_github_api_url(mock_session): assert result[0]["id"] == 456 assert result[1]["id"] == 457 + def test_uses_issues_endpoint(mock_session): """Test that comments use /issues endpoint not /pulls.""" comments_response = Mock() @@ -828,6 +752,7 @@ def test_uses_issues_endpoint(mock_session): assert "/issues/123/comments" in url assert "/pulls/123/comments" not in url + def test_multiple_comments(mock_session): """Test handling multiple comments.""" comments_response = Mock() @@ -843,6 +768,7 @@ def test_multiple_comments(mock_session): assert len(result) == 10 + def test_empty_comments_list(mock_session): """Test handling PR with no comments.""" comments_response = Mock() @@ -855,8 +781,9 @@ def test_empty_comments_list(mock_session): assert result == [] + @patch("main.sleep_for_rate_limit") -def test_rate_limit_handling(mock_sleep, mock_session): +def test_rate_limit_handling_comments(mock_sleep, mock_session): """Test rate limit handling when fetching comments.""" rate_limit_response = Mock() rate_limit_response.status_code = 403 @@ -873,7 +800,8 @@ def test_rate_limit_handling(mock_sleep, mock_session): mock_sleep.assert_called_once() assert result == [] -def test_api_error(mock_session): + +def test_api_error_comments(mock_session): """Test API error handling when fetching comments.""" error_response = Mock() error_response.status_code = 404 @@ -886,7 +814,8 @@ def test_api_error(mock_session): assert "GitHub API error 404" in str(exc_info.value) -def test_custom_github_api_url(mock_session): + +def test_custom_github_api_url_comments(mock_session): """Test using custom GitHub API URL for comments.""" custom_url = "https://mock-github.example.com" @@ -904,11 +833,13 @@ def test_custom_github_api_url(mock_session): assert custom_url in call_args[0][0] - # ============================================================================= # TESTS FOR TRANSFORM_DATA # ============================================================================= + +def test_transform_data_basic(): + """Test basic transformation of pull request data.""" raw_data = [ { "number": 123, @@ -935,6 +866,7 @@ def test_custom_github_api_url(mock_session): assert pr["date_landed"] == "2024-01-02T12:00:00Z" assert pr["target_repository"] == 
"mozilla/firefox" + def test_bug_id_extraction_basic(): """Test bug ID extraction from PR title.""" test_cases = [ @@ -960,6 +892,7 @@ def test_bug_id_extraction_basic(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] == expected_bug_id + def test_bug_id_extraction_with_hash(): """Test bug ID extraction with # symbol.""" raw_data = [ @@ -977,6 +910,7 @@ def test_bug_id_extraction_with_hash(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] == 1234567 + def test_bug_id_filter_large_numbers(): """Test that bug IDs >= 100000000 are filtered out.""" raw_data = [ @@ -994,6 +928,7 @@ def test_bug_id_filter_large_numbers(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] is None + def test_bug_id_no_match(): """Test PR title with no bug ID.""" raw_data = [ @@ -1011,6 +946,7 @@ def test_bug_id_no_match(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] is None + def test_labels_extraction(): """Test labels array extraction.""" raw_data = [ @@ -1036,6 +972,7 @@ def test_labels_extraction(): assert "priority-high" in labels assert "needs-review" in labels + def test_labels_empty_list(): """Test handling empty labels list.""" raw_data = [ @@ -1053,6 +990,7 @@ def test_labels_empty_list(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["labels"] == [] + def test_commit_transformation(): """Test commit fields mapping.""" raw_data = [ @@ -1097,6 +1035,7 @@ def test_commit_transformation(): assert commit["lines_added"] == 10 assert commit["lines_removed"] == 5 + def test_commit_file_flattening(): """Test that each file becomes a separate row.""" raw_data = [ @@ -1130,6 +1069,7 @@ def test_commit_file_flattening(): assert "file2.py" in filenames assert "file3.py" in filenames + def test_multiple_commits_with_files(): """Test multiple commits with multiple files per PR.""" raw_data = [ @@ -1168,6 +1108,7 @@ def test_multiple_commits_with_files(): assert result["commits"][1]["commit_sha"] == "commit2" assert result["commits"][2]["commit_sha"] == "commit2" + def test_reviewer_transformation(): """Test reviewer fields mapping.""" raw_data = [ @@ -1199,8 +1140,9 @@ def test_reviewer_transformation(): assert reviewer["status"] == "APPROVED" assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" -def test_multiple_review_states(): - """Test handling multiple review states.""" + +def test_transform_multiple_review_states(): + """Test transforming data with multiple review states.""" raw_data = [ { "number": 123, @@ -1240,6 +1182,7 @@ def test_multiple_review_states(): assert "CHANGES_REQUESTED" in states assert "COMMENTED" in states + def test_date_approved_from_earliest_approval(): """Test that date_approved is set to earliest APPROVED review.""" raw_data = [ @@ -1278,6 +1221,7 @@ def test_date_approved_from_earliest_approval(): pr = result["pull_requests"][0] assert pr["date_approved"] == "2024-01-01T14:00:00Z" + def test_comment_transformation(): """Test comment fields mapping.""" raw_data = [ @@ -1311,6 +1255,7 @@ def test_comment_transformation(): assert comment["date_created"] == "2024-01-01T14:00:00Z" assert comment["character_count"] == 17 + def test_comment_character_count(): """Test character count calculation for comments.""" raw_data = [ @@ -1343,6 +1288,7 @@ def test_comment_character_count(): assert 
result["comments"][0]["character_count"] == 5 assert result["comments"][1]["character_count"] == 44 + def test_comment_status_from_review(): """Test that comment status is mapped from review_id_statuses.""" raw_data = [ @@ -1377,6 +1323,7 @@ def test_comment_status_from_review(): # Comment should have status from the review assert result["comments"][0]["status"] == "APPROVED" + def test_comment_empty_body(): """Test handling comments with empty or None body.""" raw_data = [ @@ -1409,6 +1356,7 @@ def test_comment_empty_body(): assert result["comments"][0]["character_count"] == 0 assert result["comments"][1]["character_count"] == 0 + def test_empty_raw_data(): """Test handling empty input list.""" result = main.transform_data([], "mozilla/firefox") @@ -1418,6 +1366,7 @@ def test_empty_raw_data(): assert result["reviewers"] == [] assert result["comments"] == [] + def test_pr_without_commits_reviewers_comments(): """Test PR with no commits, reviewers, or comments.""" raw_data = [ @@ -1439,6 +1388,7 @@ def test_pr_without_commits_reviewers_comments(): assert len(result["reviewers"]) == 0 assert len(result["comments"]) == 0 + def test_return_structure(): """Test that transform_data returns dict with 4 keys.""" raw_data = [ @@ -1461,6 +1411,7 @@ def test_return_structure(): assert "reviewers" in result assert "comments" in result + def test_all_tables_have_target_repository(): """Test that all tables include target_repository field.""" raw_data = [ @@ -1473,9 +1424,7 @@ def test_all_tables_have_target_repository(): { "sha": "abc", "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "test.py", "additions": 1, "deletions": 0} - ], + "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], } ], "reviewer_data": [ @@ -1505,7 +1454,6 @@ def test_all_tables_have_target_repository(): assert result["comments"][0]["target_repository"] == "mozilla/firefox" - # ============================================================================= # TESTS FOR LOAD_DATA # ============================================================================= @@ -1528,6 +1476,7 @@ def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): # Should call insert_rows_json 4 times (once per table) assert mock_bigquery_client.insert_rows_json.call_count == 4 + @patch("main.datetime") def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): """Test that snapshot_date is added to all rows.""" @@ -1546,6 +1495,7 @@ def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): rows = call_args[0][1] assert all(row["snapshot_date"] == "2024-01-15" for row in rows) + def test_constructs_correct_table_ref(mock_bigquery_client): """Test that table_ref is constructed correctly.""" transformed_data = { @@ -1561,6 +1511,7 @@ def test_constructs_correct_table_ref(mock_bigquery_client): table_ref = call_args[0][0] assert table_ref == "test-project.my_dataset.pull_requests" + def test_empty_transformed_data_skipped(mock_bigquery_client): """Test that empty transformed_data dict is skipped.""" transformed_data = {} @@ -1569,6 +1520,7 @@ def test_empty_transformed_data_skipped(mock_bigquery_client): mock_bigquery_client.insert_rows_json.assert_not_called() + def test_skips_empty_tables_individually(mock_bigquery_client): """Test that empty tables are skipped individually.""" transformed_data = { @@ -1583,6 +1535,7 @@ def test_skips_empty_tables_individually(mock_bigquery_client): # Should only call insert_rows_json twice (for PRs and comments) assert 
mock_bigquery_client.insert_rows_json.call_count == 2 + def test_only_pull_requests_table(mock_bigquery_client): """Test loading only pull_requests table.""" transformed_data = { @@ -1596,6 +1549,7 @@ def test_only_pull_requests_table(mock_bigquery_client): assert mock_bigquery_client.insert_rows_json.call_count == 1 + def test_raises_exception_on_insert_errors(mock_bigquery_client): """Test that Exception is raised on BigQuery insert errors.""" mock_bigquery_client.insert_rows_json.return_value = [ @@ -1614,6 +1568,7 @@ def test_raises_exception_on_insert_errors(mock_bigquery_client): assert "BigQuery insert errors" in str(exc_info.value) + def test_verifies_client_insert_called_correctly(mock_bigquery_client): """Test that client.insert_rows_json is called with correct arguments.""" transformed_data = { @@ -1632,7 +1587,6 @@ def test_verifies_client_insert_called_correctly(mock_bigquery_client): assert len(rows) == 2 - # ============================================================================= # TESTS FOR MAIN # ============================================================================= @@ -1657,7 +1611,9 @@ def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_lo @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_requires_bigquery_project(mock_session_class, mock_bq_client, mock_setup_logging): +def test_requires_bigquery_project( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that BIGQUERY_PROJECT is required.""" with patch.dict( os.environ, @@ -1673,7 +1629,9 @@ def test_requires_bigquery_project(mock_session_class, mock_bq_client, mock_setu @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_requires_bigquery_dataset(mock_session_class, mock_bq_client, mock_setup_logging): +def test_requires_bigquery_dataset( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that BIGQUERY_DATASET is required.""" with patch.dict( os.environ, @@ -1685,10 +1643,13 @@ def test_requires_bigquery_dataset(mock_session_class, mock_bq_client, mock_setu assert "BIGQUERY_DATASET" in str(exc_info.value) + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_github_token_optional_with_warning(mock_session_class, mock_bq_client, mock_setup_logging): +def test_github_token_optional_with_warning( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that GITHUB_TOKEN is optional but warns if missing.""" with ( patch.dict( @@ -1706,10 +1667,13 @@ def test_github_token_optional_with_warning(mock_session_class, mock_bq_client, result = main.main() assert result == 0 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_splits_github_repos_by_comma(mock_session_class, mock_bq_client, mock_setup_logging): +def test_splits_github_repos_by_comma( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that GITHUB_REPOS is split by comma.""" with ( patch.dict( @@ -1729,6 +1693,7 @@ def test_splits_github_repos_by_comma(mock_session_class, mock_bq_client, mock_s # Should be called twice (once per repo) assert mock_extract.call_count == 2 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1753,10 +1718,13 @@ def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_lo call_kwargs = mock_extract.call_args[1] assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + 
@patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_honors_bigquery_emulator_host(mock_session_class, mock_bq_client_class, mock_setup_logging): +def test_honors_bigquery_emulator_host( + mock_session_class, mock_bq_client_class, mock_setup_logging +): """Test that BIGQUERY_EMULATOR_HOST is honored.""" with ( patch.dict( @@ -1777,10 +1745,13 @@ def test_honors_bigquery_emulator_host(mock_session_class, mock_bq_client_class, # Verify BigQuery client was created with emulator settings mock_bq_client_class.assert_called_once() + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_creates_session_with_headers(mock_session_class, mock_bq_client, mock_setup_logging): +def test_creates_session_with_headers( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that session is created with Accept and User-Agent headers.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -1806,10 +1777,13 @@ def test_creates_session_with_headers(mock_session_class, mock_bq_client, mock_s assert "Accept" in call_args assert "User-Agent" in call_args + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_sets_authorization_header_with_token(mock_session_class, mock_bq_client, mock_setup_logging): +def test_sets_authorization_header_with_token( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that Authorization header is set when token provided.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -1832,6 +1806,7 @@ def test_sets_authorization_header_with_token(mock_session_class, mock_bq_client # Verify Authorization header was set assert mock_session.headers.__setitem__.called + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1872,6 +1847,7 @@ def test_single_repo_successful_etl( mock_transform.assert_called_once() mock_load.assert_called_once() + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1911,6 +1887,7 @@ def test_multiple_repos_processing( # Should process 3 repositories assert mock_extract.call_count == 3 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1958,10 +1935,13 @@ def test_processes_chunks_iteratively( assert mock_transform.call_count == 3 assert mock_load.call_count == 3 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_returns_zero_on_success(mock_session_class, mock_bq_client, mock_setup_logging): +def test_returns_zero_on_success( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that main returns 0 on success.""" with ( patch.dict( @@ -1986,7 +1966,9 @@ def test_returns_zero_on_success(mock_session_class, mock_bq_client, mock_setup_ @patch("main.load_data") @patch("main.bigquery.Client") @patch("requests.Session") -def test_full_etl_flow_transforms_data_correctly(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): +def test_full_etl_flow_transforms_data_correctly( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): """Test full ETL flow with mocked GitHub responses.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -2032,11 +2014,14 @@ def test_full_etl_flow_transforms_data_correctly(mock_session_class, mock_bq_cli assert "pull_requests" in transformed_data assert len(transformed_data["pull_requests"]) == 1 + 
@patch("main.setup_logging") @patch("main.load_data") @patch("main.bigquery.Client") @patch("requests.Session") -def test_bug_id_extraction_through_pipeline(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): +def test_bug_id_extraction_through_pipeline( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): """Test bug ID extraction through full pipeline.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -2080,11 +2065,14 @@ def test_bug_id_extraction_through_pipeline(mock_session_class, mock_bq_client, pr = transformed_data["pull_requests"][0] assert pr["bug_id"] == 9876543 + @patch("main.setup_logging") @patch("main.load_data") @patch("main.bigquery.Client") @patch("requests.Session") -def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): +def test_pagination_through_full_flow( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): """Test pagination through full ETL flow.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -2092,9 +2080,7 @@ def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_l # First page pr_response_1 = Mock() pr_response_1.status_code = 200 - pr_response_1.json.return_value = [ - {"number": 1, "title": "PR 1", "state": "open"} - ] + pr_response_1.json.return_value = [{"number": 1, "title": "PR 1", "state": "open"}] pr_response_1.links = { "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} } @@ -2102,9 +2088,7 @@ def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_l # Second page pr_response_2 = Mock() pr_response_2.status_code = 200 - pr_response_2.json.return_value = [ - {"number": 2, "title": "PR 2", "state": "open"} - ] + pr_response_2.json.return_value = [{"number": 2, "title": "PR 2", "state": "open"}] pr_response_2.links = {} empty_response = Mock() From e3647c4ae2a9f7e28fd5050f9d02df2d26b79c90 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 6 Feb 2026 16:33:25 -0500 Subject: [PATCH 09/11] - Fixed integration test gitub action to use docker compose properly. - Broke up all of the tests into individual files based on function to make for easier review. 
--- .github/workflows/tests.yml | 6 +- tests/test_extract_comments.py | 137 ++ tests/test_extract_commits.py | 190 +++ tests/test_extract_pull_requests.py | 309 ++++ tests/test_extract_reviewers.py | 127 ++ tests/test_load_data.py | 141 ++ tests/test_logging.py | 25 + tests/test_main.py | 2122 --------------------------- tests/test_main_integration.py | 544 +++++++ tests/test_rate_limit.py | 72 + tests/test_transform_data.py | 625 ++++++++ 11 files changed, 2173 insertions(+), 2125 deletions(-) create mode 100644 tests/test_extract_comments.py create mode 100644 tests/test_extract_commits.py create mode 100644 tests/test_extract_pull_requests.py create mode 100644 tests/test_extract_reviewers.py create mode 100644 tests/test_load_data.py create mode 100644 tests/test_logging.py delete mode 100644 tests/test_main.py create mode 100644 tests/test_main_integration.py create mode 100644 tests/test_rate_limit.py create mode 100644 tests/test_transform_data.py diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index c7b9d39..b4cc85b 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -24,9 +24,9 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - - name: Run integration test with docker-compose + - name: Run integration test with docker compose run: | - docker-compose up --build --abort-on-container-exit --exit-code-from github-etl + docker compose up --build --abort-on-container-exit --exit-code-from github-etl - name: Cleanup if: always() - run: docker-compose down -v + run: docker compose down -v diff --git a/tests/test_extract_comments.py b/tests/test_extract_comments.py new file mode 100644 index 0000000..25232b3 --- /dev/null +++ b/tests/test_extract_comments.py @@ -0,0 +1,137 @@ +#!/usr/bin/env python3 +""" +Tests for extract_comments function. + +Tests comment extraction including endpoint verification, rate limiting, +and error handling. 
+""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_comments_basic(mock_session): + """Test basic extraction of comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks good", + "created_at": "2024-01-01T14:00:00Z", + }, + { + "id": 457, + "user": {"login": "commenter2"}, + "body": "I have concerns", + "created_at": "2024-01-01T15:00:00Z", + }, + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["id"] == 456 + assert result[1]["id"] == 457 + + +def test_uses_issues_endpoint(mock_session): + """Test that comments use /issues endpoint not /pulls.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments(mock_session, "mozilla/firefox", 123) + + call_args = mock_session.get.call_args + url = call_args[0][0] + assert "/issues/123/comments" in url + assert "/pulls/123/comments" not in url + + +def test_multiple_comments(mock_session): + """Test handling multiple comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} + for i in range(1, 11) + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 10 + + +def test_empty_comments_list(mock_session): + """Test handling PR with no comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert result == [] + + +@patch("main.sleep_for_rate_limit") +def test_rate_limit_handling_comments(mock_sleep, mock_session): + """Test rate limit handling when fetching comments.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + +def test_api_error_comments(mock_session): + """Test API error handling when fetching comments.""" + error_response = Mock() + error_response.status_code = 404 + error_response.text = "Not Found" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + +def test_custom_github_api_url_comments(mock_session): + """Test using custom GitHub API URL for comments.""" + custom_url = "https://mock-github.example.com" + + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert 
custom_url in call_args[0][0] diff --git a/tests/test_extract_commits.py b/tests/test_extract_commits.py new file mode 100644 index 0000000..bccc8b5 --- /dev/null +++ b/tests/test_extract_commits.py @@ -0,0 +1,190 @@ +#!/usr/bin/env python3 +""" +Tests for extract_commits function. + +Tests commit extraction including file details, rate limiting, and error handling. +""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_commits_with_files(mock_session): + """Test extracting commits with file details.""" + # Mock commits list response + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {"sha": "def456"}, + ] + + # Mock individual commit responses + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = { + "sha": "abc123", + "files": [{"filename": "file1.py", "additions": 10}], + } + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = { + "sha": "def456", + "files": [{"filename": "file2.py", "deletions": 5}], + } + + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["sha"] == "abc123" + assert result[0]["files"][0]["filename"] == "file1.py" + assert result[1]["sha"] == "def456" + assert result[1]["files"][0]["filename"] == "file2.py" + + +def test_multiple_files_per_commit(mock_session): + """Test handling multiple files in a single commit.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_detail = Mock() + commit_detail.status_code = 200 + commit_detail.json.return_value = { + "sha": "abc123", + "files": [ + {"filename": "file1.py", "additions": 10}, + {"filename": "file2.py", "additions": 20}, + {"filename": "file3.py", "deletions": 5}, + ], + } + + mock_session.get.side_effect = [commits_response, commit_detail] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 1 + assert len(result[0]["files"]) == 3 + + +@patch("main.sleep_for_rate_limit") +def test_rate_limit_on_commits_list(mock_sleep, mock_session): + """Test rate limit handling when fetching commits list.""" + # Rate limit response + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + # Success response + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + +def test_api_error_on_commits_list(mock_session): + """Test API error handling when fetching commits list.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + +def test_api_error_on_individual_commit(mock_session): + """Test API error when fetching individual commit details.""" + commits_response = Mock() + commits_response.status_code = 200 + 
commits_response.json.return_value = [{"sha": "abc123"}] + + commit_error = Mock() + commit_error.status_code = 404 + commit_error.text = "Commit not found" + + mock_session.get.side_effect = [commits_response, commit_error] + + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + +def test_commit_without_sha_field(mock_session): + """Test handling commits without sha field.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {}, # Missing sha field + ] + + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = {"files": []} + + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + # Should handle the commit without sha gracefully + assert len(result) == 2 + + +def test_custom_github_api_url_commits(mock_session): + """Test using custom GitHub API URL for commits.""" + custom_url = "https://mock-github.example.com" + + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + main.extract_commits( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +def test_empty_commits_list(mock_session): + """Test handling PR with no commits.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert result == [] diff --git a/tests/test_extract_pull_requests.py b/tests/test_extract_pull_requests.py new file mode 100644 index 0000000..b6325fb --- /dev/null +++ b/tests/test_extract_pull_requests.py @@ -0,0 +1,309 @@ +#!/usr/bin/env python3 +""" +Tests for extract_pull_requests function. + +Tests pull request extraction including pagination, rate limiting, error handling, +and enrichment with commits, reviewers, and comments. 
+""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_pull_requests_basic(mock_session): + """Test basic extraction of pull requests.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + # Mock the extract functions + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 1 + assert len(result[0]) == 2 + assert result[0][0]["number"] == 1 + assert result[0][1]["number"] == 2 + + +def test_extract_multiple_pages(mock_session): + """Test extracting data across multiple pages with pagination.""" + # First page response + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page response + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 2 + assert len(result[0]) == 2 + assert len(result[1]) == 1 + assert result[0][0]["number"] == 1 + assert result[1][0]["number"] == 3 + + +def test_enriches_prs_with_commit_data(mock_session): + """Test that PRs are enriched with commit data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_commits = [{"sha": "abc123"}] + + with ( + patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["commit_data"] == mock_commits + mock_extract_commits.assert_called_once() + + +def test_enriches_prs_with_reviewer_data(mock_session): + """Test that PRs are enriched with reviewer data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_reviewers = [{"id": 789, "state": "APPROVED"}] + + with ( + patch("main.extract_commits", return_value=[]), + patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["reviewer_data"] == mock_reviewers + mock_extract_reviewers.assert_called_once() + + +def test_enriches_prs_with_comment_data(mock_session): + """Test that PRs are enriched with comment 
data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_comments = [{"id": 456, "body": "Great work!"}] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments, + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["comment_data"] == mock_comments + mock_extract_comments.assert_called_once() + + +@patch("main.sleep_for_rate_limit") +def test_handles_rate_limit(mock_sleep, mock_session): + """Test that extract_pull_requests handles rate limiting correctly.""" + # Rate limit response + mock_response_rate_limit = Mock() + mock_response_rate_limit.status_code = 403 + mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} + + # Successful response after rate limit + mock_response_success = Mock() + mock_response_success.status_code = 200 + mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response_success.links = {} + + mock_session.get.side_effect = [ + mock_response_rate_limit, + mock_response_success, + ] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + mock_sleep.assert_called_once_with(mock_response_rate_limit) + assert len(result) == 1 + + +def test_handles_api_error_404(mock_session): + """Test that extract_pull_requests raises SystemExit on 404.""" + mock_response = Mock() + mock_response.status_code = 404 + mock_response.text = "Not Found" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) + + assert "GitHub API error 404" in str(exc_info.value) + + +def test_handles_api_error_500(mock_session): + """Test that extract_pull_requests raises SystemExit on 500.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Server Error" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert "GitHub API error 500" in str(exc_info.value) + + +def test_stops_on_empty_batch(mock_session): + """Test that extraction stops when an empty batch is returned.""" + # First page with data + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page empty + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # Should only have 1 chunk from first page + assert len(result) == 1 + assert len(result[0]) == 1 + + +def test_invalid_page_number_handling(mock_session): + 
"""Test handling of invalid page number in pagination.""" + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" + } + } + + mock_session.get.return_value = mock_response_1 + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # Should stop pagination on invalid page number + assert len(result) == 1 + + +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL.""" + custom_url = "https://mock-github.example.com" + + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list( + main.extract_pull_requests( + mock_session, "mozilla/firefox", github_api_url=custom_url + ) + ) + + # Verify custom URL was used + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +def test_skips_prs_without_number_field(mock_session): + """Test that PRs without 'number' field are skipped.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"title": "PR without number"}, # Missing number field + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with ( + patch("main.extract_commits", return_value=[]) as mock_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # extract_commits should only be called for PRs with number field + assert mock_commits.call_count == 2 diff --git a/tests/test_extract_reviewers.py b/tests/test_extract_reviewers.py new file mode 100644 index 0000000..7df4b43 --- /dev/null +++ b/tests/test_extract_reviewers.py @@ -0,0 +1,127 @@ +#!/usr/bin/env python3 +""" +Tests for extract_reviewers function. + +Tests reviewer extraction including different review states, rate limiting, +and error handling. 
+""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_reviewers_basic(mock_session): + """Test basic extraction of reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 790, + "user": {"login": "reviewer2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["state"] == "APPROVED" + assert result[1]["state"] == "CHANGES_REQUESTED" + + +def test_multiple_review_states(mock_session): + """Test handling multiple different review states.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, + {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, + {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, + {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 4 + states = [r["state"] for r in result] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + +def test_empty_reviewers_list(mock_session): + """Test handling PR with no reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert result == [] + + +@patch("main.sleep_for_rate_limit") +def test_rate_limit_handling(mock_sleep, mock_session): + """Test rate limit handling when fetching reviewers.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + +def test_api_error(mock_session): + """Test API error handling when fetching reviewers.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + +def test_custom_github_api_url_reviewers(mock_session): + """Test using custom GitHub API URL for reviewers.""" + custom_url = "https://mock-github.example.com" + + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + mock_session.get.return_value = reviewers_response + + main.extract_reviewers( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] diff --git a/tests/test_load_data.py b/tests/test_load_data.py new file mode 
100644 index 0000000..0203288 --- /dev/null +++ b/tests/test_load_data.py @@ -0,0 +1,141 @@ +#!/usr/bin/env python3 +""" +Tests for load_data function. + +Tests BigQuery data loading including table insertion, snapshot dates, +and error handling. +""" + +from unittest.mock import patch + +import pytest + +import main + + +@patch("main.datetime") +def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): + """Test that load_data inserts all tables correctly.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [{"commit_sha": "abc"}], + "reviewers": [{"reviewer_username": "user1"}], + "comments": [{"comment_id": 123}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should call insert_rows_json 4 times (once per table) + assert mock_bigquery_client.insert_rows_json.call_count == 4 + + +@patch("main.datetime") +def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): + """Test that snapshot_date is added to all rows.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + rows = call_args[0][1] + assert all(row["snapshot_date"] == "2024-01-15" for row in rows) + + +def test_constructs_correct_table_ref(mock_bigquery_client): + """Test that table_ref is constructed correctly.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "my_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref = call_args[0][0] + assert table_ref == "test-project.my_dataset.pull_requests" + + +def test_empty_transformed_data_skipped(mock_bigquery_client): + """Test that empty transformed_data dict is skipped.""" + transformed_data = {} + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + mock_bigquery_client.insert_rows_json.assert_not_called() + + +def test_skips_empty_tables_individually(mock_bigquery_client): + """Test that empty tables are skipped individually.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], # Empty, should be skipped + "reviewers": [], # Empty, should be skipped + "comments": [{"comment_id": 456}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should only call insert_rows_json twice (for PRs and comments) + assert mock_bigquery_client.insert_rows_json.call_count == 2 + + +def test_only_pull_requests_table(mock_bigquery_client): + """Test loading only pull_requests table.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert mock_bigquery_client.insert_rows_json.call_count == 1 + + +def test_raises_exception_on_insert_errors(mock_bigquery_client): + """Test that Exception is raised on BigQuery insert errors.""" + mock_bigquery_client.insert_rows_json.return_value = [ + {"index": 0, "errors": ["Insert failed"]} + ] + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + 
"reviewers": [], + "comments": [], + } + + with pytest.raises(Exception) as exc_info: + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert "BigQuery insert errors" in str(exc_info.value) + + +def test_verifies_client_insert_called_correctly(mock_bigquery_client): + """Test that client.insert_rows_json is called with correct arguments.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref, rows = call_args[0] + + assert "pull_requests" in table_ref + assert len(rows) == 2 diff --git a/tests/test_logging.py b/tests/test_logging.py new file mode 100644 index 0000000..10730d1 --- /dev/null +++ b/tests/test_logging.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +""" +Tests for setup_logging function. + +Tests logging configuration including log level and handler setup. +""" + +import logging + +import main + + +def test_setup_logging(): + """Test that setup_logging configures logging correctly.""" + main.setup_logging() + + root_logger = logging.getLogger() + assert root_logger.level == logging.INFO + assert len(root_logger.handlers) > 0 + + # Check that at least one handler is a StreamHandler + has_stream_handler = any( + isinstance(handler, logging.StreamHandler) for handler in root_logger.handlers + ) + assert has_stream_handler diff --git a/tests/test_main.py b/tests/test_main.py deleted file mode 100644 index 19ba7a4..0000000 --- a/tests/test_main.py +++ /dev/null @@ -1,2122 +0,0 @@ -#!/usr/bin/env python3 -""" -Comprehensive test suite for GitHub ETL main.py - -This test suite provides complete coverage for all functions in main.py, -including extraction, transformation, loading, and orchestration logic. 
-""" - -import logging -import os -from unittest.mock import MagicMock, Mock, patch - -import pytest - -import main - -# ============================================================================= -# TESTS FOR SETUP_LOGGING -# ============================================================================= - - -def test_setup_logging(): - """Test that setup_logging configures logging correctly.""" - main.setup_logging() - - root_logger = logging.getLogger() - assert root_logger.level == logging.INFO - assert len(root_logger.handlers) > 0 - - # Check that at least one handler is a StreamHandler - has_stream_handler = any( - isinstance(handler, logging.StreamHandler) for handler in root_logger.handlers - ) - assert has_stream_handler - - -# ============================================================================= -# TESTS FOR SLEEP_FOR_RATE_LIMIT -# ============================================================================= - - -@patch("time.time") -@patch("time.sleep") -def test_sleep_for_rate_limit_calculates_wait_time(mock_sleep, mock_time): - """Test that sleep_for_rate_limit calculates correct wait time.""" - mock_time.return_value = 1000 - - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1120", # 120 seconds from now - } - - main.sleep_for_rate_limit(mock_response) - - mock_sleep.assert_called_once_with(120) - - -@patch("time.time") -@patch("time.sleep") -def test_sleep_for_rate_limit_when_reset_already_passed(mock_sleep, mock_time): - """Test that sleep_for_rate_limit doesn't sleep negative time.""" - mock_time.return_value = 2000 - - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1500", # Already passed - } - - main.sleep_for_rate_limit(mock_response) - - # Should sleep for 0 seconds (max of 0 and negative value) - mock_sleep.assert_called_once_with(0) - - -@patch("time.sleep") -def test_sleep_for_rate_limit_when_remaining_not_zero(mock_sleep): - """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "5", - "X-RateLimit-Reset": "1500", - } - - main.sleep_for_rate_limit(mock_response) - - # Should not sleep when remaining > 0 - mock_sleep.assert_not_called() - - -@patch("time.sleep") -def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): - """Test sleep_for_rate_limit with missing rate limit headers.""" - mock_response = Mock() - mock_response.headers = {} - - main.sleep_for_rate_limit(mock_response) - - # Should not sleep when headers are missing (defaults to remaining=1) - mock_sleep.assert_not_called() - - -# ============================================================================= -# TESTS FOR EXTRACT_PULL_REQUESTS -# ============================================================================= - - -def test_extract_pull_requests_basic(mock_session): - """Test basic extraction of pull requests.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - # Mock the extract functions - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 1 - assert 
len(result[0]) == 2 - assert result[0][0]["number"] == 1 - assert result[0][1]["number"] == 2 - - -def test_extract_multiple_pages(mock_session): - """Test extracting data across multiple pages with pagination.""" - # First page response - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } - - # Second page response - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] - mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 2 - assert len(result[0]) == 2 - assert len(result[1]) == 1 - assert result[0][0]["number"] == 1 - assert result[1][0]["number"] == 3 - - -def test_enriches_prs_with_commit_data(mock_session): - """Test that PRs are enriched with commit data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_commits = [{"sha": "abc123"}] - - with ( - patch( - "main.extract_commits", return_value=mock_commits - ) as mock_extract_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["commit_data"] == mock_commits - mock_extract_commits.assert_called_once() - - -def test_enriches_prs_with_reviewer_data(mock_session): - """Test that PRs are enriched with reviewer data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_reviewers = [{"id": 789, "state": "APPROVED"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch( - "main.extract_reviewers", return_value=mock_reviewers - ) as mock_extract_reviewers, - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["reviewer_data"] == mock_reviewers - mock_extract_reviewers.assert_called_once() - - -def test_enriches_prs_with_comment_data(mock_session): - """Test that PRs are enriched with comment data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_comments = [{"id": 456, "body": "Great work!"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch( - "main.extract_comments", return_value=mock_comments - ) as mock_extract_comments, - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["comment_data"] == mock_comments - mock_extract_comments.assert_called_once() - - -@patch("main.sleep_for_rate_limit") -def test_handles_rate_limit(mock_sleep, 
mock_session): - """Test that extract_pull_requests handles rate limiting correctly.""" - # Rate limit response - mock_response_rate_limit = Mock() - mock_response_rate_limit.status_code = 403 - mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} - - # Successful response after rate limit - mock_response_success = Mock() - mock_response_success.status_code = 200 - mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response_success.links = {} - - mock_session.get.side_effect = [ - mock_response_rate_limit, - mock_response_success, - ] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - mock_sleep.assert_called_once_with(mock_response_rate_limit) - assert len(result) == 1 - - -def test_handles_api_error_404(mock_session): - """Test that extract_pull_requests raises SystemExit on 404.""" - mock_response = Mock() - mock_response.status_code = 404 - mock_response.text = "Not Found" - - mock_session.get.return_value = mock_response - - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) - - assert "GitHub API error 404" in str(exc_info.value) - - -def test_handles_api_error_500(mock_session): - """Test that extract_pull_requests raises SystemExit on 500.""" - mock_response = Mock() - mock_response.status_code = 500 - mock_response.text = "Internal Server Error" - - mock_session.get.return_value = mock_response - - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert "GitHub API error 500" in str(exc_info.value) - - -def test_stops_on_empty_batch(mock_session): - """Test that extraction stops when an empty batch is returned.""" - # First page with data - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } - - # Second page empty - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [] - mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should only have 1 chunk from first page - assert len(result) == 1 - assert len(result[0]) == 1 - - -def test_invalid_page_number_handling(mock_session): - """Test handling of invalid page number in pagination.""" - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" - } - } - - mock_session.get.return_value = mock_response_1 - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should stop pagination on invalid page number - assert len(result) == 1 - - -def test_custom_github_api_url(mock_session): - """Test using 
custom GitHub API URL.""" - custom_url = "https://mock-github.example.com" - - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list( - main.extract_pull_requests( - mock_session, "mozilla/firefox", github_api_url=custom_url - ) - ) - - # Verify custom URL was used - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -def test_skips_prs_without_number_field(mock_session): - """Test that PRs without 'number' field are skipped.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"title": "PR without number"}, # Missing number field - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - with ( - patch("main.extract_commits", return_value=[]) as mock_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # extract_commits should only be called for PRs with number field - assert mock_commits.call_count == 2 - - -# ============================================================================= -# TESTS FOR EXTRACT_COMMITS -# ============================================================================= - - -def test_extract_commits_with_files(mock_session): - """Test extracting commits with file details.""" - # Mock commits list response - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": "abc123"}, - {"sha": "def456"}, - ] - - # Mock individual commit responses - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = { - "sha": "abc123", - "files": [{"filename": "file1.py", "additions": 10}], - } - - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = { - "sha": "def456", - "files": [{"filename": "file2.py", "deletions": 5}], - } - - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["sha"] == "abc123" - assert result[0]["files"][0]["filename"] == "file1.py" - assert result[1]["sha"] == "def456" - assert result[1]["files"][0]["filename"] == "file2.py" - - -def test_multiple_files_per_commit(mock_session): - """Test handling multiple files in a single commit.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] - - commit_detail = Mock() - commit_detail.status_code = 200 - commit_detail.json.return_value = { - "sha": "abc123", - "files": [ - {"filename": "file1.py", "additions": 10}, - {"filename": "file2.py", "additions": 20}, - {"filename": "file3.py", "deletions": 5}, - ], - } - - mock_session.get.side_effect = [commits_response, commit_detail] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert len(result) == 1 - assert len(result[0]["files"]) == 3 - - -@patch("main.sleep_for_rate_limit") -def test_rate_limit_on_commits_list(mock_sleep, mock_session): - """Test rate 
limit handling when fetching commits list.""" - # Rate limit response - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - - # Success response - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] - - mock_session.get.side_effect = [rate_limit_response, success_response] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - mock_sleep.assert_called_once() - assert result == [] - - -def test_api_error_on_commits_list(mock_session): - """Test API error handling when fetching commits list.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" - - mock_session.get.return_value = error_response - - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 500" in str(exc_info.value) - - -def test_api_error_on_individual_commit(mock_session): - """Test API error when fetching individual commit details.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] - - commit_error = Mock() - commit_error.status_code = 404 - commit_error.text = "Commit not found" - - mock_session.get.side_effect = [commits_response, commit_error] - - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 404" in str(exc_info.value) - - -def test_commit_without_sha_field(mock_session): - """Test handling commits without sha field.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": "abc123"}, - {}, # Missing sha field - ] - - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = {"sha": "abc123", "files": []} - - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = {"files": []} - - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - # Should handle the commit without sha gracefully - assert len(result) == 2 - - -def test_custom_github_api_url_commits(mock_session): - """Test using custom GitHub API URL for commits.""" - custom_url = "https://mock-github.example.com" - - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] - - mock_session.get.return_value = commits_response - - main.extract_commits( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) - - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -def test_empty_commits_list(mock_session): - """Test handling PR with no commits.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] - - mock_session.get.return_value = commits_response - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -# ============================================================================= -# TESTS FOR EXTRACT_REVIEWERS -# ============================================================================= - - -def test_extract_reviewers_basic(mock_session): - """Test basic extraction of reviewers.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - 
reviewers_response.json.return_value = [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 790, - "user": {"login": "reviewer2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - ] - - mock_session.get.return_value = reviewers_response - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["state"] == "APPROVED" - assert result[1]["state"] == "CHANGES_REQUESTED" - - -def test_multiple_review_states(mock_session): - """Test handling multiple different review states.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [ - {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, - {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, - {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, - {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, - ] - - mock_session.get.return_value = reviewers_response - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert len(result) == 4 - states = [r["state"] for r in result] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states - - -def test_empty_reviewers_list(mock_session): - """Test handling PR with no reviewers.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] - - mock_session.get.return_value = reviewers_response - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -@patch("main.sleep_for_rate_limit") -def test_rate_limit_handling(mock_sleep, mock_session): - """Test rate limit handling when fetching reviewers.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] - - mock_session.get.side_effect = [rate_limit_response, success_response] - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - mock_sleep.assert_called_once() - assert result == [] - - -def test_api_error(mock_session): - """Test API error handling when fetching reviewers.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" - - mock_session.get.return_value = error_response - - with pytest.raises(SystemExit) as exc_info: - main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 500" in str(exc_info.value) - - -def test_custom_github_api_url_reviewers(mock_session): - """Test using custom GitHub API URL for reviewers.""" - custom_url = "https://mock-github.example.com" - - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] - - mock_session.get.return_value = reviewers_response - - main.extract_reviewers( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) - - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -# ============================================================================= -# TESTS FOR EXTRACT_COMMENTS -# ============================================================================= - - -def test_extract_comments_basic(mock_session): - """Test basic extraction of comments.""" - comments_response = Mock() - 
comments_response.status_code = 200 - comments_response.json.return_value = [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks good", - "created_at": "2024-01-01T14:00:00Z", - }, - { - "id": 457, - "user": {"login": "commenter2"}, - "body": "I have concerns", - "created_at": "2024-01-01T15:00:00Z", - }, - ] - - mock_session.get.return_value = comments_response - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["id"] == 456 - assert result[1]["id"] == 457 - - -def test_uses_issues_endpoint(mock_session): - """Test that comments use /issues endpoint not /pulls.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - - mock_session.get.return_value = comments_response - - main.extract_comments(mock_session, "mozilla/firefox", 123) - - call_args = mock_session.get.call_args - url = call_args[0][0] - assert "/issues/123/comments" in url - assert "/pulls/123/comments" not in url - - -def test_multiple_comments(mock_session): - """Test handling multiple comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [ - {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} - for i in range(1, 11) - ] - - mock_session.get.return_value = comments_response - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert len(result) == 10 - - -def test_empty_comments_list(mock_session): - """Test handling PR with no comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - - mock_session.get.return_value = comments_response - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -@patch("main.sleep_for_rate_limit") -def test_rate_limit_handling_comments(mock_sleep, mock_session): - """Test rate limit handling when fetching comments.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] - - mock_session.get.side_effect = [rate_limit_response, success_response] - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - mock_sleep.assert_called_once() - assert result == [] - - -def test_api_error_comments(mock_session): - """Test API error handling when fetching comments.""" - error_response = Mock() - error_response.status_code = 404 - error_response.text = "Not Found" - - mock_session.get.return_value = error_response - - with pytest.raises(SystemExit) as exc_info: - main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 404" in str(exc_info.value) - - -def test_custom_github_api_url_comments(mock_session): - """Test using custom GitHub API URL for comments.""" - custom_url = "https://mock-github.example.com" - - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - - mock_session.get.return_value = comments_response - - main.extract_comments( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) - - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -# ============================================================================= -# TESTS FOR TRANSFORM_DATA -# 
============================================================================= - - -def test_transform_data_basic(): - """Test basic transformation of pull request data.""" - raw_data = [ - { - "number": 123, - "title": "Fix login bug", - "state": "closed", - "created_at": "2024-01-01T10:00:00Z", - "updated_at": "2024-01-02T10:00:00Z", - "merged_at": "2024-01-02T12:00:00Z", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["pull_requests"]) == 1 - pr = result["pull_requests"][0] - assert pr["pull_request_id"] == 123 - assert pr["current_status"] == "closed" - assert pr["date_created"] == "2024-01-01T10:00:00Z" - assert pr["date_modified"] == "2024-01-02T10:00:00Z" - assert pr["date_landed"] == "2024-01-02T12:00:00Z" - assert pr["target_repository"] == "mozilla/firefox" - - -def test_bug_id_extraction_basic(): - """Test bug ID extraction from PR title.""" - test_cases = [ - ("Bug 1234567 - Fix issue", 1234567), - ("bug 1234567: Update code", 1234567), - ("Fix for bug 7654321", 7654321), - ("b=9876543 - Change behavior", 9876543), - ] - - for title, expected_bug_id in test_cases: - raw_data = [ - { - "number": 1, - "title": title, - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == expected_bug_id - - -def test_bug_id_extraction_with_hash(): - """Test bug ID extraction with # symbol.""" - raw_data = [ - { - "number": 1, - "title": "Bug #1234567 - Fix issue", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == 1234567 - - -def test_bug_id_filter_large_numbers(): - """Test that bug IDs >= 100000000 are filtered out.""" - raw_data = [ - { - "number": 1, - "title": "Bug 999999999 - Invalid bug ID", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None - - -def test_bug_id_no_match(): - """Test PR title with no bug ID.""" - raw_data = [ - { - "number": 1, - "title": "Update documentation", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None - - -def test_labels_extraction(): - """Test labels array extraction.""" - raw_data = [ - { - "number": 1, - "title": "PR with labels", - "state": "open", - "labels": [ - {"name": "bug"}, - {"name": "priority-high"}, - {"name": "needs-review"}, - ], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - labels = result["pull_requests"][0]["labels"] - assert len(labels) == 3 - assert "bug" in labels - assert "priority-high" in labels - assert "needs-review" in labels - - -def test_labels_empty_list(): - """Test handling empty labels list.""" - raw_data = [ - { - "number": 1, - "title": "PR without labels", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert 
result["pull_requests"][0]["labels"] == [] - - -def test_commit_transformation(): - """Test commit fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with commits", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc123", - "commit": { - "author": { - "name": "Test Author", - "date": "2024-01-01T12:00:00Z", - } - }, - "files": [ - { - "filename": "src/main.py", - "additions": 10, - "deletions": 5, - } - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["commits"]) == 1 - commit = result["commits"][0] - assert commit["pull_request_id"] == 123 - assert commit["target_repository"] == "mozilla/firefox" - assert commit["commit_sha"] == "abc123" - assert commit["date_created"] == "2024-01-01T12:00:00Z" - assert commit["author_username"] == "Test Author" - assert commit["filename"] == "src/main.py" - assert commit["lines_added"] == 10 - assert commit["lines_removed"] == 5 - - -def test_commit_file_flattening(): - """Test that each file becomes a separate row.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple files", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc123", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 5}, - {"filename": "file2.py", "additions": 20, "deletions": 2}, - {"filename": "file3.py", "additions": 5, "deletions": 15}, - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - # Should have 3 rows in commits table (one per file) - assert len(result["commits"]) == 3 - filenames = [c["filename"] for c in result["commits"]] - assert "file1.py" in filenames - assert "file2.py" in filenames - assert "file3.py" in filenames - - -def test_multiple_commits_with_files(): - """Test multiple commits with multiple files per PR.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple commits", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "commit1", - "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 0} - ], - }, - { - "sha": "commit2", - "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, - "files": [ - {"filename": "file2.py", "additions": 5, "deletions": 2}, - {"filename": "file3.py", "additions": 8, "deletions": 3}, - ], - }, - ], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - # Should have 3 rows total (1 file from commit1, 2 files from commit2) - assert len(result["commits"]) == 3 - assert result["commits"][0]["commit_sha"] == "commit1" - assert result["commits"][1]["commit_sha"] == "commit2" - assert result["commits"][2]["commit_sha"] == "commit2" - - -def test_reviewer_transformation(): - """Test reviewer fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with reviewers", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - } - ], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["reviewers"]) == 1 - reviewer = result["reviewers"][0] - assert reviewer["pull_request_id"] == 123 - assert reviewer["target_repository"] == 
"mozilla/firefox" - assert reviewer["reviewer_username"] == "reviewer1" - assert reviewer["status"] == "APPROVED" - assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" - - -def test_transform_multiple_review_states(): - """Test transforming data with multiple review states.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple reviews", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "COMMENTED", - "submitted_at": "2024-01-01T17:00:00Z", - }, - ], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["reviewers"]) == 3 - states = [r["status"] for r in result["reviewers"]] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states - - -def test_date_approved_from_earliest_approval(): - """Test that date_approved is set to earliest APPROVED review.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple approvals", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-02T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T14:00:00Z", # Earliest - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "APPROVED", - "submitted_at": "2024-01-03T16:00:00Z", - }, - ], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - pr = result["pull_requests"][0] - assert pr["date_approved"] == "2024-01-01T14:00:00Z" - - -def test_comment_transformation(): - """Test comment fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with comments", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks great!", - "created_at": "2024-01-01T14:00:00Z", - "pull_request_review_id": None, - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["comments"]) == 1 - comment = result["comments"][0] - assert comment["pull_request_id"] == 123 - assert comment["target_repository"] == "mozilla/firefox" - assert comment["comment_id"] == 456 - assert comment["author_username"] == "commenter1" - assert comment["date_created"] == "2024-01-01T14:00:00Z" - assert comment["character_count"] == 17 - - -def test_comment_character_count(): - """Test character count calculation for comments.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": "Short", - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "This is a much longer comment with more text", - "created_at": "2024-01-01", - }, - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert result["comments"][0]["character_count"] == 5 - assert result["comments"][1]["character_count"] == 44 - - -def test_comment_status_from_review(): - """Test that comment status is mapped from review_id_statuses.""" - raw_data = [ - { 
- "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter"}, - "body": "LGTM", - "created_at": "2024-01-01", - "pull_request_review_id": 789, - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - # Comment should have status from the review - assert result["comments"][0]["status"] == "APPROVED" - - -def test_comment_empty_body(): - """Test handling comments with empty or None body.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": None, - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "", - "created_at": "2024-01-01", - }, - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert result["comments"][0]["character_count"] == 0 - assert result["comments"][1]["character_count"] == 0 - - -def test_empty_raw_data(): - """Test handling empty input list.""" - result = main.transform_data([], "mozilla/firefox") - - assert result["pull_requests"] == [] - assert result["commits"] == [] - assert result["reviewers"] == [] - assert result["comments"] == [] - - -def test_pr_without_commits_reviewers_comments(): - """Test PR with no commits, reviewers, or comments.""" - raw_data = [ - { - "number": 123, - "title": "Minimal PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["pull_requests"]) == 1 - assert len(result["commits"]) == 0 - assert len(result["reviewers"]) == 0 - assert len(result["comments"]) == 0 - - -def test_return_structure(): - """Test that transform_data returns dict with 4 keys.""" - raw_data = [ - { - "number": 1, - "title": "Test", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert isinstance(result, dict) - assert "pull_requests" in result - assert "commits" in result - assert "reviewers" in result - assert "comments" in result - - -def test_all_tables_have_target_repository(): - """Test that all tables include target_repository field.""" - raw_data = [ - { - "number": 123, - "title": "Test PR", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], - } - ], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 2, - "user": {"login": "commenter"}, - "body": "Test", - "created_at": "2024-01-01", - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" - assert result["commits"][0]["target_repository"] == "mozilla/firefox" - assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" - assert result["comments"][0]["target_repository"] == "mozilla/firefox" - - -# ============================================================================= -# TESTS FOR 
LOAD_DATA -# ============================================================================= - - -@patch("main.datetime") -def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): - """Test that load_data inserts all tables correctly.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" - - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [{"commit_sha": "abc"}], - "reviewers": [{"reviewer_username": "user1"}], - "comments": [{"comment_id": 123}], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - # Should call insert_rows_json 4 times (once per table) - assert mock_bigquery_client.insert_rows_json.call_count == 4 - - -@patch("main.datetime") -def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): - """Test that snapshot_date is added to all rows.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" - - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - call_args = mock_bigquery_client.insert_rows_json.call_args - rows = call_args[0][1] - assert all(row["snapshot_date"] == "2024-01-15" for row in rows) - - -def test_constructs_correct_table_ref(mock_bigquery_client): - """Test that table_ref is constructed correctly.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "my_dataset", transformed_data) - - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref = call_args[0][0] - assert table_ref == "test-project.my_dataset.pull_requests" - - -def test_empty_transformed_data_skipped(mock_bigquery_client): - """Test that empty transformed_data dict is skipped.""" - transformed_data = {} - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - mock_bigquery_client.insert_rows_json.assert_not_called() - - -def test_skips_empty_tables_individually(mock_bigquery_client): - """Test that empty tables are skipped individually.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], # Empty, should be skipped - "reviewers": [], # Empty, should be skipped - "comments": [{"comment_id": 456}], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - # Should only call insert_rows_json twice (for PRs and comments) - assert mock_bigquery_client.insert_rows_json.call_count == 2 - - -def test_only_pull_requests_table(mock_bigquery_client): - """Test loading only pull_requests table.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - assert mock_bigquery_client.insert_rows_json.call_count == 1 - - -def test_raises_exception_on_insert_errors(mock_bigquery_client): - """Test that Exception is raised on BigQuery insert errors.""" - mock_bigquery_client.insert_rows_json.return_value = [ - {"index": 0, "errors": ["Insert failed"]} - ] - - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with pytest.raises(Exception) as exc_info: - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - assert "BigQuery insert errors" in str(exc_info.value) - - -def 
test_verifies_client_insert_called_correctly(mock_bigquery_client): - """Test that client.insert_rows_json is called with correct arguments.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref, rows = call_args[0] - - assert "pull_requests" in table_ref - assert len(rows) == 2 - - -# ============================================================================= -# TESTS FOR MAIN -# ============================================================================= - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_logging): - """Test that GITHUB_REPOS is required.""" - with patch.dict( - os.environ, - {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - - assert "GITHUB_REPOS" in str(exc_info.value) - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_requires_bigquery_project( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that BIGQUERY_PROJECT is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - - assert "BIGQUERY_PROJECT" in str(exc_info.value) - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_requires_bigquery_dataset( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that BIGQUERY_DATASET is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - - assert "BIGQUERY_DATASET" in str(exc_info.value) - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_github_token_optional_with_warning( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that GITHUB_TOKEN is optional but warns if missing.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - # Should not raise, but should log warning - result = main.main() - assert result == 0 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_splits_github_repos_by_comma( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that GITHUB_REPOS is split by comma.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): - main.main() - - # Should be called twice (once per repo) - assert mock_extract.call_count == 2 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_logging): - """Test that GITHUB_API_URL is honored.""" - 
with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "GITHUB_API_URL": "https://custom-api.example.com", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): - main.main() - - call_kwargs = mock_extract.call_args[1] - assert call_kwargs["github_api_url"] == "https://custom-api.example.com" - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_honors_bigquery_emulator_host( - mock_session_class, mock_bq_client_class, mock_setup_logging -): - """Test that BIGQUERY_EMULATOR_HOST is honored.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - - # Verify BigQuery client was created with emulator settings - mock_bq_client_class.assert_called_once() - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_creates_session_with_headers( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that session is created with Accept and User-Agent headers.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - - # Verify session headers were set - assert mock_session.headers.update.called - call_args = mock_session.headers.update.call_args[0][0] - assert "Accept" in call_args - assert "User-Agent" in call_args - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_sets_authorization_header_with_token( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that Authorization header is set when token provided.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "test-token-123", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - - # Verify Authorization header was set - assert mock_session.headers.__setitem__.called - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -@patch("main.extract_pull_requests") -@patch("main.transform_data") -@patch("main.load_data") -def test_single_repo_successful_etl( - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, -): - """Test successful ETL for single repository.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - mock_extract.assert_called_once() - 
mock_transform.assert_called_once() - mock_load.assert_called_once() - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -@patch("main.extract_pull_requests") -@patch("main.transform_data") -@patch("main.load_data") -def test_multiple_repos_processing( - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, -): - """Test processing multiple repositories.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - # Should process 3 repositories - assert mock_extract.call_count == 3 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -@patch("main.extract_pull_requests") -@patch("main.transform_data") -@patch("main.load_data") -def test_processes_chunks_iteratively( - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, -): - """Test that chunks are processed iteratively from generator.""" - # Return 3 chunks - mock_extract.return_value = iter( - [ - [{"number": 1}], - [{"number": 2}], - [{"number": 3}], - ] - ) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - # Transform and load should be called 3 times (once per chunk) - assert mock_transform.call_count == 3 - assert mock_load.call_count == 3 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_returns_zero_on_success( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that main returns 0 on success.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - result = main.main() - - assert result == 0 - - -@pytest.mark.integration -@patch("main.setup_logging") -@patch("main.load_data") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_full_etl_flow_transforms_data_correctly( - mock_session_class, mock_bq_client, mock_load, mock_setup_logging -): - """Test full ETL flow with mocked GitHub responses.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # Mock PR response - pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} - ] - pr_response.links = {} - - # Mock commits, reviewers, comments responses - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - 
"BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - mock_load.assert_called_once() - - # Verify transformed data structure - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - assert "pull_requests" in transformed_data - assert len(transformed_data["pull_requests"]) == 1 - - -@patch("main.setup_logging") -@patch("main.load_data") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_bug_id_extraction_through_pipeline( - mock_session_class, mock_bq_client, mock_load, mock_setup_logging -): - """Test bug ID extraction through full pipeline.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - { - "number": 1, - "title": "Bug 9876543 - Fix critical issue", - "state": "closed", - } - ] - pr_response.links = {} - - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - main.main() - - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - pr = transformed_data["pull_requests"][0] - assert pr["bug_id"] == 9876543 - - -@patch("main.setup_logging") -@patch("main.load_data") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_pagination_through_full_flow( - mock_session_class, mock_bq_client, mock_load, mock_setup_logging -): - """Test pagination through full ETL flow.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # First page - pr_response_1 = Mock() - pr_response_1.status_code = 200 - pr_response_1.json.return_value = [{"number": 1, "title": "PR 1", "state": "open"}] - pr_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } - - # Second page - pr_response_2 = Mock() - pr_response_2.status_code = 200 - pr_response_2.json.return_value = [{"number": 2, "title": "PR 2", "state": "open"}] - pr_response_2.links = {} - - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response_1, - empty_response, - empty_response, - empty_response, - pr_response_2, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - main.main() - - # Should be called twice (once per chunk/page) - assert mock_load.call_count == 2 diff --git a/tests/test_main_integration.py b/tests/test_main_integration.py new file mode 100644 index 0000000..e09d940 --- /dev/null +++ b/tests/test_main_integration.py @@ -0,0 +1,544 @@ +#!/usr/bin/env python3 +""" +Tests for main function and full ETL integration. + +Tests main orchestration including environment variables, session setup, +repository processing, chunked ETL flow, and end-to-end integration tests. 
+""" + +import os +from unittest.mock import MagicMock, Mock, patch + +import pytest + +import main + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_REPOS is required.""" + with patch.dict( + os.environ, + {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "GITHUB_REPOS" in str(exc_info.value) + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_project( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that BIGQUERY_PROJECT is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "BIGQUERY_PROJECT" in str(exc_info.value) + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_dataset( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that BIGQUERY_DATASET is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "BIGQUERY_DATASET" in str(exc_info.value) + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_github_token_optional_with_warning( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that GITHUB_TOKEN is optional but warns if missing.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + # Should not raise, but should log warning + result = main.main() + assert result == 0 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_splits_github_repos_by_comma( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that GITHUB_REPOS is split by comma.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): + main.main() + + # Should be called twice (once per repo) + assert mock_extract.call_count == 2 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_API_URL is honored.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): + main.main() + + call_kwargs = mock_extract.call_args[1] + assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_honors_bigquery_emulator_host( + mock_session_class, 
mock_bq_client_class, mock_setup_logging +): + """Test that BIGQUERY_EMULATOR_HOST is honored.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify BigQuery client was created with emulator settings + mock_bq_client_class.assert_called_once() + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_creates_session_with_headers( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that session is created with Accept and User-Agent headers.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify session headers were set + assert mock_session.headers.update.called + call_args = mock_session.headers.update.call_args[0][0] + assert "Accept" in call_args + assert "User-Agent" in call_args + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_sets_authorization_header_with_token( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that Authorization header is set when token provided.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify Authorization header was set + assert mock_session.headers.__setitem__.called + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_single_repo_successful_etl( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test successful ETL for single repository.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_extract.assert_called_once() + mock_transform.assert_called_once() + mock_load.assert_called_once() + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_multiple_repos_processing( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test processing multiple repositories.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + 
"comments": [], + } + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Should process 3 repositories + assert mock_extract.call_count == 3 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_processes_chunks_iteratively( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test that chunks are processed iteratively from generator.""" + # Return 3 chunks + mock_extract.return_value = iter( + [ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], + ] + ) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Transform and load should be called 3 times (once per chunk) + assert mock_transform.call_count == 3 + assert mock_load.call_count == 3 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_returns_zero_on_success( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that main returns 0 on success.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + result = main.main() + + assert result == 0 + + +@pytest.mark.integration +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_full_etl_flow_transforms_data_correctly( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): + """Test full ETL flow with mocked GitHub responses.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # Mock PR response + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} + ] + pr_response.links = {} + + # Mock commits, reviewers, comments responses + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_load.assert_called_once() + + # Verify transformed data structure + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + assert "pull_requests" in transformed_data + assert len(transformed_data["pull_requests"]) == 1 + + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_bug_id_extraction_through_pipeline( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): + """Test bug ID extraction 
through full pipeline.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + { + "number": 1, + "title": "Bug 9876543 - Fix critical issue", + "state": "closed", + } + ] + pr_response.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + pr = transformed_data["pull_requests"][0] + assert pr["bug_id"] == 9876543 + + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_pagination_through_full_flow( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): + """Test pagination through full ETL flow.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # First page + pr_response_1 = Mock() + pr_response_1.status_code = 200 + pr_response_1.json.return_value = [{"number": 1, "title": "PR 1", "state": "open"}] + pr_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page + pr_response_2 = Mock() + pr_response_2.status_code = 200 + pr_response_2.json.return_value = [{"number": 2, "title": "PR 2", "state": "open"}] + pr_response_2.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response_1, + empty_response, + empty_response, + empty_response, + pr_response_2, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + # Should be called twice (once per chunk/page) + assert mock_load.call_count == 2 diff --git a/tests/test_rate_limit.py b/tests/test_rate_limit.py new file mode 100644 index 0000000..9d32961 --- /dev/null +++ b/tests/test_rate_limit.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python3 +""" +Tests for sleep_for_rate_limit function. + +Tests rate limit handling including wait time calculation and edge cases. 
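+
+The behaviour asserted below corresponds roughly to the following logic
+(a sketch inferred from the assertions, not a copy of main.py):
+
+    remaining = int(response.headers.get("X-RateLimit-Remaining", 1))
+    if remaining > 0:
+        return                                   # quota left: no sleep needed
+    reset = int(response.headers.get("X-RateLimit-Reset", 0))
+    time.sleep(max(0, reset - time.time()))      # never sleep for a negative duration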
+""" + +from unittest.mock import Mock, patch + +import main + + +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_calculates_wait_time(mock_sleep, mock_time): + """Test that sleep_for_rate_limit calculates correct wait time.""" + mock_time.return_value = 1000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1120", # 120 seconds from now + } + + main.sleep_for_rate_limit(mock_response) + + mock_sleep.assert_called_once_with(120) + + +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_when_reset_already_passed(mock_sleep, mock_time): + """Test that sleep_for_rate_limit doesn't sleep negative time.""" + mock_time.return_value = 2000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1500", # Already passed + } + + main.sleep_for_rate_limit(mock_response) + + # Should sleep for 0 seconds (max of 0 and negative value) + mock_sleep.assert_called_once_with(0) + + +@patch("time.sleep") +def test_sleep_for_rate_limit_when_remaining_not_zero(mock_sleep): + """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "5", + "X-RateLimit-Reset": "1500", + } + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when remaining > 0 + mock_sleep.assert_not_called() + + +@patch("time.sleep") +def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): + """Test sleep_for_rate_limit with missing rate limit headers.""" + mock_response = Mock() + mock_response.headers = {} + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when headers are missing (defaults to remaining=1) + mock_sleep.assert_not_called() diff --git a/tests/test_transform_data.py b/tests/test_transform_data.py new file mode 100644 index 0000000..2b8353b --- /dev/null +++ b/tests/test_transform_data.py @@ -0,0 +1,625 @@ +#!/usr/bin/env python3 +""" +Tests for transform_data function. + +Tests data transformation including bug ID extraction, label processing, +commit/reviewer/comment flattening, and field mapping. 
+""" + +import main + + +def test_transform_data_basic(): + """Test basic transformation of pull request data.""" + raw_data = [ + { + "number": 123, + "title": "Fix login bug", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T12:00:00Z", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + pr = result["pull_requests"][0] + assert pr["pull_request_id"] == 123 + assert pr["current_status"] == "closed" + assert pr["date_created"] == "2024-01-01T10:00:00Z" + assert pr["date_modified"] == "2024-01-02T10:00:00Z" + assert pr["date_landed"] == "2024-01-02T12:00:00Z" + assert pr["target_repository"] == "mozilla/firefox" + + +def test_bug_id_extraction_basic(): + """Test bug ID extraction from PR title.""" + test_cases = [ + ("Bug 1234567 - Fix issue", 1234567), + ("bug 1234567: Update code", 1234567), + ("Fix for bug 7654321", 7654321), + ("b=9876543 - Change behavior", 9876543), + ] + + for title, expected_bug_id in test_cases: + raw_data = [ + { + "number": 1, + "title": title, + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == expected_bug_id + + +def test_bug_id_extraction_with_hash(): + """Test bug ID extraction with # symbol.""" + raw_data = [ + { + "number": 1, + "title": "Bug #1234567 - Fix issue", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == 1234567 + + +def test_bug_id_filter_large_numbers(): + """Test that bug IDs >= 100000000 are filtered out.""" + raw_data = [ + { + "number": 1, + "title": "Bug 999999999 - Invalid bug ID", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + +def test_bug_id_no_match(): + """Test PR title with no bug ID.""" + raw_data = [ + { + "number": 1, + "title": "Update documentation", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + +def test_labels_extraction(): + """Test labels array extraction.""" + raw_data = [ + { + "number": 1, + "title": "PR with labels", + "state": "open", + "labels": [ + {"name": "bug"}, + {"name": "priority-high"}, + {"name": "needs-review"}, + ], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + labels = result["pull_requests"][0]["labels"] + assert len(labels) == 3 + assert "bug" in labels + assert "priority-high" in labels + assert "needs-review" in labels + + +def test_labels_empty_list(): + """Test handling empty labels list.""" + raw_data = [ + { + "number": 1, + "title": "PR without labels", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["labels"] == [] + + +def 
test_commit_transformation(): + """Test commit fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": { + "author": { + "name": "Test Author", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/main.py", + "additions": 10, + "deletions": 5, + } + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["commits"]) == 1 + commit = result["commits"][0] + assert commit["pull_request_id"] == 123 + assert commit["target_repository"] == "mozilla/firefox" + assert commit["commit_sha"] == "abc123" + assert commit["date_created"] == "2024-01-01T12:00:00Z" + assert commit["author_username"] == "Test Author" + assert commit["filename"] == "src/main.py" + assert commit["lines_added"] == 10 + assert commit["lines_removed"] == 5 + + +def test_commit_file_flattening(): + """Test that each file becomes a separate row.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple files", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 5}, + {"filename": "file2.py", "additions": 20, "deletions": 2}, + {"filename": "file3.py", "additions": 5, "deletions": 15}, + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows in commits table (one per file) + assert len(result["commits"]) == 3 + filenames = [c["filename"] for c in result["commits"]] + assert "file1.py" in filenames + assert "file2.py" in filenames + assert "file3.py" in filenames + + +def test_multiple_commits_with_files(): + """Test multiple commits with multiple files per PR.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "commit1", + "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 0} + ], + }, + { + "sha": "commit2", + "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, + "files": [ + {"filename": "file2.py", "additions": 5, "deletions": 2}, + {"filename": "file3.py", "additions": 8, "deletions": 3}, + ], + }, + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows total (1 file from commit1, 2 files from commit2) + assert len(result["commits"]) == 3 + assert result["commits"][0]["commit_sha"] == "commit1" + assert result["commits"][1]["commit_sha"] == "commit2" + assert result["commits"][2]["commit_sha"] == "commit2" + + +def test_reviewer_transformation(): + """Test reviewer fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with reviewers", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + } + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 1 + reviewer = result["reviewers"][0] + assert reviewer["pull_request_id"] == 123 + assert reviewer["target_repository"] == "mozilla/firefox" + assert reviewer["reviewer_username"] == 
"reviewer1" + assert reviewer["status"] == "APPROVED" + assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + + +def test_transform_multiple_review_states(): + """Test transforming data with multiple review states.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple reviews", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "COMMENTED", + "submitted_at": "2024-01-01T17:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 3 + states = [r["status"] for r in result["reviewers"]] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + +def test_date_approved_from_earliest_approval(): + """Test that date_approved is set to earliest APPROVED review.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple approvals", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-02T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T14:00:00Z", # Earliest + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "APPROVED", + "submitted_at": "2024-01-03T16:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + pr = result["pull_requests"][0] + assert pr["date_approved"] == "2024-01-01T14:00:00Z" + + +def test_comment_transformation(): + """Test comment fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with comments", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks great!", + "created_at": "2024-01-01T14:00:00Z", + "pull_request_review_id": None, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["comments"]) == 1 + comment = result["comments"][0] + assert comment["pull_request_id"] == 123 + assert comment["target_repository"] == "mozilla/firefox" + assert comment["comment_id"] == 456 + assert comment["author_username"] == "commenter1" + assert comment["date_created"] == "2024-01-01T14:00:00Z" + assert comment["character_count"] == 17 + + +def test_comment_character_count(): + """Test character count calculation for comments.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": "Short", + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "This is a much longer comment with more text", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 5 + assert result["comments"][1]["character_count"] == 44 + + +def test_comment_status_from_review(): + """Test that comment status is mapped from review_id_statuses.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + 
"labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter"}, + "body": "LGTM", + "created_at": "2024-01-01", + "pull_request_review_id": 789, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Comment should have status from the review + assert result["comments"][0]["status"] == "APPROVED" + + +def test_comment_empty_body(): + """Test handling comments with empty or None body.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": None, + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 0 + assert result["comments"][1]["character_count"] == 0 + + +def test_empty_raw_data(): + """Test handling empty input list.""" + result = main.transform_data([], "mozilla/firefox") + + assert result["pull_requests"] == [] + assert result["commits"] == [] + assert result["reviewers"] == [] + assert result["comments"] == [] + + +def test_pr_without_commits_reviewers_comments(): + """Test PR with no commits, reviewers, or comments.""" + raw_data = [ + { + "number": 123, + "title": "Minimal PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + assert len(result["commits"]) == 0 + assert len(result["reviewers"]) == 0 + assert len(result["comments"]) == 0 + + +def test_return_structure(): + """Test that transform_data returns dict with 4 keys.""" + raw_data = [ + { + "number": 1, + "title": "Test", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert isinstance(result, dict) + assert "pull_requests" in result + assert "commits" in result + assert "reviewers" in result + assert "comments" in result + + +def test_all_tables_have_target_repository(): + """Test that all tables include target_repository field.""" + raw_data = [ + { + "number": 123, + "title": "Test PR", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], + } + ], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 2, + "user": {"login": "commenter"}, + "body": "Test", + "created_at": "2024-01-01", + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" + assert result["commits"][0]["target_repository"] == "mozilla/firefox" + assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" + assert result["comments"][0]["target_repository"] == "mozilla/firefox" From c4dd862308206ade8cfc980e672c2ec9696e16af Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 6 Feb 2026 16:43:13 -0500 Subject: [PATCH 10/11] Separate 
TESTING.md not necessary. Added testing section to README.md --- README.md | 91 ++++++++ TESTING.md | 621 ----------------------------------------------------- 2 files changed, 91 insertions(+), 621 deletions(-) delete mode 100644 TESTING.md diff --git a/README.md b/README.md index 570bacb..ae10820 100644 --- a/README.md +++ b/README.md @@ -157,6 +157,97 @@ This setup includes: - **BigQuery Emulator**: Local BigQuery instance for testing - **ETL Service**: Configured to use both mock services +### Running Tests + +The project includes a comprehensive test suite using pytest. Tests are organized in the `test/` directory and include both unit and integration tests. + +#### Setting Up the Development Environment + +1. **Install Python 3.14** (or your compatible Python version) + +2. **Install development dependencies**: + + ```bash + # Install the package with dev dependencies + pip install -e ".[dev]" + ``` + + This installs: + - `pytest` - Testing framework + - `pytest-mock` - Mocking utilities for tests + - `ruff` - Linter + - `black` - Code formatter + +3. **Verify installation**: + + ```bash + pytest --version + ``` + +#### Running the Tests + +Run all tests: + +```bash +pytest +``` + +Run tests with verbose output: + +```bash +pytest -v +``` + +Run specific test files: + +```bash +pytest test/test_extract_pull_requests.py +pytest test/test_transform_data.py +``` + +Run tests by marker: + +```bash +# Run only unit tests +pytest -m unit + +# Run only integration tests +pytest -m integration + +# Skip slow tests +pytest -m "not slow" +``` + +Run tests with coverage reporting: + +```bash +pytest --cov=. --cov-report=html +``` + +#### Test Organization + +The test suite is organized into the following files: + +- `test/conftest.py` - Shared pytest fixtures and test configuration +- `test/test_extract_pull_requests.py` - Tests for PR extraction logic +- `test/test_extract_commits.py` - Tests for commit extraction +- `test/test_extract_comments.py` - Tests for comment extraction +- `test/test_extract_reviewers.py` - Tests for reviewer extraction +- `test/test_transform_data.py` - Tests for data transformation +- `test/test_load_data.py` - Tests for BigQuery loading +- `test/test_rate_limit.py` - Tests for rate limit handling +- `test/test_main_integration.py` - End-to-end integration tests +- `test/test_logging.py` - Tests for logging setup +- `test/test_formatting.py` - Code formatting tests + +#### Test Markers + +Tests are marked with the following pytest markers: + +- `@pytest.mark.unit` - Unit tests for individual functions +- `@pytest.mark.integration` - Integration tests across multiple components +- `@pytest.mark.slow` - Tests that take longer to run + ### Adding Dependencies Add new Python packages to `requirements.txt` and rebuild the Docker image. diff --git a/TESTING.md b/TESTING.md deleted file mode 100644 index 6901d2f..0000000 --- a/TESTING.md +++ /dev/null @@ -1,621 +0,0 @@ -# Testing Guide for GitHub ETL - -This document describes comprehensive testing for the GitHub ETL pipeline, including -unit tests, integration tests, Docker testing, linting, and CI/CD workflows. - -## Table of Contents - -1. [Unit Testing](#unit-testing) -2. [Test Organization](#test-organization) -3. [Running Tests](#running-tests) -4. [Code Coverage](#code-coverage) -5. [Linting and Code Quality](#linting-and-code-quality) -6. [CI/CD Integration](#cicd-integration) -7. [Docker Testing](#docker-testing) -8. 
[Adding New Tests](#adding-new-tests) - ---- - -## Unit Testing - -The test suite in `test_main.py` provides comprehensive coverage for all functions in `main.py`. -We have unit tests covering 9 functions with 80%+ code coverage requirement. - -### Test Structure - -Tests are organized into 10 test classes: - -1. **TestSetupLogging** - Logging configuration -2. **TestSleepForRateLimit** - Rate limit handling -3. **TestExtractPullRequests** - PR extraction with pagination and enrichment -4. **TestExtractCommits** - Commit and file extraction -5. **TestExtractReviewers** - Reviewer extraction -6. **TestExtractComments** - Comment extraction (uses /issues endpoint) -7. **TestTransformData** - Data transformation for all 4 BigQuery tables -8. **TestLoadData** - BigQuery data loading -9. **TestMain** - Main ETL orchestration -10. **TestIntegration** - End-to-end integration tests (marked with `@pytest.mark.integration`) - -### Fixtures - -Reusable fixtures are defined at the top of `test_main.py`: - -- `mock_session` - Mocked `requests.Session` -- `mock_bigquery_client` - Mocked BigQuery client -- `mock_pr_response` - Realistic pull request response -- `mock_commit_response` - Realistic commit with files -- `mock_reviewer_response` - Realistic reviewer response -- `mock_comment_response` - Realistic comment response - -## Test Organization - -### Function Coverage - -| Function | Coverage Target | Key Test Areas | -|----------|------------------|----------------| -| `setup_logging()` | 100% | Logger configuration | -| `sleep_for_rate_limit()` | 100% | Rate limit sleep logic, edge cases | -| `extract_pull_requests()` | 90%+ | Pagination, rate limits, enrichment, error handling | -| `extract_commits()` | 85%+ | Commit/file fetching, rate limits, errors | -| `extract_reviewers()` | 85%+ | Reviewer states, rate limits, errors | -| `extract_comments()` | 85%+ | Comment fetching (via /issues), rate limits | -| `transform_data()` | 95%+ | Bug ID extraction, 4 tables, field mapping | -| `load_data()` | 90%+ | BigQuery insertion, snapshot dates, errors | -| `main()` | 85%+ | Env vars, orchestration, chunking | - -**Overall Target: 85-90% coverage** (80% minimum enforced in CI) - -### Critical Test Cases - -#### Bug ID Extraction -Tests verify the regex pattern matches: -- `Bug 1234567 - Fix` → 1234567 -- `bug 1234567` → 1234567 -- `b=1234567` → 1234567 -- `Bug #1234567` → 1234567 -- Filters out IDs >= 100000000 - -#### Data Transformation -Tests ensure correct transformation for all 4 BigQuery tables: -- **pull_requests**: PR metadata, bug IDs, labels, date_approved -- **commits**: Flattened files (one row per file), commit metadata -- **reviewers**: Review states, date_approved calculation -- **comments**: Character count, status mapping from reviews - -#### Rate Limiting -Tests verify rate limit handling at all API levels: -- Pull requests pagination -- Commit fetching -- Reviewer fetching -- Comment fetching - -## Running Tests - -### All Tests with Coverage - -```bash -pytest -``` - -This runs all tests with coverage reporting (configured in `pytest.ini`). - -### Fast Unit Tests Only (Skip Integration) - -```bash -pytest -m "not integration and not slow" -``` - -Use this for fast feedback during development. 
- -### Specific Test Class - -```bash -pytest test_main.py::TestTransformData -``` - -### Specific Test Function - -```bash -pytest test_main.py::TestTransformData::test_bug_id_extraction_basic -v -``` - -### With Verbose Output - -```bash -pytest -v -``` - -### With Coverage Report - -```bash -# Terminal report -pytest --cov=main --cov-report=term-missing - -# HTML report -pytest --cov=main --cov-report=html -open htmlcov/index.html -``` - -### Integration Tests Only - -```bash -pytest -m integration -``` - -## Code Coverage - -### Coverage Requirements - -- **Minimum**: 80% (enforced in CI via `--cov-fail-under=80`) -- **Target**: 85-90% -- **Current**: Run `pytest --cov=main` to see current coverage - -### Coverage Configuration - -Coverage settings are in `pytest.ini`: - -```ini -[pytest] -addopts = - --cov=main - --cov-report=term-missing - --cov-report=html - --cov-branch - --cov-fail-under=80 -``` - -### Viewing Coverage - -```bash -# Generate HTML coverage report -pytest --cov=main --cov-report=html - -# Open in browser -xdg-open htmlcov/index.html # Linux -open htmlcov/index.html # macOS -``` - -The HTML report shows: -- Line-by-line coverage -- Branch coverage -- Missing lines highlighted -- Per-file coverage percentages - -## Linting and Code Quality - -### Available Linters - -The project uses these linting tools (defined in `requirements.txt`): - -- **black** - Code formatting -- **isort** - Import sorting -- **flake8** - Style and syntax checking -- **mypy** - Static type checking - -### Running Linters - -```bash -# Run black (auto-format) -black main.py test_main.py - -# Check formatting without changes -black --check main.py test_main.py - -# Sort imports -isort main.py test_main.py - -# Check import sorting -isort --check-only main.py test_main.py - -# Run flake8 -flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 - -# Run mypy -mypy main.py --no-strict-optional --ignore-missing-imports -``` - -### All Linting Checks - -```bash -# Run all linters in sequence -black --check main.py test_main.py && \ -isort --check-only main.py test_main.py && \ -flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 && \ -mypy main.py --no-strict-optional --ignore-missing-imports -``` - -## CI/CD Integration - -### GitHub Actions Workflow - -The `.github/workflows/tests.yml` workflow runs on every pull request: - -**Lint Job:** -1. Runs black (format check) -2. Runs isort (import check) -3. Runs flake8 (style check) -4. Runs mypy (type check) - -**Test Job:** -1. Runs fast unit tests with 80% coverage threshold -2. Runs all tests (including integration) -3. Uploads coverage reports as artifacts - -### Workflow Triggers - -- Pull requests to `main` branch - -### Viewing Results - -- Check the Actions tab in GitHub -- Coverage artifacts are uploaded for each run -- Failed linting or tests will block merges - -## Docker Testing - -## Overview - -The `docker-compose.yml` configuration provides a complete local testing environment with: - -1. **Mock GitHub API** - A Flask-based mock service that simulates the GitHub Pull Requests API -2. **BigQuery Emulator** - A local BigQuery instance for testing data loads -3. 
**ETL Service** - The main GitHub ETL application configured to use the mock services - -## Quick Start - -### Start all services - -```bash -docker-compose up --build -``` - -This will: - -- Build and start the mock GitHub API (port 5000) -- Start the BigQuery emulator (ports 9050, 9060) -- Build and run the ETL service - -The ETL service will automatically: - -- Fetch 250 mock pull requests from the mock GitHub API -- Transform the data -- Load it into the BigQuery emulator - -### View logs - -```bash -# All services -docker-compose logs -f - -# Specific service -docker-compose logs -f github-etl -docker-compose logs -f bigquery-emulator -docker-compose logs -f mock-github-api -``` - -### Stop services - -```bash -docker-compose down -``` - -## Architecture - -### Mock GitHub API Service - -- **Port**: 5000 -- **Endpoint**: `http://localhost:5000/repos/{owner}/{repo}/pulls` -- **Mock data**: Generates 250 sample pull requests with realistic data -- **Features**: - - Pagination support (per_page, page parameters) - - Realistic PR data (numbers, titles, states, timestamps, users, etc.) - - Mock rate limit headers - - No authentication required - -### BigQuery Emulator Service - -- **Ports**: - - 9050 (BigQuery API) - - 9060 (Discovery/Admin API) -- **Configuration**: Uses `data.yml` to define the schema -- **Project**: test -- **Dataset**: github_etl -- **Table**: pull_requests - -### ETL Service - -The ETL service is configured via environment variables in `docker-compose.yml`: - -```yaml -environment: - GITHUB_REPOS: "mozilla-firefox/firefox" - GITHUB_TOKEN: "" # Not needed for mock API - GITHUB_API_URL: "http://mock-github-api:5000" - BIGQUERY_PROJECT: "test" - BIGQUERY_DATASET: "github_etl" - BIGQUERY_EMULATOR_HOST: "http://bigquery-emulator:9050" -``` - -## Customization - -### Using Real GitHub API - -To test with the real GitHub API instead of the mock: - -1. Set `GITHUB_TOKEN` environment variable -2. Remove or comment out `GITHUB_API_URL` in docker-compose.yml -3. Update `depends_on` to not require mock-github-api - -```bash -export GITHUB_TOKEN="your_github_token" -docker-compose up github-etl bigquery-emulator -``` - -### Adjusting Mock Data - -Edit `mock_github_api.py` to customize: - -- Total number of PRs (default: 250) -- PR field values -- Pagination behavior - -### Modifying BigQuery Schema - -Edit `data.yml` to change the table schema. The schema matches the fields -extracted in `main.py`'s `transform_data()` function. - -## Querying the BigQuery Emulator - -You can query the BigQuery emulator using the BigQuery Python client: - -```python -from google.cloud import bigquery -from google.api_core.client_options import ClientOptions - -client = bigquery.Client( - project="test-project", - client_options=ClientOptions(api_endpoint="http://localhost:9050") -) - -query = """ -SELECT pr_number, title, state, user_login -FROM `test-project.test_dataset.pull_requests` -LIMIT 10 -""" - -for row in client.query(query): - print(f"PR #{row.pr_number}: {row.title} - {row.state}") -``` - -Or use the `bq` command-line tool with the emulator endpoint. 
- -## Troubleshooting - -### Services not starting - -Check if ports are already in use: - -```bash -lsof -i :5000 # Mock GitHub API -lsof -i :9050 # BigQuery emulator -``` - -### ETL fails to connect - -Ensure services are healthy: - -```bash -docker-compose ps -``` - -Check service logs: - -```bash -docker-compose logs bigquery-emulator -docker-compose logs mock-github-api -``` - -### Schema mismatch errors - -Verify `data.yml` schema matches fields in `main.py:transform_data()`. - -## Development Workflow - -1. Make changes to `main.py` -2. Restart the ETL service: `docker-compose restart github-etl` -3. View logs: `docker-compose logs -f github-etl` - -The `main.py` file is mounted as a volume, so changes are reflected without rebuilding. - -## Cleanup - -Remove all containers and volumes: - -```bash -docker-compose down -v -``` - -Remove built images: - -```bash -docker-compose down --rmi all -``` - ---- - -## Adding New Tests - -### Testing Patterns - -#### 1. Mock External Dependencies - -Always mock external API calls and BigQuery operations: - -```python -@patch("requests.Session") -def test_api_call(mock_session_class): - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"id": 1}] - - mock_session.get.return_value = mock_response - # Test code here -``` - -#### 2. Use Fixtures - -Leverage existing fixtures for common test data: - -```python -def test_with_fixtures(mock_session, mock_pr_response): - # Use mock_session and mock_pr_response - pass -``` - -#### 3. Test Edge Cases - -Always test: -- Empty inputs -- None values -- Missing fields -- Rate limits -- API errors (404, 500, etc.) -- Boundary conditions - -#### 4. Verify Call Arguments - -Check that functions are called with correct parameters: - -```python -mock_extract.assert_called_once_with( - session=mock_session, - repo="mozilla/firefox", - github_api_url="https://api.github.com" -) -``` - -### Example: Adding a New Test - -```python -class TestNewFunction: - """Tests for new_function.""" - - def test_basic_functionality(self, mock_session): - """Test basic happy path.""" - # Arrange - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = {"result": "success"} - mock_session.get.return_value = mock_response - - # Act - result = main.new_function(mock_session, "arg1") - - # Assert - assert result == {"result": "success"} - mock_session.get.assert_called_once() - - def test_error_handling(self, mock_session): - """Test error handling.""" - mock_response = Mock() - mock_response.status_code = 500 - mock_response.text = "Internal Error" - mock_session.get.return_value = mock_response - - with pytest.raises(SystemExit) as exc_info: - main.new_function(mock_session, "arg1") - - assert "500" in str(exc_info.value) -``` - -### Test Organization Guidelines - -1. **Group related tests** in test classes -2. **Use descriptive names** like `test_handles_rate_limit_on_commits` -3. **One assertion concept per test** - Test one thing at a time -4. **Arrange-Act-Assert pattern** - Structure tests clearly -5. 
**Add docstrings** to explain what each test verifies - -### Mocking Patterns - -#### Mocking Time - -```python -@patch("time.time") -@patch("time.sleep") -def test_with_time(mock_sleep, mock_time): - mock_time.return_value = 1000 - # Test code -``` - -#### Mocking Environment Variables - -```python -with patch.dict(os.environ, {"VAR_NAME": "value"}, clear=True): - # Test code -``` - -#### Mocking Generators - -```python -mock_extract.return_value = iter([[{"id": 1}], [{"id": 2}]]) -``` - -### Running Tests During Development - -```bash -# Auto-run tests on file changes (requires pytest-watch) -pip install pytest-watch -ptw -- --cov=main -m "not integration" -``` - -### Debugging Tests - -```bash -# Drop into debugger on failures -pytest --pdb - -# Show print statements -pytest -s - -# Verbose with full diff -pytest -vv -``` - -### Coverage Tips - -If coverage is below 80%: - -1. Run `pytest --cov=main --cov-report=term-missing` to see missing lines -2. Look for untested branches (if/else paths) -3. Check error handling paths -4. Verify edge cases are covered - -## Resources - -- [pytest documentation](https://docs.pytest.org/) -- [pytest-cov documentation](https://pytest-cov.readthedocs.io/) -- [unittest.mock documentation](https://docs.python.org/3/library/unittest.mock.html) - -## Troubleshooting - -### Tests Pass Locally But Fail in CI - -- Check Python version (must be 3.14) -- Verify all dependencies are in `requirements.txt` -- Look for environment-specific issues - -### Coverage Dropped Below 80% - -- Run locally: `pytest --cov=main --cov-report=html` -- Open `htmlcov/index.html` to see uncovered lines -- Add tests for missing coverage - -### Import Errors - -- Ensure `PYTHONPATH` includes project root -- Check that `__init__.py` files exist if needed -- Verify module names match file names From 48c1c46e1a2b139752ab97d6d5dd8157dcb13c6c Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 6 Feb 2026 17:32:26 -0500 Subject: [PATCH 11/11] Copoilot suggested fixes --- .github/workflows/tests.yml | 13 ++++++------- README.md | 26 +++++++++++++------------- test_formatting.py | 16 ---------------- 3 files changed, 19 insertions(+), 36 deletions(-) delete mode 100644 test_formatting.py diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index b4cc85b..4025084 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -23,10 +23,9 @@ jobs: integration-test: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 - - name: Run integration test with docker compose - run: | - docker compose up --build --abort-on-container-exit --exit-code-from github-etl - - name: Cleanup - if: always() - run: docker compose down -v + - uses: actions/checkout@v4 + - name: Run integration test with docker compose + run: | + docker compose up --build --abort-on-container-exit --exit-code-from github-etl + - name: Cleanup + run: docker compose down -v diff --git a/README.md b/README.md index ae10820..d27188b 100644 --- a/README.md +++ b/README.md @@ -201,8 +201,8 @@ pytest -v Run specific test files: ```bash -pytest test/test_extract_pull_requests.py -pytest test/test_transform_data.py +pytest tests/test_extract_pull_requests.py +pytest tests/test_transform_data.py ``` Run tests by marker: @@ -228,17 +228,17 @@ pytest --cov=. 
--cov-report=html The test suite is organized into the following files: -- `test/conftest.py` - Shared pytest fixtures and test configuration -- `test/test_extract_pull_requests.py` - Tests for PR extraction logic -- `test/test_extract_commits.py` - Tests for commit extraction -- `test/test_extract_comments.py` - Tests for comment extraction -- `test/test_extract_reviewers.py` - Tests for reviewer extraction -- `test/test_transform_data.py` - Tests for data transformation -- `test/test_load_data.py` - Tests for BigQuery loading -- `test/test_rate_limit.py` - Tests for rate limit handling -- `test/test_main_integration.py` - End-to-end integration tests -- `test/test_logging.py` - Tests for logging setup -- `test/test_formatting.py` - Code formatting tests +- `tests/conftest.py` - Shared pytest fixtures and test configuration +- `tests/test_extract_pull_requests.py` - Tests for PR extraction logic +- `tests/test_extract_commits.py` - Tests for commit extraction +- `tests/test_extract_comments.py` - Tests for comment extraction +- `tests/test_extract_reviewers.py` - Tests for reviewer extraction +- `tests/test_transform_data.py` - Tests for data transformation +- `tests/test_load_data.py` - Tests for BigQuery loading +- `tests/test_rate_limit.py` - Tests for rate limit handling +- `tests/test_main_integration.py` - End-to-end integration tests +- `tests/test_logging.py` - Tests for logging setup +- `tests/test_formatting.py` - Code formatting tests #### Test Markers diff --git a/test_formatting.py b/test_formatting.py deleted file mode 100644 index c92e534..0000000 --- a/test_formatting.py +++ /dev/null @@ -1,16 +0,0 @@ -""" -Code Style Tests. -""" - -import subprocess - - -def test_black(): - cmd = ("black", "--diff", "main.py") - output = subprocess.check_output(cmd) - assert not output, "The python code does not adhere to the project style." - - -def test_ruff(): - passed = subprocess.call(("ruff", "check", "main.py", "--target-version", "py314")) - assert not passed, "ruff did not run cleanly."