From 1791608d0a565561c69fb2e891bb051af362ada3 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 16:57:08 -0500 Subject: [PATCH 01/11] feat: Added comprehensive unit testing and github action to run tests on new pull requests --- .github/workflows/tests.yml | 11 + TESTING.md | 621 +++++++++++ pytest.ini | 49 + test_main.py | 2106 +++++++++++++++++++++++++++++++++++ 4 files changed, 2787 insertions(+) create mode 100644 TESTING.md create mode 100644 pytest.ini create mode 100644 test_main.py diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index 8118a66..c7b9d39 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -19,3 +19,14 @@ jobs: pip install -e ".[dev]" - name: Run all tests run: pytest + + integration-test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Run integration test with docker-compose + run: | + docker-compose up --build --abort-on-container-exit --exit-code-from github-etl + - name: Cleanup + if: always() + run: docker-compose down -v diff --git a/TESTING.md b/TESTING.md new file mode 100644 index 0000000..c0bb5dd --- /dev/null +++ b/TESTING.md @@ -0,0 +1,621 @@ +# Testing Guide for GitHub ETL + +This document describes comprehensive testing for the GitHub ETL pipeline, including +unit tests, integration tests, Docker testing, linting, and CI/CD workflows. + +## Table of Contents + +1. [Unit Testing](#unit-testing) +2. [Test Organization](#test-organization) +3. [Running Tests](#running-tests) +4. [Code Coverage](#code-coverage) +5. [Linting and Code Quality](#linting-and-code-quality) +6. [CI/CD Integration](#cicd-integration) +7. [Docker Testing](#docker-testing) +8. [Adding New Tests](#adding-new-tests) + +--- + +## Unit Testing + +The test suite in `test_main.py` provides comprehensive coverage for all functions in `main.py`. +We have **95 unit tests** covering 9 functions with 80%+ code coverage requirement. + +### Test Structure + +Tests are organized into 10 test classes: + +1. **TestSetupLogging** (1 test) - Logging configuration +2. **TestSleepForRateLimit** (4 tests) - Rate limit handling +3. **TestExtractPullRequests** (14 tests) - PR extraction with pagination and enrichment +4. **TestExtractCommits** (9 tests) - Commit and file extraction +5. **TestExtractReviewers** (6 tests) - Reviewer extraction +6. **TestExtractComments** (7 tests) - Comment extraction (uses /issues endpoint) +7. **TestTransformData** (26 tests) - Data transformation for all 4 BigQuery tables +8. **TestLoadData** (8 tests) - BigQuery data loading +9. **TestMain** (17 tests) - Main ETL orchestration +10. 
**TestIntegration** (3 tests) - End-to-end integration tests (marked with `@pytest.mark.integration`) + +### Fixtures + +Reusable fixtures are defined at the top of `test_main.py`: + +- `mock_session` - Mocked `requests.Session` +- `mock_bigquery_client` - Mocked BigQuery client +- `mock_pr_response` - Realistic pull request response +- `mock_commit_response` - Realistic commit with files +- `mock_reviewer_response` - Realistic reviewer response +- `mock_comment_response` - Realistic comment response + +## Test Organization + +### Function Coverage + +| Function | Tests | Coverage Target | Key Test Areas | +|----------|-------|-----------------|----------------| +| `setup_logging()` | 1 | 100% | Logger configuration | +| `sleep_for_rate_limit()` | 4 | 100% | Rate limit sleep logic, edge cases | +| `extract_pull_requests()` | 14 | 90%+ | Pagination, rate limits, enrichment, error handling | +| `extract_commits()` | 9 | 85%+ | Commit/file fetching, rate limits, errors | +| `extract_reviewers()` | 6 | 85%+ | Reviewer states, rate limits, errors | +| `extract_comments()` | 7 | 85%+ | Comment fetching (via /issues), rate limits | +| `transform_data()` | 26 | 95%+ | Bug ID extraction, 4 tables, field mapping | +| `load_data()` | 8 | 90%+ | BigQuery insertion, snapshot dates, errors | +| `main()` | 17 | 85%+ | Env vars, orchestration, chunking | + +**Overall Target: 85-90% coverage** (80% minimum enforced in CI) + +### Critical Test Cases + +#### Bug ID Extraction +Tests verify the regex pattern matches: +- `Bug 1234567 - Fix` → 1234567 +- `bug 1234567` → 1234567 +- `b=1234567` → 1234567 +- `Bug #1234567` → 1234567 +- Filters out IDs >= 100000000 + +#### Data Transformation +Tests ensure correct transformation for all 4 BigQuery tables: +- **pull_requests**: PR metadata, bug IDs, labels, date_approved +- **commits**: Flattened files (one row per file), commit metadata +- **reviewers**: Review states, date_approved calculation +- **comments**: Character count, status mapping from reviews + +#### Rate Limiting +Tests verify rate limit handling at all API levels: +- Pull requests pagination +- Commit fetching +- Reviewer fetching +- Comment fetching + +## Running Tests + +### All Tests with Coverage + +```bash +pytest +``` + +This runs all tests with coverage reporting (configured in `pytest.ini`). + +### Fast Unit Tests Only (Skip Integration) + +```bash +pytest -m "not integration and not slow" +``` + +Use this for fast feedback during development. 
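The `-m` filter above works because integration tests are tagged with the `integration` marker registered in `pytest.ini`. As a minimal sketch of how such a test is tagged (the test name below is illustrative, not taken from `test_main.py`):

```python
import pytest


@pytest.mark.integration
def test_full_etl_flow_end_to_end(mock_session, mock_bigquery_client):
    """Exercise extract -> transform -> load against mocked services."""
    ...
```

Because `--strict-markers` is enabled, any marker used this way must also be listed under `markers` in `pytest.ini`.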
+ +### Specific Test Class + +```bash +pytest test_main.py::TestTransformData +``` + +### Specific Test Function + +```bash +pytest test_main.py::TestTransformData::test_bug_id_extraction_basic -v +``` + +### With Verbose Output + +```bash +pytest -v +``` + +### With Coverage Report + +```bash +# Terminal report +pytest --cov=main --cov-report=term-missing + +# HTML report +pytest --cov=main --cov-report=html +open htmlcov/index.html +``` + +### Integration Tests Only + +```bash +pytest -m integration +``` + +## Code Coverage + +### Coverage Requirements + +- **Minimum**: 80% (enforced in CI via `--cov-fail-under=80`) +- **Target**: 85-90% +- **Current**: Run `pytest --cov=main` to see current coverage + +### Coverage Configuration + +Coverage settings are in `pytest.ini`: + +```ini +[pytest] +addopts = + --cov=main + --cov-report=term-missing + --cov-report=html + --cov-branch + --cov-fail-under=80 +``` + +### Viewing Coverage + +```bash +# Generate HTML coverage report +pytest --cov=main --cov-report=html + +# Open in browser +xdg-open htmlcov/index.html # Linux +open htmlcov/index.html # macOS +``` + +The HTML report shows: +- Line-by-line coverage +- Branch coverage +- Missing lines highlighted +- Per-file coverage percentages + +## Linting and Code Quality + +### Available Linters + +The project uses these linting tools (defined in `requirements.txt`): + +- **black** - Code formatting +- **isort** - Import sorting +- **flake8** - Style and syntax checking +- **mypy** - Static type checking + +### Running Linters + +```bash +# Run black (auto-format) +black main.py test_main.py + +# Check formatting without changes +black --check main.py test_main.py + +# Sort imports +isort main.py test_main.py + +# Check import sorting +isort --check-only main.py test_main.py + +# Run flake8 +flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 + +# Run mypy +mypy main.py --no-strict-optional --ignore-missing-imports +``` + +### All Linting Checks + +```bash +# Run all linters in sequence +black --check main.py test_main.py && \ +isort --check-only main.py test_main.py && \ +flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 && \ +mypy main.py --no-strict-optional --ignore-missing-imports +``` + +## CI/CD Integration + +### GitHub Actions Workflow + +The `.github/workflows/tests.yml` workflow runs on every push and pull request: + +**Lint Job:** +1. Runs black (format check) +2. Runs isort (import check) +3. Runs flake8 (style check) +4. Runs mypy (type check) + +**Test Job:** +1. Runs fast unit tests with 80% coverage threshold +2. Runs all tests (including integration) +3. Uploads coverage reports as artifacts + +### Workflow Triggers + +- Push to `main` or `unit-tests` branch +- Pull requests to `main` branch + +### Viewing Results + +- Check the Actions tab in GitHub +- Coverage artifacts are uploaded for each run +- Failed linting or tests will block merges + +## Docker Testing + +## Overview + +The `docker-compose.yml` configuration provides a complete local testing environment with: + +1. **Mock GitHub API** - A Flask-based mock service that simulates the GitHub Pull Requests API +2. **BigQuery Emulator** - A local BigQuery instance for testing data loads +3. 
**ETL Service** - The main GitHub ETL application configured to use the mock services + +## Quick Start + +### Start all services + +```bash +docker-compose up --build +``` + +This will: + +- Build and start the mock GitHub API (port 5000) +- Start the BigQuery emulator (ports 9050, 9060) +- Build and run the ETL service + +The ETL service will automatically: + +- Fetch 250 mock pull requests from the mock GitHub API +- Transform the data +- Load it into the BigQuery emulator + +### View logs + +```bash +# All services +docker-compose logs -f + +# Specific service +docker-compose logs -f github-etl +docker-compose logs -f bigquery-emulator +docker-compose logs -f mock-github-api +``` + +### Stop services + +```bash +docker-compose down +``` + +## Architecture + +### Mock GitHub API Service + +- **Port**: 5000 +- **Endpoint**: `http://localhost:5000/repos/{owner}/{repo}/pulls` +- **Mock data**: Generates 250 sample pull requests with realistic data +- **Features**: + - Pagination support (per_page, page parameters) + - Realistic PR data (numbers, titles, states, timestamps, users, etc.) + - Mock rate limit headers + - No authentication required + +### BigQuery Emulator Service + +- **Ports**: + - 9050 (BigQuery API) + - 9060 (Discovery/Admin API) +- **Configuration**: Uses `data.yml` to define the schema +- **Project**: test-project +- **Dataset**: test_dataset +- **Table**: pull_requests + +### ETL Service + +The ETL service is configured via environment variables in `docker-compose.yml`: + +```yaml +environment: + GITHUB_REPOS: "mozilla/firefox" + GITHUB_API_URL: "http://mock-github-api:5000" # Points to mock API + BIGQUERY_PROJECT: "test" + BIGQUERY_DATASET: "github_etl" + BIGQUERY_EMULATOR_HOST: "http://bigquery-emulator:9050" +``` + +## Customization + +### Using Real GitHub API + +To test with the real GitHub API instead of the mock: + +1. Set `GITHUB_TOKEN` environment variable +2. Remove or comment out `GITHUB_API_URL` in docker-compose.yml +3. Update `depends_on` to not require mock-github-api + +```bash +export GITHUB_TOKEN="your_github_token" +docker-compose up github-etl bigquery-emulator +``` + +### Adjusting Mock Data + +Edit `mock_github_api.py` to customize: + +- Total number of PRs (default: 250) +- PR field values +- Pagination behavior + +### Modifying BigQuery Schema + +Edit `data.yml` to change the table schema. The schema matches the fields +extracted in `main.py`'s `transform_data()` function. + +## Querying the BigQuery Emulator + +You can query the BigQuery emulator using the BigQuery Python client: + +```python +from google.cloud import bigquery +from google.api_core.client_options import ClientOptions + +client = bigquery.Client( + project="test-project", + client_options=ClientOptions(api_endpoint="http://localhost:9050") +) + +query = """ +SELECT pr_number, title, state, user_login +FROM `test-project.test_dataset.pull_requests` +LIMIT 10 +""" + +for row in client.query(query): + print(f"PR #{row.pr_number}: {row.title} - {row.state}") +``` + +Or use the `bq` command-line tool with the emulator endpoint. 
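For example, a sketch of a `bq` invocation pointed at the emulator (flag support may vary by `bq` version; `--api` is assumed here to redirect the client to the local endpoint):

```bash
bq --api http://localhost:9050 --project_id=test-project query --use_legacy_sql=false \
  'SELECT COUNT(*) AS pr_count FROM `test-project.test_dataset.pull_requests`'
```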
+ +## Troubleshooting + +### Services not starting + +Check if ports are already in use: + +```bash +lsof -i :5000 # Mock GitHub API +lsof -i :9050 # BigQuery emulator +``` + +### ETL fails to connect + +Ensure services are healthy: + +```bash +docker-compose ps +``` + +Check service logs: + +```bash +docker-compose logs bigquery-emulator +docker-compose logs mock-github-api +``` + +### Schema mismatch errors + +Verify `data.yml` schema matches fields in `main.py:transform_data()`. + +## Development Workflow + +1. Make changes to `main.py` +2. Restart the ETL service: `docker-compose restart github-etl` +3. View logs: `docker-compose logs -f github-etl` + +The `main.py` file is mounted as a volume, so changes are reflected without rebuilding. + +## Cleanup + +Remove all containers and volumes: + +```bash +docker-compose down -v +``` + +Remove built images: + +```bash +docker-compose down --rmi all +``` + +--- + +## Adding New Tests + +### Testing Patterns + +#### 1. Mock External Dependencies + +Always mock external API calls and BigQuery operations: + +```python +@patch("requests.Session") +def test_api_call(mock_session_class): + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"id": 1}] + + mock_session.get.return_value = mock_response + # Test code here +``` + +#### 2. Use Fixtures + +Leverage existing fixtures for common test data: + +```python +def test_with_fixtures(mock_session, mock_pr_response): + # Use mock_session and mock_pr_response + pass +``` + +#### 3. Test Edge Cases + +Always test: +- Empty inputs +- None values +- Missing fields +- Rate limits +- API errors (404, 500, etc.) +- Boundary conditions + +#### 4. Verify Call Arguments + +Check that functions are called with correct parameters: + +```python +mock_extract.assert_called_once_with( + session=mock_session, + repo="mozilla/firefox", + github_api_url="https://api.github.com" +) +``` + +### Example: Adding a New Test + +```python +class TestNewFunction: + """Tests for new_function.""" + + def test_basic_functionality(self, mock_session): + """Test basic happy path.""" + # Arrange + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = {"result": "success"} + mock_session.get.return_value = mock_response + + # Act + result = main.new_function(mock_session, "arg1") + + # Assert + assert result == {"result": "success"} + mock_session.get.assert_called_once() + + def test_error_handling(self, mock_session): + """Test error handling.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Error" + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + main.new_function(mock_session, "arg1") + + assert "500" in str(exc_info.value) +``` + +### Test Organization Guidelines + +1. **Group related tests** in test classes +2. **Use descriptive names** like `test_handles_rate_limit_on_commits` +3. **One assertion concept per test** - Test one thing at a time +4. **Arrange-Act-Assert pattern** - Structure tests clearly +5. 
**Add docstrings** to explain what each test verifies + +### Mocking Patterns + +#### Mocking Time + +```python +@patch("time.time") +@patch("time.sleep") +def test_with_time(mock_sleep, mock_time): + mock_time.return_value = 1000 + # Test code +``` + +#### Mocking Environment Variables + +```python +with patch.dict(os.environ, {"VAR_NAME": "value"}, clear=True): + # Test code +``` + +#### Mocking Generators + +```python +mock_extract.return_value = iter([[{"id": 1}], [{"id": 2}]]) +``` + +### Running Tests During Development + +```bash +# Auto-run tests on file changes (requires pytest-watch) +pip install pytest-watch +ptw -- --cov=main -m "not integration" +``` + +### Debugging Tests + +```bash +# Drop into debugger on failures +pytest --pdb + +# Show print statements +pytest -s + +# Verbose with full diff +pytest -vv +``` + +### Coverage Tips + +If coverage is below 80%: + +1. Run `pytest --cov=main --cov-report=term-missing` to see missing lines +2. Look for untested branches (if/else paths) +3. Check error handling paths +4. Verify edge cases are covered + +## Resources + +- [pytest documentation](https://docs.pytest.org/) +- [pytest-cov documentation](https://pytest-cov.readthedocs.io/) +- [unittest.mock documentation](https://docs.python.org/3/library/unittest.mock.html) + +## Troubleshooting + +### Tests Pass Locally But Fail in CI + +- Check Python version (must be 3.11) +- Verify all dependencies are in `requirements.txt` +- Look for environment-specific issues + +### Coverage Dropped Below 80% + +- Run locally: `pytest --cov=main --cov-report=html` +- Open `htmlcov/index.html` to see uncovered lines +- Add tests for missing coverage + +### Import Errors + +- Ensure `PYTHONPATH` includes project root +- Check that `__init__.py` files exist if needed +- Verify module names match file names diff --git a/pytest.ini b/pytest.ini new file mode 100644 index 0000000..d4a601a --- /dev/null +++ b/pytest.ini @@ -0,0 +1,49 @@ +[pytest] +# Pytest configuration for GitHub ETL project + +# Test discovery patterns +python_files = test_*.py +python_classes = Test* +python_functions = test_* + +# Output options +addopts = + -v + --strict-markers + --tb=short + --cov=main + --cov-report=term-missing + --cov-report=html + --cov-branch + +# Minimum coverage threshold (can adjust as needed) +--cov-fail-under=80 + +# Test paths +testpaths = . + +# Markers for organizing tests +markers = + unit: Unit tests for individual functions + integration: Integration tests that test multiple components + slow: Tests that take longer to run + +# Logging +log_cli = false +log_cli_level = INFO +log_cli_format = %(asctime)s [%(levelname)8s] %(message)s +log_cli_date_format = %Y-%m-%d %H:%M:%S + +# Coverage options +[coverage:run] +source = . +omit = + test_*.py + .venv/* + venv/* + */site-packages/* + +[coverage:report] +precision = 2 +show_missing = true +skip_covered = false diff --git a/test_main.py b/test_main.py new file mode 100644 index 0000000..7165677 --- /dev/null +++ b/test_main.py @@ -0,0 +1,2106 @@ +#!/usr/bin/env python3 +""" +Comprehensive test suite for GitHub ETL main.py + +This test suite provides complete coverage for all functions in main.py, +including extraction, transformation, loading, and orchestration logic. 
+""" + +import logging +import os +import sys +import time +from datetime import datetime, timezone +from unittest.mock import Mock, MagicMock, patch, call +import pytest +import requests +from google.cloud import bigquery + +import main + + +# ============================================================================= +# FIXTURES +# ============================================================================= + + +@pytest.fixture +def mock_session(): + """Provide a mocked requests.Session for testing.""" + session = Mock(spec=requests.Session) + session.headers = {} + return session + + +@pytest.fixture +def mock_bigquery_client(): + """Provide a mocked BigQuery client for testing.""" + client = Mock(spec=bigquery.Client) + client.project = "test-project" + client.insert_rows_json = Mock(return_value=[]) + return client + + +@pytest.fixture +def mock_pr_response(): + """Provide a realistic pull request response for testing.""" + return { + "number": 123, + "title": "Bug 1234567 - Fix login issue", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T10:00:00Z", + "user": {"login": "testuser"}, + "head": {"ref": "fix-branch"}, + "base": {"ref": "main"}, + "labels": [{"name": "bug"}, {"name": "priority-high"}], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + + +@pytest.fixture +def mock_commit_response(): + """Provide a realistic commit response with files.""" + return { + "sha": "abc123def456", + "commit": { + "author": { + "name": "Test Author", + "email": "test@example.com", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/login.py", + "additions": 10, + "deletions": 5, + "changes": 15, + }, + { + "filename": "tests/test_login.py", + "additions": 20, + "deletions": 2, + "changes": 22, + }, + ], + } + + +@pytest.fixture +def mock_reviewer_response(): + """Provide a realistic reviewer response.""" + return { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + "body": "LGTM", + } + + +@pytest.fixture +def mock_comment_response(): + """Provide a realistic comment response.""" + return { + "id": 456, + "user": {"login": "commenter1"}, + "created_at": "2024-01-01T14:00:00Z", + "body": "This looks good to me", + "pull_request_review_id": None, + } + + +# ============================================================================= +# TEST CLASSES +# ============================================================================= + + +class TestSetupLogging: + """Tests for setup_logging function.""" + + def test_setup_logging_configures_logger(self): + """Test that setup_logging configures the root logger correctly.""" + main.setup_logging() + + root_logger = logging.getLogger() + assert root_logger.level == logging.INFO + assert len(root_logger.handlers) > 0 + + # Check that at least one handler is a StreamHandler + has_stream_handler = any( + isinstance(handler, logging.StreamHandler) + for handler in root_logger.handlers + ) + assert has_stream_handler + + +class TestSleepForRateLimit: + """Tests for sleep_for_rate_limit function.""" + + @patch("time.time") + @patch("time.sleep") + def test_sleep_for_rate_limit_when_remaining_is_zero( + self, mock_sleep, mock_time + ): + """Test that sleep_for_rate_limit sleeps until reset time.""" + mock_time.return_value = 1000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1120", # 120 seconds from now + } 
+ + main.sleep_for_rate_limit(mock_response) + + mock_sleep.assert_called_once_with(120) + + @patch("time.time") + @patch("time.sleep") + def test_sleep_for_rate_limit_when_reset_already_passed( + self, mock_sleep, mock_time + ): + """Test that sleep_for_rate_limit doesn't sleep negative time.""" + mock_time.return_value = 2000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1500", # Already passed + } + + main.sleep_for_rate_limit(mock_response) + + # Should sleep for 0 seconds (max of 0 and negative value) + mock_sleep.assert_called_once_with(0) + + @patch("time.sleep") + def test_sleep_for_rate_limit_when_remaining_not_zero(self, mock_sleep): + """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "5", + "X-RateLimit-Reset": "1500", + } + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when remaining > 0 + mock_sleep.assert_not_called() + + @patch("time.sleep") + def test_sleep_for_rate_limit_with_missing_headers(self, mock_sleep): + """Test sleep_for_rate_limit with missing rate limit headers.""" + mock_response = Mock() + mock_response.headers = {} + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when headers are missing (defaults to remaining=1) + mock_sleep.assert_not_called() + + +class TestExtractPullRequests: + """Tests for extract_pull_requests function.""" + + def test_extract_single_page(self, mock_session): + """Test extracting data from a single page of results.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + # Mock the extract functions + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert len(result) == 1 + assert len(result[0]) == 2 + assert result[0][0]["number"] == 1 + assert result[0][1]["number"] == 2 + + def test_extract_multiple_pages(self, mock_session): + """Test extracting data across multiple pages with pagination.""" + # First page response + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" + } + } + + # Second page response + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert len(result) == 2 + assert len(result[0]) == 2 + assert len(result[1]) == 1 + assert result[0][0]["number"] == 1 + assert result[1][0]["number"] == 3 + + def test_enriches_prs_with_commit_data(self, mock_session): + """Test that PRs are enriched with commit data.""" + mock_response = Mock() + mock_response.status_code = 200 + 
mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_commits = [{"sha": "abc123"}] + + with patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, patch( + "main.extract_reviewers", return_value=[] + ), patch( + "main.extract_comments", return_value=[] + ): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert result[0][0]["commit_data"] == mock_commits + mock_extract_commits.assert_called_once() + + def test_enriches_prs_with_reviewer_data(self, mock_session): + """Test that PRs are enriched with reviewer data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_reviewers = [{"id": 789, "state": "APPROVED"}] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, patch( + "main.extract_comments", return_value=[] + ): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert result[0][0]["reviewer_data"] == mock_reviewers + mock_extract_reviewers.assert_called_once() + + def test_enriches_prs_with_comment_data(self, mock_session): + """Test that PRs are enriched with comment data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_comments = [{"id": 456, "body": "Great work!"}] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments: + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + assert result[0][0]["comment_data"] == mock_comments + mock_extract_comments.assert_called_once() + + @patch("main.sleep_for_rate_limit") + def test_handles_rate_limit(self, mock_sleep, mock_session): + """Test that extract_pull_requests handles rate limiting correctly.""" + # Rate limit response + mock_response_rate_limit = Mock() + mock_response_rate_limit.status_code = 403 + mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} + + # Successful response after rate limit + mock_response_success = Mock() + mock_response_success.status_code = 200 + mock_response_success.json.return_value = [ + {"number": 1, "title": "PR 1"} + ] + mock_response_success.links = {} + + mock_session.get.side_effect = [ + mock_response_rate_limit, + mock_response_success, + ] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + mock_sleep.assert_called_once_with(mock_response_rate_limit) + assert len(result) == 1 + + def test_handles_api_error_404(self, mock_session): + """Test that extract_pull_requests raises SystemExit on 404.""" + mock_response = Mock() + mock_response.status_code = 404 + mock_response.text = "Not Found" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) + + assert "GitHub API error 404" 
in str(exc_info.value) + + def test_handles_api_error_500(self, mock_session): + """Test that extract_pull_requests raises SystemExit on 500.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Server Error" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert "GitHub API error 500" in str(exc_info.value) + + def test_stops_on_empty_batch(self, mock_session): + """Test that extraction stops when an empty batch is returned.""" + # First page with data + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" + } + } + + # Second page empty + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + # Should only have 1 chunk from first page + assert len(result) == 1 + assert len(result[0]) == 1 + + def test_invalid_page_number_handling(self, mock_session): + """Test handling of invalid page number in pagination.""" + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" + } + } + + mock_session.get.return_value = mock_response_1 + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, "mozilla/firefox") + ) + + # Should stop pagination on invalid page number + assert len(result) == 1 + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL.""" + custom_url = "https://mock-github.example.com" + + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with patch("main.extract_commits", return_value=[]), patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + list( + main.extract_pull_requests( + mock_session, "mozilla/firefox", github_api_url=custom_url + ) + ) + + # Verify custom URL was used + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + def test_skips_prs_without_number_field(self, mock_session): + """Test that PRs without 'number' field are skipped.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"title": "PR without number"}, # Missing number field + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with patch("main.extract_commits", return_value=[]) as mock_commits, patch( + "main.extract_reviewers", return_value=[] + ), patch("main.extract_comments", return_value=[]): + result = list( + main.extract_pull_requests(mock_session, 
"mozilla/firefox") + ) + + # extract_commits should only be called for PRs with number field + assert mock_commits.call_count == 2 + + +class TestExtractCommits: + """Tests for extract_commits function.""" + + def test_fetch_commits_with_files(self, mock_session): + """Test fetching commits with files for a PR.""" + # Mock commits list response + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {"sha": "def456"}, + ] + + # Mock individual commit responses + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = { + "sha": "abc123", + "files": [{"filename": "file1.py", "additions": 10}], + } + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = { + "sha": "def456", + "files": [{"filename": "file2.py", "deletions": 5}], + } + + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["sha"] == "abc123" + assert result[0]["files"][0]["filename"] == "file1.py" + assert result[1]["sha"] == "def456" + assert result[1]["files"][0]["filename"] == "file2.py" + + def test_multiple_files_per_commit(self, mock_session): + """Test handling multiple files in a single commit.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_detail = Mock() + commit_detail.status_code = 200 + commit_detail.json.return_value = { + "sha": "abc123", + "files": [ + {"filename": "file1.py", "additions": 10}, + {"filename": "file2.py", "additions": 20}, + {"filename": "file3.py", "deletions": 5}, + ], + } + + mock_session.get.side_effect = [commits_response, commit_detail] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 1 + assert len(result[0]["files"]) == 3 + + @patch("main.sleep_for_rate_limit") + def test_rate_limit_on_commits_list(self, mock_sleep, mock_session): + """Test rate limit handling when fetching commits list.""" + # Rate limit response + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + # Success response + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + def test_api_error_on_commits_list(self, mock_session): + """Test API error handling when fetching commits list.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + def test_api_error_on_individual_commit(self, mock_session): + """Test API error when fetching individual commit details.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_error = Mock() + commit_error.status_code = 404 + commit_error.text = "Commit not found" + + mock_session.get.side_effect = [commits_response, commit_error] + + with 
pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + def test_commit_without_sha_field(self, mock_session): + """Test handling commits without sha field.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {}, # Missing sha field + ] + + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = {"files": []} + + mock_session.get.side_effect = [commits_response, commit_detail_1, commit_detail_2] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + # Should handle the commit without sha gracefully + assert len(result) == 2 + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL for commits.""" + custom_url = "https://mock-github.example.com" + + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + main.extract_commits( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + def test_empty_commits_list(self, mock_session): + """Test handling PR with no commits.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert result == [] + + +class TestExtractReviewers: + """Tests for extract_reviewers function.""" + + def test_fetch_reviewers(self, mock_session): + """Test fetching reviewers for a PR.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 790, + "user": {"login": "reviewer2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["state"] == "APPROVED" + assert result[1]["state"] == "CHANGES_REQUESTED" + + def test_multiple_review_states(self, mock_session): + """Test handling multiple different review states.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, + {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, + {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, + {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 4 + states = [r["state"] for r in result] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + def test_empty_reviewers_list(self, mock_session): + """Test handling PR with no reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + 
mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert result == [] + + @patch("main.sleep_for_rate_limit") + def test_rate_limit_handling(self, mock_sleep, mock_session): + """Test rate limit handling when fetching reviewers.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + def test_api_error(self, mock_session): + """Test API error handling when fetching reviewers.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL for reviewers.""" + custom_url = "https://mock-github.example.com" + + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + mock_session.get.return_value = reviewers_response + + main.extract_reviewers( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +class TestExtractComments: + """Tests for extract_comments function.""" + + def test_fetch_comments(self, mock_session): + """Test fetching comments for a PR.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks good", + "created_at": "2024-01-01T14:00:00Z", + }, + { + "id": 457, + "user": {"login": "commenter2"}, + "body": "I have concerns", + "created_at": "2024-01-01T15:00:00Z", + }, + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["id"] == 456 + assert result[1]["id"] == 457 + + def test_uses_issues_endpoint(self, mock_session): + """Test that comments use /issues endpoint not /pulls.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments(mock_session, "mozilla/firefox", 123) + + call_args = mock_session.get.call_args + url = call_args[0][0] + assert "/issues/123/comments" in url + assert "/pulls/123/comments" not in url + + def test_multiple_comments(self, mock_session): + """Test handling multiple comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} + for i in range(1, 11) + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 10 + + def test_empty_comments_list(self, mock_session): + """Test handling PR with no comments.""" + comments_response = Mock() + comments_response.status_code 
= 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert result == [] + + @patch("main.sleep_for_rate_limit") + def test_rate_limit_handling(self, mock_sleep, mock_session): + """Test rate limit handling when fetching comments.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + def test_api_error(self, mock_session): + """Test API error handling when fetching comments.""" + error_response = Mock() + error_response.status_code = 404 + error_response.text = "Not Found" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + def test_custom_github_api_url(self, mock_session): + """Test using custom GitHub API URL for comments.""" + custom_url = "https://mock-github.example.com" + + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +class TestTransformData: + """Tests for transform_data function.""" + + def test_basic_pr_transformation(self): + """Test basic pull request field mapping.""" + raw_data = [ + { + "number": 123, + "title": "Fix login bug", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T12:00:00Z", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + pr = result["pull_requests"][0] + assert pr["pull_request_id"] == 123 + assert pr["current_status"] == "closed" + assert pr["date_created"] == "2024-01-01T10:00:00Z" + assert pr["date_modified"] == "2024-01-02T10:00:00Z" + assert pr["date_landed"] == "2024-01-02T12:00:00Z" + assert pr["target_repository"] == "mozilla/firefox" + + def test_bug_id_extraction_basic(self): + """Test bug ID extraction from PR title.""" + test_cases = [ + ("Bug 1234567 - Fix issue", 1234567), + ("bug 1234567: Update code", 1234567), + ("Fix for bug 7654321", 7654321), + ("b=9876543 - Change behavior", 9876543), + ] + + for title, expected_bug_id in test_cases: + raw_data = [ + { + "number": 1, + "title": title, + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == expected_bug_id + + def test_bug_id_extraction_with_hash(self): + """Test bug ID extraction with # symbol.""" + raw_data = [ + { + "number": 1, + "title": "Bug #1234567 - Fix issue", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, 
"mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == 1234567 + + def test_bug_id_filter_large_numbers(self): + """Test that bug IDs >= 100000000 are filtered out.""" + raw_data = [ + { + "number": 1, + "title": "Bug 999999999 - Invalid bug ID", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + def test_bug_id_no_match(self): + """Test PR title with no bug ID.""" + raw_data = [ + { + "number": 1, + "title": "Update documentation", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + def test_labels_extraction(self): + """Test labels array extraction.""" + raw_data = [ + { + "number": 1, + "title": "PR with labels", + "state": "open", + "labels": [ + {"name": "bug"}, + {"name": "priority-high"}, + {"name": "needs-review"}, + ], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + labels = result["pull_requests"][0]["labels"] + assert len(labels) == 3 + assert "bug" in labels + assert "priority-high" in labels + assert "needs-review" in labels + + def test_labels_empty_list(self): + """Test handling empty labels list.""" + raw_data = [ + { + "number": 1, + "title": "PR without labels", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["labels"] == [] + + def test_commit_transformation(self): + """Test commit fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": { + "author": { + "name": "Test Author", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/main.py", + "additions": 10, + "deletions": 5, + } + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["commits"]) == 1 + commit = result["commits"][0] + assert commit["pull_request_id"] == 123 + assert commit["target_repository"] == "mozilla/firefox" + assert commit["commit_sha"] == "abc123" + assert commit["date_created"] == "2024-01-01T12:00:00Z" + assert commit["author_username"] == "Test Author" + assert commit["filename"] == "src/main.py" + assert commit["lines_added"] == 10 + assert commit["lines_removed"] == 5 + + def test_commit_file_flattening(self): + """Test that each file becomes a separate row.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple files", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 5}, + {"filename": "file2.py", "additions": 20, "deletions": 2}, + {"filename": "file3.py", "additions": 5, "deletions": 15}, + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows in commits table (one per file) + assert len(result["commits"]) == 3 + filenames = [c["filename"] for c in result["commits"]] + assert 
"file1.py" in filenames + assert "file2.py" in filenames + assert "file3.py" in filenames + + def test_multiple_commits_with_files(self): + """Test multiple commits with multiple files per PR.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "commit1", + "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 0} + ], + }, + { + "sha": "commit2", + "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, + "files": [ + {"filename": "file2.py", "additions": 5, "deletions": 2}, + {"filename": "file3.py", "additions": 8, "deletions": 3}, + ], + }, + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows total (1 file from commit1, 2 files from commit2) + assert len(result["commits"]) == 3 + assert result["commits"][0]["commit_sha"] == "commit1" + assert result["commits"][1]["commit_sha"] == "commit2" + assert result["commits"][2]["commit_sha"] == "commit2" + + def test_reviewer_transformation(self): + """Test reviewer fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with reviewers", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + } + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 1 + reviewer = result["reviewers"][0] + assert reviewer["pull_request_id"] == 123 + assert reviewer["target_repository"] == "mozilla/firefox" + assert reviewer["reviewer_username"] == "reviewer1" + assert reviewer["status"] == "APPROVED" + assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + + def test_multiple_review_states(self): + """Test handling multiple review states.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple reviews", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "COMMENTED", + "submitted_at": "2024-01-01T17:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 3 + states = [r["status"] for r in result["reviewers"]] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + def test_date_approved_from_earliest_approval(self): + """Test that date_approved is set to earliest APPROVED review.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple approvals", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-02T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T14:00:00Z", # Earliest + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "APPROVED", + "submitted_at": "2024-01-03T16:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + pr = 
result["pull_requests"][0] + assert pr["date_approved"] == "2024-01-01T14:00:00Z" + + def test_comment_transformation(self): + """Test comment fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with comments", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks great!", + "created_at": "2024-01-01T14:00:00Z", + "pull_request_review_id": None, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["comments"]) == 1 + comment = result["comments"][0] + assert comment["pull_request_id"] == 123 + assert comment["target_repository"] == "mozilla/firefox" + assert comment["comment_id"] == 456 + assert comment["author_username"] == "commenter1" + assert comment["date_created"] == "2024-01-01T14:00:00Z" + assert comment["character_count"] == 17 + + def test_comment_character_count(self): + """Test character count calculation for comments.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": "Short", + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "This is a much longer comment with more text", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 5 + assert result["comments"][1]["character_count"] == 44 + + def test_comment_status_from_review(self): + """Test that comment status is mapped from review_id_statuses.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter"}, + "body": "LGTM", + "created_at": "2024-01-01", + "pull_request_review_id": 789, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Comment should have status from the review + assert result["comments"][0]["status"] == "APPROVED" + + def test_comment_empty_body(self): + """Test handling comments with empty or None body.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": None, + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 0 + assert result["comments"][1]["character_count"] == 0 + + def test_empty_raw_data(self): + """Test handling empty input list.""" + result = main.transform_data([], "mozilla/firefox") + + assert result["pull_requests"] == [] + assert result["commits"] == [] + assert result["reviewers"] == [] + assert result["comments"] == [] + + def test_pr_without_commits_reviewers_comments(self): + """Test PR with no commits, reviewers, or comments.""" + raw_data = [ + { + "number": 123, + "title": "Minimal PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert 
len(result["pull_requests"]) == 1 + assert len(result["commits"]) == 0 + assert len(result["reviewers"]) == 0 + assert len(result["comments"]) == 0 + + def test_return_structure(self): + """Test that transform_data returns dict with 4 keys.""" + raw_data = [ + { + "number": 1, + "title": "Test", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert isinstance(result, dict) + assert "pull_requests" in result + assert "commits" in result + assert "reviewers" in result + assert "comments" in result + + def test_all_tables_have_target_repository(self): + """Test that all tables include target_repository field.""" + raw_data = [ + { + "number": 123, + "title": "Test PR", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], + } + ], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 2, + "user": {"login": "commenter"}, + "body": "Test", + "created_at": "2024-01-01", + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" + assert result["commits"][0]["target_repository"] == "mozilla/firefox" + assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" + assert result["comments"][0]["target_repository"] == "mozilla/firefox" + + +class TestLoadData: + """Tests for load_data function.""" + + @patch("main.datetime") + def test_load_all_tables(self, mock_datetime, mock_bigquery_client): + """Test loading all 4 tables to BigQuery.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [{"commit_sha": "abc"}], + "reviewers": [{"reviewer_username": "user1"}], + "comments": [{"comment_id": 123}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should call insert_rows_json 4 times (once per table) + assert mock_bigquery_client.insert_rows_json.call_count == 4 + + @patch("main.datetime") + def test_adds_snapshot_date(self, mock_datetime, mock_bigquery_client): + """Test that snapshot_date is added to all rows.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + rows = call_args[0][1] + assert all(row["snapshot_date"] == "2024-01-15" for row in rows) + + def test_constructs_correct_table_ref(self, mock_bigquery_client): + """Test that table_ref is constructed correctly.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "my_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref = call_args[0][0] + assert table_ref == "test-project.my_dataset.pull_requests" + + def test_empty_transformed_data_skipped(self, mock_bigquery_client): + """Test that empty transformed_data dict is skipped.""" + 
transformed_data = {} + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + mock_bigquery_client.insert_rows_json.assert_not_called() + + def test_skips_empty_tables_individually(self, mock_bigquery_client): + """Test that empty tables are skipped individually.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], # Empty, should be skipped + "reviewers": [], # Empty, should be skipped + "comments": [{"comment_id": 456}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should only call insert_rows_json twice (for PRs and comments) + assert mock_bigquery_client.insert_rows_json.call_count == 2 + + def test_only_pull_requests_table(self, mock_bigquery_client): + """Test loading only pull_requests table.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert mock_bigquery_client.insert_rows_json.call_count == 1 + + def test_raises_exception_on_insert_errors(self, mock_bigquery_client): + """Test that Exception is raised on BigQuery insert errors.""" + mock_bigquery_client.insert_rows_json.return_value = [ + {"index": 0, "errors": ["Insert failed"]} + ] + + transformed_data = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with pytest.raises(Exception) as exc_info: + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert "BigQuery insert errors" in str(exc_info.value) + + def test_verifies_client_insert_called_correctly(self, mock_bigquery_client): + """Test that client.insert_rows_json is called with correct arguments.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref, rows = call_args[0] + + assert "pull_requests" in table_ref + assert len(rows) == 2 + + +class TestMain: + """Tests for main function.""" + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_requires_github_repos( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_REPOS is required.""" + with patch.dict( + os.environ, + {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "GITHUB_REPOS" in str(exc_info.value) + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_requires_bigquery_project( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that BIGQUERY_PROJECT is required.""" + with patch.dict( + os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, clear=True + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "BIGQUERY_PROJECT" in str(exc_info.value) + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_requires_bigquery_dataset( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that BIGQUERY_DATASET is required.""" + with patch.dict( + os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, clear=True + ): + with pytest.raises(SystemExit) as 
exc_info: + main.main() + + assert "BIGQUERY_DATASET" in str(exc_info.value) + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_github_token_optional_with_warning( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_TOKEN is optional but warns if missing.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + # Should not raise, but should log warning + result = main.main() + assert result == 0 + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_splits_github_repos_by_comma( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_REPOS is split by comma.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + main.main() + + # Should be called twice (once per repo) + assert mock_extract.call_count == 2 + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_honors_github_api_url( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that GITHUB_API_URL is honored.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + main.main() + + call_kwargs = mock_extract.call_args[1] + assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_honors_bigquery_emulator_host( + self, mock_session_class, mock_bq_client_class, mock_setup_logging + ): + """Test that BIGQUERY_EMULATOR_HOST is honored.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + main.main() + + # Verify BigQuery client was created with emulator settings + mock_bq_client_class.assert_called_once() + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_creates_session_with_headers( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that session is created with Accept and User-Agent headers.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + main.main() + + # Verify session headers were set + assert mock_session.headers.update.called + call_args = mock_session.headers.update.call_args[0][0] + assert "Accept" in call_args + assert "User-Agent" in call_args + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + 
@patch("requests.Session") + def test_sets_authorization_header_with_token( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that Authorization header is set when token provided.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + main.main() + + # Verify Authorization header was set + assert mock_session.headers.__setitem__.called + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + @patch("main.extract_pull_requests") + @patch("main.transform_data") + @patch("main.load_data") + def test_single_repo_successful_etl( + self, + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, + ): + """Test successful ETL for single repository.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_extract.assert_called_once() + mock_transform.assert_called_once() + mock_load.assert_called_once() + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + @patch("main.extract_pull_requests") + @patch("main.transform_data") + @patch("main.load_data") + def test_multiple_repos_processing( + self, + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, + ): + """Test processing multiple repositories.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Should process 3 repositories + assert mock_extract.call_count == 3 + + @patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + @patch("main.extract_pull_requests") + @patch("main.transform_data") + @patch("main.load_data") + def test_processes_chunks_iteratively( + self, + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, + ): + """Test that chunks are processed iteratively from generator.""" + # Return 3 chunks + mock_extract.return_value = iter([ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], + ]) + mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Transform and load should be called 3 times (once per chunk) + assert mock_transform.call_count == 3 + assert mock_load.call_count == 3 + + 
@patch("main.setup_logging") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_returns_zero_on_success( + self, mock_session_class, mock_bq_client, mock_setup_logging + ): + """Test that main returns 0 on success.""" + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), patch("main.extract_pull_requests", return_value=iter([])): + result = main.main() + + assert result == 0 + + +@pytest.mark.integration +class TestIntegration: + """Integration tests that test multiple components together.""" + + @patch("main.setup_logging") + @patch("main.load_data") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_end_to_end_with_mocked_github( + self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ): + """Test end-to-end flow with mocked GitHub responses.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # Mock PR response + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} + ] + pr_response.links = {} + + # Mock commits, reviewers, comments responses + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_load.assert_called_once() + + # Verify transformed data structure + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + assert "pull_requests" in transformed_data + assert len(transformed_data["pull_requests"]) == 1 + + @patch("main.setup_logging") + @patch("main.load_data") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_bug_id_extraction_through_pipeline( + self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ): + """Test bug ID extraction through full pipeline.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 9876543 - Fix critical issue", "state": "closed"} + ] + pr_response.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + pr = transformed_data["pull_requests"][0] + assert pr["bug_id"] == 9876543 + + @patch("main.setup_logging") + @patch("main.load_data") + @patch("main.bigquery.Client") + @patch("requests.Session") + def test_pagination_through_full_flow( + self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ): + """Test pagination through full ETL flow.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # First page + pr_response_1 = Mock() + 
pr_response_1.status_code = 200 + pr_response_1.json.return_value = [ + {"number": 1, "title": "PR 1", "state": "open"} + ] + pr_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page + pr_response_2 = Mock() + pr_response_2.status_code = 200 + pr_response_2.json.return_value = [ + {"number": 2, "title": "PR 2", "state": "open"} + ] + pr_response_2.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response_1, + empty_response, + empty_response, + empty_response, + pr_response_2, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + # Should be called twice (once per chunk/page) + assert mock_load.call_count == 2 From d6cb74c01067c5696fb3db9307bdeb71416bea0a Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 18:47:57 -0500 Subject: [PATCH 02/11] Copilot suggested fixes --- TESTING.md | 3 +- pytest.ini | 4 +- test_main.py | 374 +++++++++++++++++++++++++++++---------------------- 3 files changed, 213 insertions(+), 168 deletions(-) diff --git a/TESTING.md b/TESTING.md index c0bb5dd..104d401 100644 --- a/TESTING.md +++ b/TESTING.md @@ -228,7 +228,7 @@ mypy main.py --no-strict-optional --ignore-missing-imports ### GitHub Actions Workflow -The `.github/workflows/tests.yml` workflow runs on every push and pull request: +The `.github/workflows/tests.yml` workflow runs on every pull request: **Lint Job:** 1. Runs black (format check) @@ -243,7 +243,6 @@ The `.github/workflows/tests.yml` workflow runs on every push and pull request: ### Workflow Triggers -- Push to `main` or `unit-tests` branch - Pull requests to `main` branch ### Viewing Results diff --git a/pytest.ini b/pytest.ini index d4a601a..33ef84b 100644 --- a/pytest.ini +++ b/pytest.ini @@ -15,9 +15,7 @@ addopts = --cov-report=term-missing --cov-report=html --cov-branch - -# Minimum coverage threshold (can adjust as needed) ---cov-fail-under=80 + --cov-fail-under=80 # Test paths testpaths = . 
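
Note on the pytest.ini hunk above: in the earlier layout the bare `--cov-fail-under=80` line sat outside `addopts`, so pytest parsed it as an unknown ini option and the 80% threshold was never actually applied; folding it into `addopts`, as this change does, turns it into a real pytest-cov command-line flag. A minimal sketch of the resulting section, abbreviated for illustration (the full `addopts` list also carries `-v`, `--strict-markers`, `--cov=main`, and the report options shown in the diff):

```ini
[pytest]
# The coverage threshold must live inside addopts so it reaches pytest-cov.
addopts =
    --cov=main
    --cov-branch
    --cov-fail-under=80
```
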
diff --git a/test_main.py b/test_main.py index 7165677..400c6d3 100644 --- a/test_main.py +++ b/test_main.py @@ -8,10 +8,9 @@ import logging import os -import sys import time -from datetime import datetime, timezone -from unittest.mock import Mock, MagicMock, patch, call +from datetime import datetime +from unittest.mock import Mock, MagicMock, patch import pytest import requests from google.cloud import bigquery @@ -143,9 +142,7 @@ class TestSleepForRateLimit: @patch("time.time") @patch("time.sleep") - def test_sleep_for_rate_limit_when_remaining_is_zero( - self, mock_sleep, mock_time - ): + def test_sleep_for_rate_limit_when_remaining_is_zero(self, mock_sleep, mock_time): """Test that sleep_for_rate_limit sleeps until reset time.""" mock_time.return_value = 1000 @@ -220,12 +217,12 @@ def test_extract_single_page(self, mock_session): mock_session.get.return_value = mock_response # Mock the extract functions - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert len(result) == 1 assert len(result[0]) == 2 @@ -242,9 +239,7 @@ def test_extract_multiple_pages(self, mock_session): {"number": 2, "title": "PR 2"}, ] mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" - } + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} } # Second page response @@ -255,12 +250,12 @@ def test_extract_multiple_pages(self, mock_session): mock_session.get.side_effect = [mock_response_1, mock_response_2] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert len(result) == 2 assert len(result[0]) == 2 @@ -279,16 +274,14 @@ def test_enriches_prs_with_commit_data(self, mock_session): mock_commits = [{"sha": "abc123"}] - with patch( - "main.extract_commits", return_value=mock_commits - ) as mock_extract_commits, patch( - "main.extract_reviewers", return_value=[] - ), patch( - "main.extract_comments", return_value=[] + with ( + patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), ): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert result[0][0]["commit_data"] == mock_commits mock_extract_commits.assert_called_once() @@ -304,14 +297,14 @@ def test_enriches_prs_with_reviewer_data(self, mock_session): mock_reviewers = [{"id": 789, "state": "APPROVED"}] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=mock_reviewers - ) as mock_extract_reviewers, patch( - 
"main.extract_comments", return_value=[] + with ( + patch("main.extract_commits", return_value=[]), + patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, + patch("main.extract_comments", return_value=[]), ): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert result[0][0]["reviewer_data"] == mock_reviewers mock_extract_reviewers.assert_called_once() @@ -327,14 +320,14 @@ def test_enriches_prs_with_comment_data(self, mock_session): mock_comments = [{"id": 456, "body": "Great work!"}] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch( - "main.extract_comments", return_value=mock_comments - ) as mock_extract_comments: - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments, + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) assert result[0][0]["comment_data"] == mock_comments mock_extract_comments.assert_called_once() @@ -350,9 +343,7 @@ def test_handles_rate_limit(self, mock_sleep, mock_session): # Successful response after rate limit mock_response_success = Mock() mock_response_success.status_code = 200 - mock_response_success.json.return_value = [ - {"number": 1, "title": "PR 1"} - ] + mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] mock_response_success.links = {} mock_session.get.side_effect = [ @@ -360,12 +351,12 @@ def test_handles_rate_limit(self, mock_sleep, mock_session): mock_response_success, ] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) mock_sleep.assert_called_once_with(mock_response_rate_limit) assert len(result) == 1 @@ -403,9 +394,7 @@ def test_stops_on_empty_batch(self, mock_session): mock_response_1.status_code = 200 mock_response_1.json.return_value = [{"number": 1}] mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2" - } + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} } # Second page empty @@ -416,12 +405,12 @@ def test_stops_on_empty_batch(self, mock_session): mock_session.get.side_effect = [mock_response_1, mock_response_2] - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) # Should only have 1 chunk from first page assert len(result) == 1 @@ -440,12 +429,12 @@ def test_invalid_page_number_handling(self, mock_session): 
mock_session.get.return_value = mock_response_1 - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) # Should stop pagination on invalid page number assert len(result) == 1 @@ -461,9 +450,11 @@ def test_custom_github_api_url(self, mock_session): mock_session.get.return_value = mock_response - with patch("main.extract_commits", return_value=[]), patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): list( main.extract_pull_requests( mock_session, "mozilla/firefox", github_api_url=custom_url @@ -487,12 +478,12 @@ def test_skips_prs_without_number_field(self, mock_session): mock_session.get.return_value = mock_response - with patch("main.extract_commits", return_value=[]) as mock_commits, patch( - "main.extract_reviewers", return_value=[] - ), patch("main.extract_comments", return_value=[]): - result = list( - main.extract_pull_requests(mock_session, "mozilla/firefox") - ) + with ( + patch("main.extract_commits", return_value=[]) as mock_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) # extract_commits should only be called for PRs with number field assert mock_commits.call_count == 2 @@ -631,7 +622,11 @@ def test_commit_without_sha_field(self, mock_session): commit_detail_2.status_code = 200 commit_detail_2.json.return_value = {"files": []} - mock_session.get.side_effect = [commits_response, commit_detail_1, commit_detail_2] + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] result = main.extract_commits(mock_session, "mozilla/firefox", 123) @@ -1470,7 +1465,9 @@ def test_all_tables_have_target_repository(self): { "sha": "abc", "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], + "files": [ + {"filename": "test.py", "additions": 1, "deletions": 0} + ], } ], "reviewer_data": [ @@ -1594,7 +1591,12 @@ def test_raises_exception_on_insert_errors(self, mock_bigquery_client): {"index": 0, "errors": ["Insert failed"]} ] - transformed_data = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with pytest.raises(Exception) as exc_info: main.load_data(mock_bigquery_client, "test_dataset", transformed_data) @@ -1647,7 +1649,9 @@ def test_requires_bigquery_project( ): """Test that BIGQUERY_PROJECT is required.""" with patch.dict( - os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, clear=True + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, + clear=True, ): with pytest.raises(SystemExit) as exc_info: main.main() @@ -1662,7 +1666,9 @@ def test_requires_bigquery_dataset( ): """Test that 
BIGQUERY_DATASET is required.""" with patch.dict( - os.environ, {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, clear=True + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, + clear=True, ): with pytest.raises(SystemExit) as exc_info: main.main() @@ -1676,15 +1682,18 @@ def test_github_token_optional_with_warning( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that GITHUB_TOKEN is optional but warns if missing.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): # Should not raise, but should log warning result = main.main() assert result == 0 @@ -1696,16 +1705,19 @@ def test_splits_github_repos_by_comma( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that GITHUB_REPOS is split by comma.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): main.main() # Should be called twice (once per repo) @@ -1718,17 +1730,20 @@ def test_honors_github_api_url( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that GITHUB_API_URL is honored.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "GITHUB_API_URL": "https://custom-api.example.com", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])) as mock_extract: + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): main.main() call_kwargs = mock_extract.call_args[1] @@ -1741,17 +1756,20 @@ def test_honors_bigquery_emulator_host( self, mock_session_class, mock_bq_client_class, mock_setup_logging ): """Test that BIGQUERY_EMULATOR_HOST is honored.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): main.main() # Verify BigQuery client was created with emulator settings @@ -1767,16 
+1785,19 @@ def test_creates_session_with_headers( mock_session = MagicMock() mock_session_class.return_value = mock_session - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): main.main() # Verify session headers were set @@ -1795,16 +1816,19 @@ def test_sets_authorization_header_with_token( mock_session = MagicMock() mock_session_class.return_value = mock_session - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "test-token-123", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): main.main() # Verify Authorization header was set @@ -1827,7 +1851,12 @@ def test_single_repo_successful_etl( ): """Test successful ETL for single repository.""" mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with patch.dict( os.environ, @@ -1863,7 +1892,12 @@ def test_multiple_repos_processing( ): """Test processing multiple repositories.""" mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with patch.dict( os.environ, @@ -1898,12 +1932,19 @@ def test_processes_chunks_iteratively( ): """Test that chunks are processed iteratively from generator.""" # Return 3 chunks - mock_extract.return_value = iter([ - [{"number": 1}], - [{"number": 2}], - [{"number": 3}], - ]) - mock_transform.return_value = {"pull_requests": [{"pull_request_id": 1}], "commits": [], "reviewers": [], "comments": []} + mock_extract.return_value = iter( + [ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], + ] + ) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } with patch.dict( os.environ, @@ -1929,16 +1970,19 @@ def test_returns_zero_on_success( self, mock_session_class, mock_bq_client, mock_setup_logging ): """Test that main returns 0 on success.""" - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), patch("main.extract_pull_requests", return_value=iter([])): + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + 
patch("main.extract_pull_requests", return_value=iter([])), + ): result = main.main() assert result == 0 @@ -2014,7 +2058,11 @@ def test_bug_id_extraction_through_pipeline( pr_response = Mock() pr_response.status_code = 200 pr_response.json.return_value = [ - {"number": 1, "title": "Bug 9876543 - Fix critical issue", "state": "closed"} + { + "number": 1, + "title": "Bug 9876543 - Fix critical issue", + "state": "closed", + } ] pr_response.links = {} From 5836a842064a92c0e725cfc4f6c7e7e6a54e6245 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 18:53:32 -0500 Subject: [PATCH 03/11] Black formatted --- test_main.py | 1 - 1 file changed, 1 deletion(-) diff --git a/test_main.py b/test_main.py index 400c6d3..210029c 100644 --- a/test_main.py +++ b/test_main.py @@ -17,7 +17,6 @@ import main - # ============================================================================= # FIXTURES # ============================================================================= From 76f54f3bc2137788d41c6ea90d8bb0cb98051e71 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 18:55:44 -0500 Subject: [PATCH 04/11] Used isort to fix sorting order --- test_main.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/test_main.py b/test_main.py index 210029c..0850eae 100644 --- a/test_main.py +++ b/test_main.py @@ -10,7 +10,8 @@ import os import time from datetime import datetime -from unittest.mock import Mock, MagicMock, patch +from unittest.mock import MagicMock, Mock, patch + import pytest import requests from google.cloud import bigquery From 9c288cc6fe1b92bdc81f00fe52c9123a7ca3d10c Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Wed, 21 Jan 2026 20:59:03 -0500 Subject: [PATCH 05/11] Mypy test fixes --- test_main.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/test_main.py b/test_main.py index 0850eae..0e60118 100644 --- a/test_main.py +++ b/test_main.py @@ -8,8 +8,6 @@ import logging import os -import time -from datetime import datetime from unittest.mock import MagicMock, Mock, patch import pytest From b95c05fbce21a49890015ff5232b94e417f07818 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Thu, 22 Jan 2026 16:35:31 -0500 Subject: [PATCH 06/11] Copilot fixes --- TESTING.md | 53 +++++++++++++++++++++++++++-------------------------- pytest.ini | 2 +- 2 files changed, 28 insertions(+), 27 deletions(-) diff --git a/TESTING.md b/TESTING.md index 104d401..c6a541c 100644 --- a/TESTING.md +++ b/TESTING.md @@ -19,22 +19,22 @@ unit tests, integration tests, Docker testing, linting, and CI/CD workflows. ## Unit Testing The test suite in `test_main.py` provides comprehensive coverage for all functions in `main.py`. -We have **95 unit tests** covering 9 functions with 80%+ code coverage requirement. +We have unit tests covering 9 functions with 80%+ code coverage requirement. ### Test Structure Tests are organized into 10 test classes: -1. **TestSetupLogging** (1 test) - Logging configuration -2. **TestSleepForRateLimit** (4 tests) - Rate limit handling -3. **TestExtractPullRequests** (14 tests) - PR extraction with pagination and enrichment -4. **TestExtractCommits** (9 tests) - Commit and file extraction -5. **TestExtractReviewers** (6 tests) - Reviewer extraction -6. **TestExtractComments** (7 tests) - Comment extraction (uses /issues endpoint) -7. **TestTransformData** (26 tests) - Data transformation for all 4 BigQuery tables -8. **TestLoadData** (8 tests) - BigQuery data loading -9. **TestMain** (17 tests) - Main ETL orchestration -10. 
**TestIntegration** (3 tests) - End-to-end integration tests (marked with `@pytest.mark.integration`) +1. **TestSetupLogging** - Logging configuration +2. **TestSleepForRateLimit** - Rate limit handling +3. **TestExtractPullRequests** - PR extraction with pagination and enrichment +4. **TestExtractCommits** - Commit and file extraction +5. **TestExtractReviewers** - Reviewer extraction +6. **TestExtractComments** - Comment extraction (uses /issues endpoint) +7. **TestTransformData** - Data transformation for all 4 BigQuery tables +8. **TestLoadData** - BigQuery data loading +9. **TestMain** - Main ETL orchestration +10. **TestIntegration** - End-to-end integration tests (marked with `@pytest.mark.integration`) ### Fixtures @@ -51,17 +51,17 @@ Reusable fixtures are defined at the top of `test_main.py`: ### Function Coverage -| Function | Tests | Coverage Target | Key Test Areas | -|----------|-------|-----------------|----------------| -| `setup_logging()` | 1 | 100% | Logger configuration | -| `sleep_for_rate_limit()` | 4 | 100% | Rate limit sleep logic, edge cases | -| `extract_pull_requests()` | 14 | 90%+ | Pagination, rate limits, enrichment, error handling | -| `extract_commits()` | 9 | 85%+ | Commit/file fetching, rate limits, errors | -| `extract_reviewers()` | 6 | 85%+ | Reviewer states, rate limits, errors | -| `extract_comments()` | 7 | 85%+ | Comment fetching (via /issues), rate limits | -| `transform_data()` | 26 | 95%+ | Bug ID extraction, 4 tables, field mapping | -| `load_data()` | 8 | 90%+ | BigQuery insertion, snapshot dates, errors | -| `main()` | 17 | 85%+ | Env vars, orchestration, chunking | +| Function | Coverage Target | Key Test Areas | +|----------|------------------|----------------| +| `setup_logging()` | 100% | Logger configuration | +| `sleep_for_rate_limit()` | 100% | Rate limit sleep logic, edge cases | +| `extract_pull_requests()` | 90%+ | Pagination, rate limits, enrichment, error handling | +| `extract_commits()` | 85%+ | Commit/file fetching, rate limits, errors | +| `extract_reviewers()` | 85%+ | Reviewer states, rate limits, errors | +| `extract_comments()` | 85%+ | Comment fetching (via /issues), rate limits | +| `transform_data()` | 95%+ | Bug ID extraction, 4 tables, field mapping | +| `load_data()` | 90%+ | BigQuery insertion, snapshot dates, errors | +| `main()` | 85%+ | Env vars, orchestration, chunking | **Overall Target: 85-90% coverage** (80% minimum enforced in CI) @@ -318,8 +318,8 @@ docker-compose down - 9050 (BigQuery API) - 9060 (Discovery/Admin API) - **Configuration**: Uses `data.yml` to define the schema -- **Project**: test-project -- **Dataset**: test_dataset +- **Project**: test +- **Dataset**: github_etl - **Table**: pull_requests ### ETL Service @@ -328,8 +328,9 @@ The ETL service is configured via environment variables in `docker-compose.yml`: ```yaml environment: - GITHUB_REPOS: "mozilla/firefox" - GITHUB_API_URL: "http://mock-github-api:5000" # Points to mock API + GITHUB_REPOS: "mozilla-firefox/firefox" + GITHUB_TOKEN: "" # Not needed for mock API + GITHUB_API_URL: "http://mock-github-api:5000" BIGQUERY_PROJECT: "test" BIGQUERY_DATASET: "github_etl" BIGQUERY_EMULATOR_HOST: "http://bigquery-emulator:9050" diff --git a/pytest.ini b/pytest.ini index 33ef84b..d553b45 100644 --- a/pytest.ini +++ b/pytest.ini @@ -34,7 +34,7 @@ log_cli_date_format = %Y-%m-%d %H:%M:%S # Coverage options [coverage:run] -source = . 
+source = main omit = test_*.py .venv/* From 8b7eb487cb3209939070b036ab9528f04d05d6ae Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 23 Jan 2026 18:16:04 -0500 Subject: [PATCH 07/11] Fixed review comments --- Dockerfile | 4 +- Dockerfile.mock | 2 +- README.md | 2 +- TESTING.md | 2 +- pyproject.toml | 1 + pytest.ini | 47 - requirements.txt | 2 +- test_formatting.py | 16 + test_main.py | 3456 ++++++++++++++++++++++---------------------- 9 files changed, 1744 insertions(+), 1788 deletions(-) delete mode 100644 pytest.ini create mode 100644 test_formatting.py diff --git a/Dockerfile b/Dockerfile index 5608295..bec1ed8 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,5 +1,5 @@ # Use the latest stable Python image -FROM python:3.11-slim +FROM python:3.14.2-slim # Set environment variables ENV PYTHONDONTWRITEBYTECODE=1 \ @@ -34,4 +34,4 @@ RUN chown -R app:app /app USER app # Set the default command -CMD ["python", "main.py"] \ No newline at end of file +CMD ["python", "main.py"] diff --git a/Dockerfile.mock b/Dockerfile.mock index 1098382..cf46078 100644 --- a/Dockerfile.mock +++ b/Dockerfile.mock @@ -1,5 +1,5 @@ # Dockerfile for mock GitHub API service -FROM python:3.11-slim +FROM python:3.14.2-slim WORKDIR /app diff --git a/README.md b/README.md index 80a3afe..570bacb 100644 --- a/README.md +++ b/README.md @@ -66,7 +66,7 @@ docker run --rm \ ### Container Specifications -- **Base Image**: `python:3.11-slim` (latest stable Python) +- **Base Image**: `python:3.14.2-slim` (latest stable Python) - **User**: `app` (uid: 1000, gid: 1000) - **Working Directory**: `/app` - **Ownership**: All files in `/app` are owned by the `app` user diff --git a/TESTING.md b/TESTING.md index c6a541c..6901d2f 100644 --- a/TESTING.md +++ b/TESTING.md @@ -604,7 +604,7 @@ If coverage is below 80%: ### Tests Pass Locally But Fail in CI -- Check Python version (must be 3.11) +- Check Python version (must be 3.14) - Verify all dependencies are in `requirements.txt` - Look for environment-specific issues diff --git a/pyproject.toml b/pyproject.toml index f4aac49..ed3b2a4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -25,6 +25,7 @@ dependencies = [ [project.optional-dependencies] dev = [ "pytest>=7.0.0", + "pytest-mock>=3.10.0", "ruff>=0.14.14", "black>=24.0.0", ] diff --git a/pytest.ini b/pytest.ini deleted file mode 100644 index d553b45..0000000 --- a/pytest.ini +++ /dev/null @@ -1,47 +0,0 @@ -[pytest] -# Pytest configuration for GitHub ETL project - -# Test discovery patterns -python_files = test_*.py -python_classes = Test* -python_functions = test_* - -# Output options -addopts = - -v - --strict-markers - --tb=short - --cov=main - --cov-report=term-missing - --cov-report=html - --cov-branch - --cov-fail-under=80 - -# Test paths -testpaths = . 
- -# Markers for organizing tests -markers = - unit: Unit tests for individual functions - integration: Integration tests that test multiple components - slow: Tests that take longer to run - -# Logging -log_cli = false -log_cli_level = INFO -log_cli_format = %(asctime)s [%(levelname)8s] %(message)s -log_cli_date_format = %Y-%m-%d %H:%M:%S - -# Coverage options -[coverage:run] -source = main -omit = - test_*.py - .venv/* - venv/* - */site-packages/* - -[coverage:report] -precision = 2 -show_missing = true -skip_covered = false diff --git a/requirements.txt b/requirements.txt index fd521f6..d487f50 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,5 @@ # -# This file is autogenerated by pip-compile with Python 3.14 +# This file is autogenerated by pip-compile with Python 3.10 # by the following command: # # pip-compile --generate-hashes pyproject.toml diff --git a/test_formatting.py b/test_formatting.py new file mode 100644 index 0000000..c92e534 --- /dev/null +++ b/test_formatting.py @@ -0,0 +1,16 @@ +""" +Code Style Tests. +""" + +import subprocess + + +def test_black(): + cmd = ("black", "--diff", "main.py") + output = subprocess.check_output(cmd) + assert not output, "The python code does not adhere to the project style." + + +def test_ruff(): + passed = subprocess.call(("ruff", "check", "main.py", "--target-version", "py314")) + assert not passed, "ruff did not run cleanly." diff --git a/test_main.py b/test_main.py index 0e60118..0d38ac3 100644 --- a/test_main.py +++ b/test_main.py @@ -116,1325 +116,839 @@ def mock_comment_response(): # ============================================================================= -class TestSetupLogging: - """Tests for setup_logging function.""" - def test_setup_logging_configures_logger(self): - """Test that setup_logging configures the root logger correctly.""" - main.setup_logging() - - root_logger = logging.getLogger() - assert root_logger.level == logging.INFO - assert len(root_logger.handlers) > 0 - - # Check that at least one handler is a StreamHandler - has_stream_handler = any( - isinstance(handler, logging.StreamHandler) - for handler in root_logger.handlers - ) - assert has_stream_handler - - -class TestSleepForRateLimit: - """Tests for sleep_for_rate_limit function.""" - - @patch("time.time") - @patch("time.sleep") - def test_sleep_for_rate_limit_when_remaining_is_zero(self, mock_sleep, mock_time): - """Test that sleep_for_rate_limit sleeps until reset time.""" - mock_time.return_value = 1000 - - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1120", # 120 seconds from now - } +# ============================================================================= +# TESTS FOR SETUP_LOGGING +# ============================================================================= - main.sleep_for_rate_limit(mock_response) - mock_sleep.assert_called_once_with(120) +def test_setup_logging(): + """Test that setup_logging configures logging correctly.""" + main.setup_logging() - @patch("time.time") - @patch("time.sleep") - def test_sleep_for_rate_limit_when_reset_already_passed( - self, mock_sleep, mock_time - ): - """Test that sleep_for_rate_limit doesn't sleep negative time.""" - mock_time.return_value = 2000 + root_logger = logging.getLogger() + assert root_logger.level == logging.INFO + assert len(root_logger.handlers) > 0 - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1500", # Already passed - } + # Check that at least one 
handler is a StreamHandler + has_stream_handler = any( + isinstance(handler, logging.StreamHandler) + for handler in root_logger.handlers + ) + assert has_stream_handler - main.sleep_for_rate_limit(mock_response) - # Should sleep for 0 seconds (max of 0 and negative value) - mock_sleep.assert_called_once_with(0) - @patch("time.sleep") - def test_sleep_for_rate_limit_when_remaining_not_zero(self, mock_sleep): - """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "5", - "X-RateLimit-Reset": "1500", - } +# ============================================================================= +# TESTS FOR SLEEP_FOR_RATE_LIMIT +# ============================================================================= - main.sleep_for_rate_limit(mock_response) - # Should not sleep when remaining > 0 - mock_sleep.assert_not_called() +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_calculates_wait_time(mock_sleep, mock_time): + """Test that sleep_for_rate_limit calculates correct wait time.""" + mock_time.return_value = 1000 - @patch("time.sleep") - def test_sleep_for_rate_limit_with_missing_headers(self, mock_sleep): - """Test sleep_for_rate_limit with missing rate limit headers.""" - mock_response = Mock() - mock_response.headers = {} + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1120", # 120 seconds from now + } - main.sleep_for_rate_limit(mock_response) + main.sleep_for_rate_limit(mock_response) - # Should not sleep when headers are missing (defaults to remaining=1) - mock_sleep.assert_not_called() + mock_sleep.assert_called_once_with(120) -class TestExtractPullRequests: - """Tests for extract_pull_requests function.""" +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_when_reset_already_passed(mock_sleep, mock_time): + """Test that sleep_for_rate_limit doesn't sleep negative time.""" + mock_time.return_value = 2000 - def test_extract_single_page(self, mock_session): - """Test extracting data from a single page of results.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - # Mock the extract functions - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 1 - assert len(result[0]) == 2 - assert result[0][0]["number"] == 1 - assert result[0][1]["number"] == 2 - - def test_extract_multiple_pages(self, mock_session): - """Test extracting data across multiple pages with pagination.""" - # First page response - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1500", # Already passed + } - # Second page response - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] - 
mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 2 - assert len(result[0]) == 2 - assert len(result[1]) == 1 - assert result[0][0]["number"] == 1 - assert result[1][0]["number"] == 3 - - def test_enriches_prs_with_commit_data(self, mock_session): - """Test that PRs are enriched with commit data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_commits = [{"sha": "abc123"}] - - with ( - patch( - "main.extract_commits", return_value=mock_commits - ) as mock_extract_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["commit_data"] == mock_commits - mock_extract_commits.assert_called_once() - - def test_enriches_prs_with_reviewer_data(self, mock_session): - """Test that PRs are enriched with reviewer data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_reviewers = [{"id": 789, "state": "APPROVED"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch( - "main.extract_reviewers", return_value=mock_reviewers - ) as mock_extract_reviewers, - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["reviewer_data"] == mock_reviewers - mock_extract_reviewers.assert_called_once() - - def test_enriches_prs_with_comment_data(self, mock_session): - """Test that PRs are enriched with comment data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_comments = [{"id": 456, "body": "Great work!"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch( - "main.extract_comments", return_value=mock_comments - ) as mock_extract_comments, - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["comment_data"] == mock_comments - mock_extract_comments.assert_called_once() - - @patch("main.sleep_for_rate_limit") - def test_handles_rate_limit(self, mock_sleep, mock_session): - """Test that extract_pull_requests handles rate limiting correctly.""" - # Rate limit response - mock_response_rate_limit = Mock() - mock_response_rate_limit.status_code = 403 - mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} - - # Successful response after rate limit - mock_response_success = Mock() - mock_response_success.status_code = 200 - mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response_success.links = {} - - mock_session.get.side_effect = [ - mock_response_rate_limit, - mock_response_success, - ] + main.sleep_for_rate_limit(mock_response) - with ( - patch("main.extract_commits", 
return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + # Should sleep for 0 seconds (max of 0 and negative value) + mock_sleep.assert_called_once_with(0) - mock_sleep.assert_called_once_with(mock_response_rate_limit) - assert len(result) == 1 - def test_handles_api_error_404(self, mock_session): - """Test that extract_pull_requests raises SystemExit on 404.""" - mock_response = Mock() - mock_response.status_code = 404 - mock_response.text = "Not Found" +@patch("time.sleep") +def test_sleep_for_rate_limit_when_remaining_not_zero(mock_sleep): + """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "5", + "X-RateLimit-Reset": "1500", + } - mock_session.get.return_value = mock_response + main.sleep_for_rate_limit(mock_response) - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) + # Should not sleep when remaining > 0 + mock_sleep.assert_not_called() - assert "GitHub API error 404" in str(exc_info.value) - def test_handles_api_error_500(self, mock_session): - """Test that extract_pull_requests raises SystemExit on 500.""" - mock_response = Mock() - mock_response.status_code = 500 - mock_response.text = "Internal Server Error" +@patch("time.sleep") +def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): + """Test sleep_for_rate_limit with missing rate limit headers.""" + mock_response = Mock() + mock_response.headers = {} - mock_session.get.return_value = mock_response + main.sleep_for_rate_limit(mock_response) - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert "GitHub API error 500" in str(exc_info.value) - - def test_stops_on_empty_batch(self, mock_session): - """Test that extraction stops when an empty batch is returned.""" - # First page with data - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } + # Should not sleep when headers are missing (defaults to remaining=1) + mock_sleep.assert_not_called() - # Second page empty - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [] - mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should only have 1 chunk from first page - assert len(result) == 1 - assert len(result[0]) == 1 - - def test_invalid_page_number_handling(self, mock_session): - """Test handling of invalid page number in pagination.""" - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" - } - } - mock_session.get.return_value = mock_response_1 - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), 
- ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should stop pagination on invalid page number - assert len(result) == 1 - - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL.""" - custom_url = "https://mock-github.example.com" - - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list( - main.extract_pull_requests( - mock_session, "mozilla/firefox", github_api_url=custom_url - ) - ) - # Verify custom URL was used - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - def test_skips_prs_without_number_field(self, mock_session): - """Test that PRs without 'number' field are skipped.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"title": "PR without number"}, # Missing number field - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} +# ============================================================================= +# TESTS FOR EXTRACT_PULL_REQUESTS +# ============================================================================= - mock_session.get.return_value = mock_response - with ( - patch("main.extract_commits", return_value=[]) as mock_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) +def test_extract_pull_requests_basic(mock_session): + """Test basic extraction of pull requests.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} - # extract_commits should only be called for PRs with number field - assert mock_commits.call_count == 2 + mock_session.get.return_value = mock_response + # Mock the extract functions + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 1 + assert len(result[0]) == 2 + assert result[0][0]["number"] == 1 + assert result[0][1]["number"] == 2 + +def test_extract_multiple_pages(mock_session): + """Test extracting data across multiple pages with pagination.""" + # First page response + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } -class TestExtractCommits: - """Tests for extract_commits function.""" + # Second page response + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] + mock_response_2.links = {} - def test_fetch_commits_with_files(self, mock_session): - """Test fetching commits with files for a PR.""" - # Mock commits list response - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": 
"abc123"}, - {"sha": "def456"}, - ] + mock_session.get.side_effect = [mock_response_1, mock_response_2] - # Mock individual commit responses - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = { - "sha": "abc123", - "files": [{"filename": "file1.py", "additions": 10}], - } + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 2 + assert len(result[0]) == 2 + assert len(result[1]) == 1 + assert result[0][0]["number"] == 1 + assert result[1][0]["number"] == 3 + +def test_enriches_prs_with_commit_data(mock_session): + """Test that PRs are enriched with commit data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_commits = [{"sha": "abc123"}] + + with ( + patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = { - "sha": "def456", - "files": [{"filename": "file2.py", "deletions": 5}], - } + assert result[0][0]["commit_data"] == mock_commits + mock_extract_commits.assert_called_once() - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] +def test_enriches_prs_with_reviewer_data(mock_session): + """Test that PRs are enriched with reviewer data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["sha"] == "abc123" - assert result[0]["files"][0]["filename"] == "file1.py" - assert result[1]["sha"] == "def456" - assert result[1]["files"][0]["filename"] == "file2.py" - - def test_multiple_files_per_commit(self, mock_session): - """Test handling multiple files in a single commit.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] - - commit_detail = Mock() - commit_detail.status_code = 200 - commit_detail.json.return_value = { - "sha": "abc123", - "files": [ - {"filename": "file1.py", "additions": 10}, - {"filename": "file2.py", "additions": 20}, - {"filename": "file3.py", "deletions": 5}, - ], - } + mock_session.get.return_value = mock_response - mock_session.get.side_effect = [commits_response, commit_detail] + mock_reviewers = [{"id": 789, "state": "APPROVED"}] - result = main.extract_commits(mock_session, "mozilla/firefox", 123) + with ( + patch("main.extract_commits", return_value=[]), + patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - assert len(result) == 1 - assert len(result[0]["files"]) == 3 + assert result[0][0]["reviewer_data"] == mock_reviewers + mock_extract_reviewers.assert_called_once() - @patch("main.sleep_for_rate_limit") - def 
test_rate_limit_on_commits_list(self, mock_sleep, mock_session): - """Test rate limit handling when fetching commits list.""" - # Rate limit response - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} +def test_enriches_prs_with_comment_data(mock_session): + """Test that PRs are enriched with comment data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} - # Success response - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] + mock_session.get.return_value = mock_response - mock_session.get.side_effect = [rate_limit_response, success_response] + mock_comments = [{"id": 456, "body": "Great work!"}] - result = main.extract_commits(mock_session, "mozilla/firefox", 123) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments, + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["comment_data"] == mock_comments + mock_extract_comments.assert_called_once() + +@patch("main.sleep_for_rate_limit") +def test_handles_rate_limit(mock_sleep, mock_session): + """Test that extract_pull_requests handles rate limiting correctly.""" + # Rate limit response + mock_response_rate_limit = Mock() + mock_response_rate_limit.status_code = 403 + mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} + + # Successful response after rate limit + mock_response_success = Mock() + mock_response_success.status_code = 200 + mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response_success.links = {} + + mock_session.get.side_effect = [ + mock_response_rate_limit, + mock_response_success, + ] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - mock_sleep.assert_called_once() - assert result == [] + mock_sleep.assert_called_once_with(mock_response_rate_limit) + assert len(result) == 1 - def test_api_error_on_commits_list(self, mock_session): - """Test API error handling when fetching commits list.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" +def test_handles_api_error_404(mock_session): + """Test that extract_pull_requests raises SystemExit on 404.""" + mock_response = Mock() + mock_response.status_code = 404 + mock_response.text = "Not Found" - mock_session.get.return_value = error_response + mock_session.get.return_value = mock_response - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) - assert "GitHub API error 500" in str(exc_info.value) + assert "GitHub API error 404" in str(exc_info.value) - def test_api_error_on_individual_commit(self, mock_session): - """Test API error when fetching individual commit details.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] +def test_handles_api_error_500(mock_session): + """Test that 
extract_pull_requests raises SystemExit on 500.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Server Error" - commit_error = Mock() - commit_error.status_code = 404 - commit_error.text = "Commit not found" + mock_session.get.return_value = mock_response - mock_session.get.side_effect = [commits_response, commit_error] + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) + assert "GitHub API error 500" in str(exc_info.value) - assert "GitHub API error 404" in str(exc_info.value) +def test_stops_on_empty_batch(mock_session): + """Test that extraction stops when an empty batch is returned.""" + # First page with data + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } - def test_commit_without_sha_field(self, mock_session): - """Test handling commits without sha field.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": "abc123"}, - {}, # Missing sha field - ] + # Second page empty + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [] + mock_response_2.links = {} - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + mock_session.get.side_effect = [mock_response_1, mock_response_2] - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = {"files": []} + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # Should only have 1 chunk from first page + assert len(result) == 1 + assert len(result[0]) == 1 + +def test_invalid_page_number_handling(mock_session): + """Test handling of invalid page number in pagination.""" + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" + } + } - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] + mock_session.get.return_value = mock_response_1 - result = main.extract_commits(mock_session, "mozilla/firefox", 123) + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - # Should handle the commit without sha gracefully - assert len(result) == 2 + # Should stop pagination on invalid page number + assert len(result) == 1 - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL for commits.""" - custom_url = "https://mock-github.example.com" +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL.""" + custom_url = "https://mock-github.example.com" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] + mock_response = Mock() + 
mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1}] + mock_response.links = {} - mock_session.get.return_value = commits_response + mock_session.get.return_value = mock_response - main.extract_commits( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list( + main.extract_pull_requests( + mock_session, "mozilla/firefox", github_api_url=custom_url + ) ) - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - def test_empty_commits_list(self, mock_session): - """Test handling PR with no commits.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] - - mock_session.get.return_value = commits_response - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -class TestExtractReviewers: - """Tests for extract_reviewers function.""" + # Verify custom URL was used + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + +def test_skips_prs_without_number_field(mock_session): + """Test that PRs without 'number' field are skipped.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"title": "PR without number"}, # Missing number field + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with ( + patch("main.extract_commits", return_value=[]) as mock_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - def test_fetch_reviewers(self, mock_session): - """Test fetching reviewers for a PR.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 790, - "user": {"login": "reviewer2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - ] + # extract_commits should only be called for PRs with number field + assert mock_commits.call_count == 2 - mock_session.get.return_value = reviewers_response - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - assert len(result) == 2 - assert result[0]["state"] == "APPROVED" - assert result[1]["state"] == "CHANGES_REQUESTED" +# ============================================================================= +# TESTS FOR EXTRACT_COMMITS +# ============================================================================= - def test_multiple_review_states(self, mock_session): - """Test handling multiple different review states.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [ - {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, - {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, - {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, - {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, - ] + # Mock commits list response + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {"sha": "def456"}, + ] + + # Mock individual 
commit responses + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = { + "sha": "abc123", + "files": [{"filename": "file1.py", "additions": 10}], + } - mock_session.get.return_value = reviewers_response + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = { + "sha": "def456", + "files": [{"filename": "file2.py", "deletions": 5}], + } - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["sha"] == "abc123" + assert result[0]["files"][0]["filename"] == "file1.py" + assert result[1]["sha"] == "def456" + assert result[1]["files"][0]["filename"] == "file2.py" + +def test_multiple_files_per_commit(mock_session): + """Test handling multiple files in a single commit.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_detail = Mock() + commit_detail.status_code = 200 + commit_detail.json.return_value = { + "sha": "abc123", + "files": [ + {"filename": "file1.py", "additions": 10}, + {"filename": "file2.py", "additions": 20}, + {"filename": "file3.py", "deletions": 5}, + ], + } - assert len(result) == 4 - states = [r["state"] for r in result] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states + mock_session.get.side_effect = [commits_response, commit_detail] - def test_empty_reviewers_list(self, mock_session): - """Test handling PR with no reviewers.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = reviewers_response + assert len(result) == 1 + assert len(result[0]["files"]) == 3 - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) +@patch("main.sleep_for_rate_limit") +def test_rate_limit_on_commits_list(mock_sleep, mock_session): + """Test rate limit handling when fetching commits list.""" + # Rate limit response + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - assert result == [] + # Success response + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] - @patch("main.sleep_for_rate_limit") - def test_rate_limit_handling(self, mock_sleep, mock_session): - """Test rate limit handling when fetching reviewers.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + mock_session.get.side_effect = [rate_limit_response, success_response] - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.side_effect = [rate_limit_response, success_response] + mock_sleep.assert_called_once() + assert result == [] - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) +def test_api_error_on_commits_list(mock_session): + """Test API error handling when fetching commits list.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" - 
mock_sleep.assert_called_once() - assert result == [] + mock_session.get.return_value = error_response - def test_api_error(self, mock_session): - """Test API error handling when fetching reviewers.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = error_response + assert "GitHub API error 500" in str(exc_info.value) - with pytest.raises(SystemExit) as exc_info: - main.extract_reviewers(mock_session, "mozilla/firefox", 123) +def test_api_error_on_individual_commit(mock_session): + """Test API error when fetching individual commit details.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] - assert "GitHub API error 500" in str(exc_info.value) + commit_error = Mock() + commit_error.status_code = 404 + commit_error.text = "Commit not found" - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL for reviewers.""" - custom_url = "https://mock-github.example.com" + mock_session.get.side_effect = [commits_response, commit_error] - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = reviewers_response + assert "GitHub API error 404" in str(exc_info.value) - main.extract_reviewers( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) +def test_commit_without_sha_field(mock_session): + """Test handling commits without sha field.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {}, # Missing sha field + ] - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = {"files": []} -class TestExtractComments: - """Tests for extract_comments function.""" + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] - def test_fetch_comments(self, mock_session): - """Test fetching comments for a PR.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks good", - "created_at": "2024-01-01T14:00:00Z", - }, - { - "id": 457, - "user": {"login": "commenter2"}, - "body": "I have concerns", - "created_at": "2024-01-01T15:00:00Z", - }, - ] + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = comments_response + # Should handle the commit without sha gracefully + assert len(result) == 2 - result = main.extract_comments(mock_session, "mozilla/firefox", 123) +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL for commits.""" + custom_url = "https://mock-github.example.com" - assert len(result) == 2 - assert result[0]["id"] == 456 - assert result[1]["id"] == 457 + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] - def test_uses_issues_endpoint(self, mock_session): 
- """Test that comments use /issues endpoint not /pulls.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] + mock_session.get.return_value = commits_response - mock_session.get.return_value = comments_response + main.extract_commits( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) - main.extract_comments(mock_session, "mozilla/firefox", 123) + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] - call_args = mock_session.get.call_args - url = call_args[0][0] - assert "/issues/123/comments" in url - assert "/pulls/123/comments" not in url - - def test_multiple_comments(self, mock_session): - """Test handling multiple comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [ - {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} - for i in range(1, 11) - ] +def test_empty_commits_list(mock_session): + """Test handling PR with no commits.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] - mock_session.get.return_value = comments_response + mock_session.get.return_value = commits_response - result = main.extract_comments(mock_session, "mozilla/firefox", 123) + result = main.extract_commits(mock_session, "mozilla/firefox", 123) - assert len(result) == 10 + assert result == [] - def test_empty_comments_list(self, mock_session): - """Test handling PR with no comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - mock_session.get.return_value = comments_response - result = main.extract_comments(mock_session, "mozilla/firefox", 123) +# ============================================================================= +# TESTS FOR EXTRACT_REVIEWERS +# ============================================================================= - assert result == [] + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 790, + "user": {"login": "reviewer2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + ] - @patch("main.sleep_for_rate_limit") - def test_rate_limit_handling(self, mock_sleep, mock_session): - """Test rate limit handling when fetching comments.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + mock_session.get.return_value = reviewers_response - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - mock_session.get.side_effect = [rate_limit_response, success_response] + assert len(result) == 2 + assert result[0]["state"] == "APPROVED" + assert result[1]["state"] == "CHANGES_REQUESTED" - result = main.extract_comments(mock_session, "mozilla/firefox", 123) +def test_multiple_review_states(mock_session): + """Test handling multiple different review states.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, + {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, + {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, + 
{"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, + ] - mock_sleep.assert_called_once() - assert result == [] + mock_session.get.return_value = reviewers_response - def test_api_error(self, mock_session): - """Test API error handling when fetching comments.""" - error_response = Mock() - error_response.status_code = 404 - error_response.text = "Not Found" + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - mock_session.get.return_value = error_response + assert len(result) == 4 + states = [r["state"] for r in result] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states - with pytest.raises(SystemExit) as exc_info: - main.extract_comments(mock_session, "mozilla/firefox", 123) +def test_empty_reviewers_list(mock_session): + """Test handling PR with no reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] - assert "GitHub API error 404" in str(exc_info.value) + mock_session.get.return_value = reviewers_response - def test_custom_github_api_url(self, mock_session): - """Test using custom GitHub API URL for comments.""" - custom_url = "https://mock-github.example.com" + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] + assert result == [] - mock_session.get.return_value = comments_response +@patch("main.sleep_for_rate_limit") +def test_rate_limit_handling(mock_sleep, mock_session): + """Test rate limit handling when fetching reviewers.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - main.extract_comments( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] + mock_session.get.side_effect = [rate_limit_response, success_response] + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) -class TestTransformData: - """Tests for transform_data function.""" + mock_sleep.assert_called_once() + assert result == [] - def test_basic_pr_transformation(self): - """Test basic pull request field mapping.""" - raw_data = [ - { - "number": 123, - "title": "Fix login bug", - "state": "closed", - "created_at": "2024-01-01T10:00:00Z", - "updated_at": "2024-01-02T10:00:00Z", - "merged_at": "2024-01-02T12:00:00Z", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] +def test_api_error(mock_session): + """Test API error handling when fetching reviewers.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.return_value = error_response - assert len(result["pull_requests"]) == 1 - pr = result["pull_requests"][0] - assert pr["pull_request_id"] == 123 - assert pr["current_status"] == "closed" - assert pr["date_created"] == "2024-01-01T10:00:00Z" - assert pr["date_modified"] == "2024-01-02T10:00:00Z" - assert pr["date_landed"] == "2024-01-02T12:00:00Z" - assert pr["target_repository"] == "mozilla/firefox" - - def test_bug_id_extraction_basic(self): - """Test bug ID extraction from PR title.""" - test_cases = [ - ("Bug 1234567 - Fix issue", 1234567), - ("bug 
1234567: Update code", 1234567), - ("Fix for bug 7654321", 7654321), - ("b=9876543 - Change behavior", 9876543), - ] + with pytest.raises(SystemExit) as exc_info: + main.extract_reviewers(mock_session, "mozilla/firefox", 123) - for title, expected_bug_id in test_cases: - raw_data = [ - { - "number": 1, - "title": title, - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == expected_bug_id - - def test_bug_id_extraction_with_hash(self): - """Test bug ID extraction with # symbol.""" - raw_data = [ - { - "number": 1, - "title": "Bug #1234567 - Fix issue", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] + assert "GitHub API error 500" in str(exc_info.value) - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == 1234567 +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL for reviewers.""" + custom_url = "https://mock-github.example.com" - def test_bug_id_filter_large_numbers(self): - """Test that bug IDs >= 100000000 are filtered out.""" - raw_data = [ - { - "number": 1, - "title": "Bug 999999999 - Invalid bug ID", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None + mock_session.get.return_value = reviewers_response - def test_bug_id_no_match(self): - """Test PR title with no bug ID.""" - raw_data = [ - { - "number": 1, - "title": "Update documentation", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] + main.extract_reviewers( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] - def test_labels_extraction(self): - """Test labels array extraction.""" - raw_data = [ - { - "number": 1, - "title": "PR with labels", - "state": "open", - "labels": [ - {"name": "bug"}, - {"name": "priority-high"}, - {"name": "needs-review"}, - ], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - result = main.transform_data(raw_data, "mozilla/firefox") - labels = result["pull_requests"][0]["labels"] - assert len(labels) == 3 - assert "bug" in labels - assert "priority-high" in labels - assert "needs-review" in labels - - def test_labels_empty_list(self): - """Test handling empty labels list.""" - raw_data = [ - { - "number": 1, - "title": "PR without labels", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["labels"] == [] +# ============================================================================= +# TESTS FOR EXTRACT_COMMENTS +# ============================================================================= - def test_commit_transformation(self): - """Test commit fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with commits", - "state": "open", - "labels": 
[], - "commit_data": [ - { - "sha": "abc123", - "commit": { - "author": { - "name": "Test Author", - "date": "2024-01-01T12:00:00Z", - } - }, - "files": [ - { - "filename": "src/main.py", - "additions": 10, - "deletions": 5, - } - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks good", + "created_at": "2024-01-01T14:00:00Z", + }, + { + "id": 457, + "user": {"login": "commenter2"}, + "body": "I have concerns", + "created_at": "2024-01-01T15:00:00Z", + }, + ] - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.return_value = comments_response - assert len(result["commits"]) == 1 - commit = result["commits"][0] - assert commit["pull_request_id"] == 123 - assert commit["target_repository"] == "mozilla/firefox" - assert commit["commit_sha"] == "abc123" - assert commit["date_created"] == "2024-01-01T12:00:00Z" - assert commit["author_username"] == "Test Author" - assert commit["filename"] == "src/main.py" - assert commit["lines_added"] == 10 - assert commit["lines_removed"] == 5 - - def test_commit_file_flattening(self): - """Test that each file becomes a separate row.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple files", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc123", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 5}, - {"filename": "file2.py", "additions": 20, "deletions": 2}, - {"filename": "file3.py", "additions": 5, "deletions": 15}, - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - result = main.transform_data(raw_data, "mozilla/firefox") + assert len(result) == 2 + assert result[0]["id"] == 456 + assert result[1]["id"] == 457 - # Should have 3 rows in commits table (one per file) - assert len(result["commits"]) == 3 - filenames = [c["filename"] for c in result["commits"]] - assert "file1.py" in filenames - assert "file2.py" in filenames - assert "file3.py" in filenames +def test_uses_issues_endpoint(mock_session): + """Test that comments use /issues endpoint not /pulls.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] - def test_multiple_commits_with_files(self): - """Test multiple commits with multiple files per PR.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple commits", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "commit1", - "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 0} - ], - }, - { - "sha": "commit2", - "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, - "files": [ - {"filename": "file2.py", "additions": 5, "deletions": 2}, - {"filename": "file3.py", "additions": 8, "deletions": 3}, - ], - }, - ], - "reviewer_data": [], - "comment_data": [], - } - ] + mock_session.get.return_value = comments_response - result = main.transform_data(raw_data, "mozilla/firefox") + main.extract_comments(mock_session, "mozilla/firefox", 123) - # Should have 3 rows total (1 file from commit1, 2 files from commit2) - assert len(result["commits"]) == 3 - assert result["commits"][0]["commit_sha"] == "commit1" - assert 
result["commits"][1]["commit_sha"] == "commit2" - assert result["commits"][2]["commit_sha"] == "commit2" + call_args = mock_session.get.call_args + url = call_args[0][0] + assert "/issues/123/comments" in url + assert "/pulls/123/comments" not in url - def test_reviewer_transformation(self): - """Test reviewer fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with reviewers", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - } - ], - "comment_data": [], - } - ] +def test_multiple_comments(mock_session): + """Test handling multiple comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} + for i in range(1, 11) + ] - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.return_value = comments_response - assert len(result["reviewers"]) == 1 - reviewer = result["reviewers"][0] - assert reviewer["pull_request_id"] == 123 - assert reviewer["target_repository"] == "mozilla/firefox" - assert reviewer["reviewer_username"] == "reviewer1" - assert reviewer["status"] == "APPROVED" - assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - def test_multiple_review_states(self): - """Test handling multiple review states.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple reviews", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "COMMENTED", - "submitted_at": "2024-01-01T17:00:00Z", - }, - ], - "comment_data": [], - } - ] + assert len(result) == 10 - result = main.transform_data(raw_data, "mozilla/firefox") +def test_empty_comments_list(mock_session): + """Test handling PR with no comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] - assert len(result["reviewers"]) == 3 - states = [r["status"] for r in result["reviewers"]] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states + mock_session.get.return_value = comments_response - def test_date_approved_from_earliest_approval(self): - """Test that date_approved is set to earliest APPROVED review.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple approvals", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-02T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T14:00:00Z", # Earliest - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "APPROVED", - "submitted_at": "2024-01-03T16:00:00Z", - }, - ], - "comment_data": [], - } - ] + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - result = main.transform_data(raw_data, "mozilla/firefox") + assert result == [] - pr = result["pull_requests"][0] - assert pr["date_approved"] == "2024-01-01T14:00:00Z" +@patch("main.sleep_for_rate_limit") +def 
test_rate_limit_handling(mock_sleep, mock_session): + """Test rate limit handling when fetching comments.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - def test_comment_transformation(self): - """Test comment fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with comments", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks great!", - "created_at": "2024-01-01T14:00:00Z", - "pull_request_review_id": None, - } - ], - } - ] + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] - result = main.transform_data(raw_data, "mozilla/firefox") + mock_session.get.side_effect = [rate_limit_response, success_response] - assert len(result["comments"]) == 1 - comment = result["comments"][0] - assert comment["pull_request_id"] == 123 - assert comment["target_repository"] == "mozilla/firefox" - assert comment["comment_id"] == 456 - assert comment["author_username"] == "commenter1" - assert comment["date_created"] == "2024-01-01T14:00:00Z" - assert comment["character_count"] == 17 - - def test_comment_character_count(self): - """Test character count calculation for comments.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": "Short", - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "This is a much longer comment with more text", - "created_at": "2024-01-01", - }, - ], - } - ] + result = main.extract_comments(mock_session, "mozilla/firefox", 123) - result = main.transform_data(raw_data, "mozilla/firefox") + mock_sleep.assert_called_once() + assert result == [] - assert result["comments"][0]["character_count"] == 5 - assert result["comments"][1]["character_count"] == 44 +def test_api_error(mock_session): + """Test API error handling when fetching comments.""" + error_response = Mock() + error_response.status_code = 404 + error_response.text = "Not Found" - def test_comment_status_from_review(self): - """Test that comment status is mapped from review_id_statuses.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter"}, - "body": "LGTM", - "created_at": "2024-01-01", - "pull_request_review_id": 789, - } - ], - } - ] + mock_session.get.return_value = error_response - result = main.transform_data(raw_data, "mozilla/firefox") + with pytest.raises(SystemExit) as exc_info: + main.extract_comments(mock_session, "mozilla/firefox", 123) - # Comment should have status from the review - assert result["comments"][0]["status"] == "APPROVED" + assert "GitHub API error 404" in str(exc_info.value) - def test_comment_empty_body(self): - """Test handling comments with empty or None body.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": None, - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "", - 
"created_at": "2024-01-01", - }, - ], - } - ] +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL for comments.""" + custom_url = "https://mock-github.example.com" - result = main.transform_data(raw_data, "mozilla/firefox") + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] - assert result["comments"][0]["character_count"] == 0 - assert result["comments"][1]["character_count"] == 0 + mock_session.get.return_value = comments_response - def test_empty_raw_data(self): - """Test handling empty input list.""" - result = main.transform_data([], "mozilla/firefox") + main.extract_comments( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) - assert result["pull_requests"] == [] - assert result["commits"] == [] - assert result["reviewers"] == [] - assert result["comments"] == [] + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] - def test_pr_without_commits_reviewers_comments(self): - """Test PR with no commits, reviewers, or comments.""" - raw_data = [ - { - "number": 123, - "title": "Minimal PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - result = main.transform_data(raw_data, "mozilla/firefox") - assert len(result["pull_requests"]) == 1 - assert len(result["commits"]) == 0 - assert len(result["reviewers"]) == 0 - assert len(result["comments"]) == 0 +# ============================================================================= +# TESTS FOR TRANSFORM_DATA +# ============================================================================= - def test_return_structure(self): - """Test that transform_data returns dict with 4 keys.""" + raw_data = [ + { + "number": 123, + "title": "Fix login bug", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T12:00:00Z", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + pr = result["pull_requests"][0] + assert pr["pull_request_id"] == 123 + assert pr["current_status"] == "closed" + assert pr["date_created"] == "2024-01-01T10:00:00Z" + assert pr["date_modified"] == "2024-01-02T10:00:00Z" + assert pr["date_landed"] == "2024-01-02T12:00:00Z" + assert pr["target_repository"] == "mozilla/firefox" + +def test_bug_id_extraction_basic(): + """Test bug ID extraction from PR title.""" + test_cases = [ + ("Bug 1234567 - Fix issue", 1234567), + ("bug 1234567: Update code", 1234567), + ("Fix for bug 7654321", 7654321), + ("b=9876543 - Change behavior", 9876543), + ] + + for title, expected_bug_id in test_cases: raw_data = [ { "number": 1, - "title": "Test", + "title": title, "state": "open", "labels": [], "commit_data": [], @@ -1444,638 +958,835 @@ def test_return_structure(self): ] result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == expected_bug_id + +def test_bug_id_extraction_with_hash(): + """Test bug ID extraction with # symbol.""" + raw_data = [ + { + "number": 1, + "title": "Bug #1234567 - Fix issue", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == 1234567 + +def test_bug_id_filter_large_numbers(): + """Test that bug IDs 
>= 100000000 are filtered out.""" + raw_data = [ + { + "number": 1, + "title": "Bug 999999999 - Invalid bug ID", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + +def test_bug_id_no_match(): + """Test PR title with no bug ID.""" + raw_data = [ + { + "number": 1, + "title": "Update documentation", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + +def test_labels_extraction(): + """Test labels array extraction.""" + raw_data = [ + { + "number": 1, + "title": "PR with labels", + "state": "open", + "labels": [ + {"name": "bug"}, + {"name": "priority-high"}, + {"name": "needs-review"}, + ], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + labels = result["pull_requests"][0]["labels"] + assert len(labels) == 3 + assert "bug" in labels + assert "priority-high" in labels + assert "needs-review" in labels + +def test_labels_empty_list(): + """Test handling empty labels list.""" + raw_data = [ + { + "number": 1, + "title": "PR without labels", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["labels"] == [] + +def test_commit_transformation(): + """Test commit fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": { + "author": { + "name": "Test Author", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/main.py", + "additions": 10, + "deletions": 5, + } + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["commits"]) == 1 + commit = result["commits"][0] + assert commit["pull_request_id"] == 123 + assert commit["target_repository"] == "mozilla/firefox" + assert commit["commit_sha"] == "abc123" + assert commit["date_created"] == "2024-01-01T12:00:00Z" + assert commit["author_username"] == "Test Author" + assert commit["filename"] == "src/main.py" + assert commit["lines_added"] == 10 + assert commit["lines_removed"] == 5 + +def test_commit_file_flattening(): + """Test that each file becomes a separate row.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple files", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 5}, + {"filename": "file2.py", "additions": 20, "deletions": 2}, + {"filename": "file3.py", "additions": 5, "deletions": 15}, + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows in commits table (one per file) + assert len(result["commits"]) == 3 + filenames = [c["filename"] for c in result["commits"]] + assert "file1.py" in filenames + assert "file2.py" in filenames + assert "file3.py" in filenames + +def test_multiple_commits_with_files(): + """Test multiple commits with multiple 
files per PR.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "commit1", + "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 0} + ], + }, + { + "sha": "commit2", + "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, + "files": [ + {"filename": "file2.py", "additions": 5, "deletions": 2}, + {"filename": "file3.py", "additions": 8, "deletions": 3}, + ], + }, + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows total (1 file from commit1, 2 files from commit2) + assert len(result["commits"]) == 3 + assert result["commits"][0]["commit_sha"] == "commit1" + assert result["commits"][1]["commit_sha"] == "commit2" + assert result["commits"][2]["commit_sha"] == "commit2" + +def test_reviewer_transformation(): + """Test reviewer fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with reviewers", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + } + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 1 + reviewer = result["reviewers"][0] + assert reviewer["pull_request_id"] == 123 + assert reviewer["target_repository"] == "mozilla/firefox" + assert reviewer["reviewer_username"] == "reviewer1" + assert reviewer["status"] == "APPROVED" + assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + +def test_multiple_review_states(): + """Test handling multiple review states.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple reviews", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "COMMENTED", + "submitted_at": "2024-01-01T17:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 3 + states = [r["status"] for r in result["reviewers"]] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + +def test_date_approved_from_earliest_approval(): + """Test that date_approved is set to earliest APPROVED review.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple approvals", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-02T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T14:00:00Z", # Earliest + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "APPROVED", + "submitted_at": "2024-01-03T16:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + pr = result["pull_requests"][0] + assert pr["date_approved"] == "2024-01-01T14:00:00Z" + +def test_comment_transformation(): + """Test comment fields mapping.""" + raw_data = [ + { + "number": 123, + 
"title": "PR with comments", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks great!", + "created_at": "2024-01-01T14:00:00Z", + "pull_request_review_id": None, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["comments"]) == 1 + comment = result["comments"][0] + assert comment["pull_request_id"] == 123 + assert comment["target_repository"] == "mozilla/firefox" + assert comment["comment_id"] == 456 + assert comment["author_username"] == "commenter1" + assert comment["date_created"] == "2024-01-01T14:00:00Z" + assert comment["character_count"] == 17 + +def test_comment_character_count(): + """Test character count calculation for comments.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": "Short", + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "This is a much longer comment with more text", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 5 + assert result["comments"][1]["character_count"] == 44 + +def test_comment_status_from_review(): + """Test that comment status is mapped from review_id_statuses.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter"}, + "body": "LGTM", + "created_at": "2024-01-01", + "pull_request_review_id": 789, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Comment should have status from the review + assert result["comments"][0]["status"] == "APPROVED" + +def test_comment_empty_body(): + """Test handling comments with empty or None body.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": None, + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 0 + assert result["comments"][1]["character_count"] == 0 + +def test_empty_raw_data(): + """Test handling empty input list.""" + result = main.transform_data([], "mozilla/firefox") + + assert result["pull_requests"] == [] + assert result["commits"] == [] + assert result["reviewers"] == [] + assert result["comments"] == [] + +def test_pr_without_commits_reviewers_comments(): + """Test PR with no commits, reviewers, or comments.""" + raw_data = [ + { + "number": 123, + "title": "Minimal PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + assert len(result["commits"]) == 0 + assert len(result["reviewers"]) == 0 + assert len(result["comments"]) == 0 + +def test_return_structure(): + """Test that transform_data returns dict 
with 4 keys.""" + raw_data = [ + { + "number": 1, + "title": "Test", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert isinstance(result, dict) + assert "pull_requests" in result + assert "commits" in result + assert "reviewers" in result + assert "comments" in result + +def test_all_tables_have_target_repository(): + """Test that all tables include target_repository field.""" + raw_data = [ + { + "number": 123, + "title": "Test PR", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "test.py", "additions": 1, "deletions": 0} + ], + } + ], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 2, + "user": {"login": "commenter"}, + "body": "Test", + "created_at": "2024-01-01", + } + ], + } + ] - assert isinstance(result, dict) - assert "pull_requests" in result - assert "commits" in result - assert "reviewers" in result - assert "comments" in result - - def test_all_tables_have_target_repository(self): - """Test that all tables include target_repository field.""" - raw_data = [ - { - "number": 123, - "title": "Test PR", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "test.py", "additions": 1, "deletions": 0} - ], - } - ], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 2, - "user": {"login": "commenter"}, - "body": "Test", - "created_at": "2024-01-01", - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") + result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" - assert result["commits"][0]["target_repository"] == "mozilla/firefox" - assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" - assert result["comments"][0]["target_repository"] == "mozilla/firefox" + assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" + assert result["commits"][0]["target_repository"] == "mozilla/firefox" + assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" + assert result["comments"][0]["target_repository"] == "mozilla/firefox" -class TestLoadData: - """Tests for load_data function.""" - @patch("main.datetime") - def test_load_all_tables(self, mock_datetime, mock_bigquery_client): - """Test loading all 4 tables to BigQuery.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" +# ============================================================================= +# TESTS FOR LOAD_DATA +# ============================================================================= - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [{"commit_sha": "abc"}], - "reviewers": [{"reviewer_username": "user1"}], - "comments": [{"comment_id": 123}], - } - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) +@patch("main.datetime") +def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): + """Test that load_data inserts all tables correctly.""" + mock_datetime.now.return_value.strftime.return_value = 
"2024-01-15" - # Should call insert_rows_json 4 times (once per table) - assert mock_bigquery_client.insert_rows_json.call_count == 4 + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [{"commit_sha": "abc"}], + "reviewers": [{"reviewer_username": "user1"}], + "comments": [{"comment_id": 123}], + } - @patch("main.datetime") - def test_adds_snapshot_date(self, mock_datetime, mock_bigquery_client): - """Test that snapshot_date is added to all rows.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } + # Should call insert_rows_json 4 times (once per table) + assert mock_bigquery_client.insert_rows_json.call_count == 4 - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) +@patch("main.datetime") +def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): + """Test that snapshot_date is added to all rows.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" - call_args = mock_bigquery_client.insert_rows_json.call_args - rows = call_args[0][1] - assert all(row["snapshot_date"] == "2024-01-15" for row in rows) - - def test_constructs_correct_table_ref(self, mock_bigquery_client): - """Test that table_ref is constructed correctly.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } - main.load_data(mock_bigquery_client, "my_dataset", transformed_data) + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref = call_args[0][0] - assert table_ref == "test-project.my_dataset.pull_requests" + call_args = mock_bigquery_client.insert_rows_json.call_args + rows = call_args[0][1] + assert all(row["snapshot_date"] == "2024-01-15" for row in rows) - def test_empty_transformed_data_skipped(self, mock_bigquery_client): - """Test that empty transformed_data dict is skipped.""" - transformed_data = {} +def test_constructs_correct_table_ref(mock_bigquery_client): + """Test that table_ref is constructed correctly.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + main.load_data(mock_bigquery_client, "my_dataset", transformed_data) - mock_bigquery_client.insert_rows_json.assert_not_called() + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref = call_args[0][0] + assert table_ref == "test-project.my_dataset.pull_requests" - def test_skips_empty_tables_individually(self, mock_bigquery_client): - """Test that empty tables are skipped individually.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], # Empty, should be skipped - "reviewers": [], # Empty, should be skipped - "comments": [{"comment_id": 456}], - } +def test_empty_transformed_data_skipped(mock_bigquery_client): + """Test that empty transformed_data dict is skipped.""" + transformed_data = {} - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + main.load_data(mock_bigquery_client, "test_dataset", 
transformed_data) - # Should only call insert_rows_json twice (for PRs and comments) - assert mock_bigquery_client.insert_rows_json.call_count == 2 + mock_bigquery_client.insert_rows_json.assert_not_called() - def test_only_pull_requests_table(self, mock_bigquery_client): - """Test loading only pull_requests table.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } +def test_skips_empty_tables_individually(mock_bigquery_client): + """Test that empty tables are skipped individually.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], # Empty, should be skipped + "reviewers": [], # Empty, should be skipped + "comments": [{"comment_id": 456}], + } - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - assert mock_bigquery_client.insert_rows_json.call_count == 1 + # Should only call insert_rows_json twice (for PRs and comments) + assert mock_bigquery_client.insert_rows_json.call_count == 2 - def test_raises_exception_on_insert_errors(self, mock_bigquery_client): - """Test that Exception is raised on BigQuery insert errors.""" - mock_bigquery_client.insert_rows_json.return_value = [ - {"index": 0, "errors": ["Insert failed"]} - ] +def test_only_pull_requests_table(mock_bigquery_client): + """Test loading only pull_requests table.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - with pytest.raises(Exception) as exc_info: - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + assert mock_bigquery_client.insert_rows_json.call_count == 1 - assert "BigQuery insert errors" in str(exc_info.value) +def test_raises_exception_on_insert_errors(mock_bigquery_client): + """Test that Exception is raised on BigQuery insert errors.""" + mock_bigquery_client.insert_rows_json.return_value = [ + {"index": 0, "errors": ["Insert failed"]} + ] - def test_verifies_client_insert_called_correctly(self, mock_bigquery_client): - """Test that client.insert_rows_json is called with correct arguments.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + with pytest.raises(Exception) as exc_info: main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref, rows = call_args[0] + assert "BigQuery insert errors" in str(exc_info.value) - assert "pull_requests" in table_ref - assert len(rows) == 2 +def test_verifies_client_insert_called_correctly(mock_bigquery_client): + """Test that client.insert_rows_json is called with correct arguments.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) -class TestMain: - """Tests for main function.""" + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref, rows = call_args[0] - 
@patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_requires_github_repos( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that GITHUB_REPOS is required.""" - with patch.dict( - os.environ, - {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() + assert "pull_requests" in table_ref + assert len(rows) == 2 - assert "GITHUB_REPOS" in str(exc_info.value) - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_requires_bigquery_project( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that BIGQUERY_PROJECT is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - assert "BIGQUERY_PROJECT" in str(exc_info.value) - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_requires_bigquery_dataset( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that BIGQUERY_DATASET is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() +# ============================================================================= +# TESTS FOR MAIN +# ============================================================================= - assert "BIGQUERY_DATASET" in str(exc_info.value) - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_github_token_optional_with_warning( - self, mock_session_class, mock_bq_client, mock_setup_logging +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_REPOS is required.""" + with patch.dict( + os.environ, + {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, + clear=True, ): - """Test that GITHUB_TOKEN is optional but warns if missing.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - # Should not raise, but should log warning - result = main.main() - assert result == 0 - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_splits_github_repos_by_comma( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that GITHUB_REPOS is split by comma.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): + with pytest.raises(SystemExit) as exc_info: main.main() - # Should be called twice (once per repo) - assert mock_extract.call_count == 2 + assert "GITHUB_REPOS" in str(exc_info.value) - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_honors_github_api_url( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that GITHUB_API_URL is honored.""" - with ( - 
patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "GITHUB_API_URL": "https://custom-api.example.com", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): - main.main() - - call_kwargs = mock_extract.call_args[1] - assert call_kwargs["github_api_url"] == "https://custom-api.example.com" - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_honors_bigquery_emulator_host( - self, mock_session_class, mock_bq_client_class, mock_setup_logging +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_project(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that BIGQUERY_PROJECT is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, + clear=True, ): - """Test that BIGQUERY_EMULATOR_HOST is honored.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): + with pytest.raises(SystemExit) as exc_info: main.main() - # Verify BigQuery client was created with emulator settings - mock_bq_client_class.assert_called_once() - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_creates_session_with_headers( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that session is created with Accept and User-Agent headers.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session + assert "BIGQUERY_PROJECT" in str(exc_info.value) - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - # Verify session headers were set - assert mock_session.headers.update.called - call_args = mock_session.headers.update.call_args[0][0] - assert "Accept" in call_args - assert "User-Agent" in call_args - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_sets_authorization_header_with_token( - self, mock_session_class, mock_bq_client, mock_setup_logging +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_dataset(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that BIGQUERY_DATASET is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, + clear=True, ): - """Test that Authorization header is set when token provided.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "test-token-123", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): + with pytest.raises(SystemExit) as exc_info: main.main() - # Verify Authorization header was set - assert mock_session.headers.__setitem__.called - - @patch("main.setup_logging") - 
@patch("main.bigquery.Client") - @patch("requests.Session") - @patch("main.extract_pull_requests") - @patch("main.transform_data") - @patch("main.load_data") - def test_single_repo_successful_etl( - self, - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, - ): - """Test successful ETL for single repository.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + assert "BIGQUERY_DATASET" in str(exc_info.value) - with patch.dict( +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_github_token_optional_with_warning(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_TOKEN is optional but warns if missing.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", }, clear=True, - ): - result = main.main() - - assert result == 0 - mock_extract.assert_called_once() - mock_transform.assert_called_once() - mock_load.assert_called_once() - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - @patch("main.extract_pull_requests") - @patch("main.transform_data") - @patch("main.load_data") - def test_multiple_repos_processing( - self, - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, + ), + patch("main.extract_pull_requests", return_value=iter([])), ): - """Test processing multiple repositories.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } + # Should not raise, but should log warning + result = main.main() + assert result == 0 - with patch.dict( +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_splits_github_repos_by_comma(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_REPOS is split by comma.""" + with ( + patch.dict( os.environ, { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", "GITHUB_TOKEN": "token", }, clear=True, - ): - result = main.main() - - assert result == 0 - # Should process 3 repositories - assert mock_extract.call_count == 3 - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - @patch("main.extract_pull_requests") - @patch("main.transform_data") - @patch("main.load_data") - def test_processes_chunks_iteratively( - self, - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, ): - """Test that chunks are processed iteratively from generator.""" - # Return 3 chunks - mock_extract.return_value = iter( - [ - [{"number": 1}], - [{"number": 2}], - [{"number": 3}], - ] - ) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( + main.main() + + # Should be called twice (once per repo) + assert mock_extract.call_count == 2 + +@patch("main.setup_logging") +@patch("main.bigquery.Client") 
+@patch("requests.Session") +def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_API_URL is honored.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", }, clear=True, - ): - result = main.main() - - assert result == 0 - # Transform and load should be called 3 times (once per chunk) - assert mock_transform.call_count == 3 - assert mock_load.call_count == 3 - - @patch("main.setup_logging") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_returns_zero_on_success( - self, mock_session_class, mock_bq_client, mock_setup_logging - ): - """Test that main returns 0 on success.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - result = main.main() - - assert result == 0 - - -@pytest.mark.integration -class TestIntegration: - """Integration tests that test multiple components together.""" - - @patch("main.setup_logging") - @patch("main.load_data") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_end_to_end_with_mocked_github( - self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, ): - """Test end-to-end flow with mocked GitHub responses.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # Mock PR response - pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} - ] - pr_response.links = {} - - # Mock commits, reviewers, comments responses - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( + main.main() + + call_kwargs = mock_extract.call_args[1] + assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_honors_bigquery_emulator_host(mock_session_class, mock_bq_client_class, mock_setup_logging): + """Test that BIGQUERY_EMULATOR_HOST is honored.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test", "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", }, clear=True, - ): - result = main.main() - - assert result == 0 - mock_load.assert_called_once() - - # Verify transformed data structure - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - assert "pull_requests" in transformed_data - assert len(transformed_data["pull_requests"]) == 1 - - @patch("main.setup_logging") - @patch("main.load_data") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_bug_id_extraction_through_pipeline( - self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + ), + patch("main.extract_pull_requests", return_value=iter([])), ): - """Test bug ID extraction through full pipeline.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session + main.main() - 
pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - { - "number": 1, - "title": "Bug 9876543 - Fix critical issue", - "state": "closed", - } - ] - pr_response.links = {} + # Verify BigQuery client was created with emulator settings + mock_bq_client_class.assert_called_once() - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_creates_session_with_headers(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that session is created with Accept and User-Agent headers.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", @@ -2084,59 +1795,176 @@ def test_bug_id_extraction_through_pipeline( "GITHUB_TOKEN": "token", }, clear=True, - ): - main.main() + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify session headers were set + assert mock_session.headers.update.called + call_args = mock_session.headers.update.call_args[0][0] + assert "Accept" in call_args + assert "User-Agent" in call_args + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_sets_authorization_header_with_token(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that Authorization header is set when token provided.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify Authorization header was set + assert mock_session.headers.__setitem__.called + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_single_repo_successful_etl( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test successful ETL for single repository.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - pr = transformed_data["pull_requests"][0] - assert pr["bug_id"] == 9876543 - - @patch("main.setup_logging") - @patch("main.load_data") - @patch("main.bigquery.Client") - @patch("requests.Session") - def test_pagination_through_full_flow( - self, mock_session_class, mock_bq_client, mock_load, mock_setup_logging + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, ): - """Test pagination through full ETL flow.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # First page - pr_response_1 = Mock() - pr_response_1.status_code = 200 - pr_response_1.json.return_value = [ - {"number": 1, "title": "PR 1", "state": "open"} - ] - pr_response_1.links = { 
- "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } + result = main.main() + + assert result == 0 + mock_extract.assert_called_once() + mock_transform.assert_called_once() + mock_load.assert_called_once() + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_multiple_repos_processing( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test processing multiple repositories.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - # Second page - pr_response_2 = Mock() - pr_response_2.status_code = 200 - pr_response_2.json.return_value = [ - {"number": 2, "title": "PR 2", "state": "open"} - ] - pr_response_2.links = {} - - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response_1, - empty_response, - empty_response, - empty_response, - pr_response_2, - empty_response, - empty_response, - empty_response, + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Should process 3 repositories + assert mock_extract.call_count == 3 + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_processes_chunks_iteratively( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test that chunks are processed iteratively from generator.""" + # Return 3 chunks + mock_extract.return_value = iter( + [ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], ] + ) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } - with patch.dict( + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Transform and load should be called 3 times (once per chunk) + assert mock_transform.call_count == 3 + assert mock_load.call_count == 3 + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_returns_zero_on_success(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that main returns 0 on success.""" + with ( + patch.dict( os.environ, { "GITHUB_REPOS": "mozilla/firefox", @@ -2145,8 +1973,166 @@ def test_pagination_through_full_flow( "GITHUB_TOKEN": "token", }, clear=True, - ): - main.main() + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + result = main.main() + + assert result == 0 + + +@pytest.mark.integration +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_full_etl_flow_transforms_data_correctly(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): + """Test full ETL flow with mocked GitHub responses.""" + 
mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # Mock PR response + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} + ] + pr_response.links = {} + + # Mock commits, reviewers, comments responses + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_load.assert_called_once() + + # Verify transformed data structure + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + assert "pull_requests" in transformed_data + assert len(transformed_data["pull_requests"]) == 1 + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_bug_id_extraction_through_pipeline(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): + """Test bug ID extraction through full pipeline.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + { + "number": 1, + "title": "Bug 9876543 - Fix critical issue", + "state": "closed", + } + ] + pr_response.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + pr = transformed_data["pull_requests"][0] + assert pr["bug_id"] == 9876543 + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): + """Test pagination through full ETL flow.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # First page + pr_response_1 = Mock() + pr_response_1.status_code = 200 + pr_response_1.json.return_value = [ + {"number": 1, "title": "PR 1", "state": "open"} + ] + pr_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page + pr_response_2 = Mock() + pr_response_2.status_code = 200 + pr_response_2.json.return_value = [ + {"number": 2, "title": "PR 2", "state": "open"} + ] + pr_response_2.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response_1, + empty_response, + empty_response, + empty_response, + pr_response_2, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() - # Should be called twice (once per chunk/page) - assert mock_load.call_count == 
2
+
+    # Should be called twice (once per chunk/page)
+    assert mock_load.call_count == 2

From 4bb878e5ac26c9583ffc98b3990fe938eb19494f Mon Sep 17 00:00:00 2001
From: David Lawrence
Date: Wed, 4 Feb 2026 18:11:38 -0500
Subject: [PATCH 08/11] Added conftest.py and moved tests to tests/ directory

---
 tests/conftest.py                  | 284 +++++++++++++++++++++++++++++
 test_main.py => tests/test_main.py | 264 +++++++++++++--------------
 2 files changed, 408 insertions(+), 140 deletions(-)
 create mode 100644 tests/conftest.py
 rename test_main.py => tests/test_main.py (93%)

diff --git a/tests/conftest.py b/tests/conftest.py
new file mode 100644
index 0000000..0656e29
--- /dev/null
+++ b/tests/conftest.py
@@ -0,0 +1,284 @@
+"""
+Pytest fixtures for GitHub ETL tests.
+
+This module provides reusable test fixtures for mocking external dependencies
+and providing sample data for unit and integration tests.
+"""
+
+from datetime import datetime, timezone
+from typing import Any
+from unittest.mock import MagicMock, Mock
+
+import pytest
+import requests
+from google.cloud import bigquery
+
+
+@pytest.fixture
+def mock_env_vars(monkeypatch) -> dict[str, str]:
+    """
+    Set up common environment variables for tests.
+
+    Returns:
+        Dictionary of environment variables that were set
+    """
+    env_vars = {
+        "GITHUB_TOKEN": "test_token_123",
+        "GITHUB_REPOS": "mozilla/firefox",
+        "BIGQUERY_PROJECT": "test-project",
+        "BIGQUERY_DATASET": "test_dataset",
+    }
+    for key, value in env_vars.items():
+        monkeypatch.setenv(key, value)
+    return env_vars
+
+
+@pytest.fixture
+def sample_github_pr() -> dict[str, Any]:
+    """
+    Sample GitHub pull request data from API response.
+
+    Returns:
+        Dictionary representing a single PR from GitHub API
+    """
+    return {
+        "number": 12345,
+        "state": "closed",
+        "title": "Bug 1234567 - Fix memory leak in parser",
+        "created_at": "2025-01-01T10:00:00Z",
+        "updated_at": "2025-01-02T15:30:00Z",
+        "merged_at": "2025-01-02T15:30:00Z",
+        "labels": [
+            {"name": "bug"},
+            {"name": "priority-high"},
+        ],
+        "user": {
+            "login": "test_user",
+            "id": 123,
+        },
+        "head": {
+            "ref": "feature-branch",
+            "sha": "abc123",
+        },
+        "base": {
+            "ref": "main",
+            "sha": "def456",
+        },
+        "commit_data": [],
+        "reviewer_data": [],
+        "comment_data": [],
+    }
+
+
+@pytest.fixture
+def sample_github_commit() -> dict[str, Any]:
+    """
+    Sample GitHub commit data from API response.
+
+    Returns:
+        Dictionary representing a single commit from GitHub API
+    """
+    return {
+        "sha": "abc123def456",
+        "commit": {
+            "author": {
+                "name": "Test Author",
+                "email": "author@example.com",
+                "date": "2025-01-01T10:00:00Z",
+            },
+            "message": "Fix bug in parser",
+        },
+        "files": [
+            {
+                "filename": "src/parser.py",
+                "additions": 10,
+                "deletions": 5,
+                "changes": 15,
+            }
+        ],
+    }
+
+
+@pytest.fixture
+def sample_github_reviewer() -> dict[str, Any]:
+    """
+    Sample GitHub review data from API response.
+
+    Returns:
+        Dictionary representing a single review from GitHub API
+    """
+    return {
+        "id": 98765,
+        "user": {
+            "login": "reviewer_user",
+            "id": 456,
+        },
+        "state": "APPROVED",
+        "submitted_at": "2025-01-02T12:00:00Z",
+        "body": "LGTM",
+    }
+
+
+@pytest.fixture
+def sample_github_comment() -> dict[str, Any]:
+    """
+    Sample GitHub comment data from API response.
+
+    Returns:
+        Dictionary representing a single comment from GitHub API
+    """
+    return {
+        "id": 111222,
+        "user": {
+            "login": "commenter_user",
+            "id": 789,
+        },
+        "created_at": "2025-01-01T14:00:00Z",
+        "body": "Please check the edge case for null values",
+        "pull_request_review_id": None,
+    }
+
+
+@pytest.fixture
+def sample_transformed_data() -> dict[str, list[dict]]:
+    """
+    Sample transformed data ready for BigQuery insertion.
+
+    Returns:
+        Dictionary with keys for each table and transformed row data
+    """
+    return {
+        "pull_requests": [
+            {
+                "pull_request_id": 12345,
+                "current_status": "closed",
+                "date_created": "2025-01-01T10:00:00Z",
+                "date_modified": "2025-01-02T15:30:00Z",
+                "target_repository": "mozilla/firefox",
+                "bug_id": 1234567,
+                "date_landed": "2025-01-02T15:30:00Z",
+                "date_approved": "2025-01-02T12:00:00Z",
+                "labels": ["bug", "priority-high"],
+            }
+        ],
+        "commits": [
+            {
+                "pull_request_id": 12345,
+                "target_repository": "mozilla/firefox",
+                "commit_sha": "abc123def456",
+                "date_created": "2025-01-01T10:00:00Z",
+                "author_username": "Test Author",
+                "author_email": None,
+                "filename": "src/parser.py",
+                "lines_removed": 5,
+                "lines_added": 10,
+            }
+        ],
+        "reviewers": [
+            {
+                "pull_request_id": 12345,
+                "target_repository": "mozilla/firefox",
+                "date_reviewed": "2025-01-02T12:00:00Z",
+                "reviewer_email": None,
+                "reviewer_username": "reviewer_user",
+                "status": "APPROVED",
+            }
+        ],
+        "comments": [
+            {
+                "pull_request_id": 12345,
+                "target_repository": "mozilla/firefox",
+                "comment_id": 111222,
+                "date_created": "2025-01-01T14:00:00Z",
+                "author_email": None,
+                "author_username": "commenter_user",
+                "character_count": 42,
+                "status": None,
+            }
+        ],
+    }
+
+
+@pytest.fixture
+def mock_session() -> Mock:
+    """
+    Mock requests.Session with configurable responses.
+
+    Returns:
+        Mock session object with get() method
+    """
+    session = Mock(spec=requests.Session)
+    session.headers = {}
+    return session
+
+
+@pytest.fixture
+def mock_github_response() -> Mock:
+    """
+    Mock requests.Response for GitHub API calls.
+
+    Returns:
+        Mock response with status_code, json(), headers, and links
+    """
+    response = Mock(spec=requests.Response)
+    response.status_code = 200
+    response.headers = {
+        "X-RateLimit-Remaining": "5000",
+        "X-RateLimit-Reset": "1609459200",
+    }
+    response.links = {}
+    response.text = ""
+    return response
+
+
+@pytest.fixture
+def mock_rate_limited_response() -> Mock:
+    """
+    Mock requests.Response simulating rate limit exceeded.
+
+    Returns:
+        Mock response with 403 status and rate limit headers
+    """
+    response = Mock(spec=requests.Response)
+    response.status_code = 403
+    response.headers = {
+        "X-RateLimit-Remaining": "0",
+        "X-RateLimit-Reset": str(int(datetime.now(timezone.utc).timestamp()) + 3600),
+    }
+    response.text = "API rate limit exceeded"
+    return response
+
+
+@pytest.fixture
+def mock_bigquery_client() -> Mock:
+    """
+    Mock BigQuery client for testing load operations.
+
+    Returns:
+        Mock BigQuery client with insert_rows_json() method
+    """
+    client = Mock(spec=bigquery.Client)
+    client.project = "test-project"
+    client.insert_rows_json = MagicMock(return_value=[])  # Empty list = no errors
+    return client
+
+
+@pytest.fixture
+def mock_bigquery_client_with_errors() -> Mock:
+    """
+    Mock BigQuery client that returns insertion errors.
+ + Returns: + Mock BigQuery client that simulates insert failures + """ + client = Mock(spec=bigquery.Client) + client.project = "test-project" + client.insert_rows_json = MagicMock( + return_value=[ + { + "index": 0, + "errors": [{"reason": "invalid", "message": "Invalid schema"}], + } + ] + ) + return client diff --git a/test_main.py b/tests/test_main.py similarity index 93% rename from test_main.py rename to tests/test_main.py index 0d38ac3..19ba7a4 100644 --- a/test_main.py +++ b/tests/test_main.py @@ -11,112 +11,9 @@ from unittest.mock import MagicMock, Mock, patch import pytest -import requests -from google.cloud import bigquery import main -# ============================================================================= -# FIXTURES -# ============================================================================= - - -@pytest.fixture -def mock_session(): - """Provide a mocked requests.Session for testing.""" - session = Mock(spec=requests.Session) - session.headers = {} - return session - - -@pytest.fixture -def mock_bigquery_client(): - """Provide a mocked BigQuery client for testing.""" - client = Mock(spec=bigquery.Client) - client.project = "test-project" - client.insert_rows_json = Mock(return_value=[]) - return client - - -@pytest.fixture -def mock_pr_response(): - """Provide a realistic pull request response for testing.""" - return { - "number": 123, - "title": "Bug 1234567 - Fix login issue", - "state": "closed", - "created_at": "2024-01-01T10:00:00Z", - "updated_at": "2024-01-02T10:00:00Z", - "merged_at": "2024-01-02T10:00:00Z", - "user": {"login": "testuser"}, - "head": {"ref": "fix-branch"}, - "base": {"ref": "main"}, - "labels": [{"name": "bug"}, {"name": "priority-high"}], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - - -@pytest.fixture -def mock_commit_response(): - """Provide a realistic commit response with files.""" - return { - "sha": "abc123def456", - "commit": { - "author": { - "name": "Test Author", - "email": "test@example.com", - "date": "2024-01-01T12:00:00Z", - } - }, - "files": [ - { - "filename": "src/login.py", - "additions": 10, - "deletions": 5, - "changes": 15, - }, - { - "filename": "tests/test_login.py", - "additions": 20, - "deletions": 2, - "changes": 22, - }, - ], - } - - -@pytest.fixture -def mock_reviewer_response(): - """Provide a realistic reviewer response.""" - return { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - "body": "LGTM", - } - - -@pytest.fixture -def mock_comment_response(): - """Provide a realistic comment response.""" - return { - "id": 456, - "user": {"login": "commenter1"}, - "created_at": "2024-01-01T14:00:00Z", - "body": "This looks good to me", - "pull_request_review_id": None, - } - - -# ============================================================================= -# TEST CLASSES -# ============================================================================= - - - # ============================================================================= # TESTS FOR SETUP_LOGGING # ============================================================================= @@ -132,13 +29,11 @@ def test_setup_logging(): # Check that at least one handler is a StreamHandler has_stream_handler = any( - isinstance(handler, logging.StreamHandler) - for handler in root_logger.handlers + isinstance(handler, logging.StreamHandler) for handler in root_logger.handlers ) assert has_stream_handler - # 
============================================================================= # TESTS FOR SLEEP_FOR_RATE_LIMIT # ============================================================================= @@ -206,7 +101,6 @@ def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): mock_sleep.assert_not_called() - # ============================================================================= # TESTS FOR EXTRACT_PULL_REQUESTS # ============================================================================= @@ -237,6 +131,7 @@ def test_extract_pull_requests_basic(mock_session): assert result[0][0]["number"] == 1 assert result[0][1]["number"] == 2 + def test_extract_multiple_pages(mock_session): """Test extracting data across multiple pages with pagination.""" # First page response @@ -271,6 +166,7 @@ def test_extract_multiple_pages(mock_session): assert result[0][0]["number"] == 1 assert result[1][0]["number"] == 3 + def test_enriches_prs_with_commit_data(mock_session): """Test that PRs are enriched with commit data.""" mock_response = Mock() @@ -294,6 +190,7 @@ def test_enriches_prs_with_commit_data(mock_session): assert result[0][0]["commit_data"] == mock_commits mock_extract_commits.assert_called_once() + def test_enriches_prs_with_reviewer_data(mock_session): """Test that PRs are enriched with reviewer data.""" mock_response = Mock() @@ -317,6 +214,7 @@ def test_enriches_prs_with_reviewer_data(mock_session): assert result[0][0]["reviewer_data"] == mock_reviewers mock_extract_reviewers.assert_called_once() + def test_enriches_prs_with_comment_data(mock_session): """Test that PRs are enriched with comment data.""" mock_response = Mock() @@ -340,6 +238,7 @@ def test_enriches_prs_with_comment_data(mock_session): assert result[0][0]["comment_data"] == mock_comments mock_extract_comments.assert_called_once() + @patch("main.sleep_for_rate_limit") def test_handles_rate_limit(mock_sleep, mock_session): """Test that extract_pull_requests handles rate limiting correctly.""" @@ -369,6 +268,7 @@ def test_handles_rate_limit(mock_sleep, mock_session): mock_sleep.assert_called_once_with(mock_response_rate_limit) assert len(result) == 1 + def test_handles_api_error_404(mock_session): """Test that extract_pull_requests raises SystemExit on 404.""" mock_response = Mock() @@ -382,6 +282,7 @@ def test_handles_api_error_404(mock_session): assert "GitHub API error 404" in str(exc_info.value) + def test_handles_api_error_500(mock_session): """Test that extract_pull_requests raises SystemExit on 500.""" mock_response = Mock() @@ -395,6 +296,7 @@ def test_handles_api_error_500(mock_session): assert "GitHub API error 500" in str(exc_info.value) + def test_stops_on_empty_batch(mock_session): """Test that extraction stops when an empty batch is returned.""" # First page with data @@ -424,6 +326,7 @@ def test_stops_on_empty_batch(mock_session): assert len(result) == 1 assert len(result[0]) == 1 + def test_invalid_page_number_handling(mock_session): """Test handling of invalid page number in pagination.""" mock_response_1 = Mock() @@ -447,6 +350,7 @@ def test_invalid_page_number_handling(mock_session): # Should stop pagination on invalid page number assert len(result) == 1 + def test_custom_github_api_url(mock_session): """Test using custom GitHub API URL.""" custom_url = "https://mock-github.example.com" @@ -473,6 +377,7 @@ def test_custom_github_api_url(mock_session): call_args = mock_session.get.call_args assert custom_url in call_args[0][0] + def test_skips_prs_without_number_field(mock_session): """Test that PRs without 
'number' field are skipped.""" mock_response = Mock() @@ -497,11 +402,13 @@ def test_skips_prs_without_number_field(mock_session): assert mock_commits.call_count == 2 - # ============================================================================= # TESTS FOR EXTRACT_COMMITS # ============================================================================= + +def test_extract_commits_with_files(mock_session): + """Test extracting commits with file details.""" # Mock commits list response commits_response = Mock() commits_response.status_code = 200 @@ -539,6 +446,7 @@ def test_skips_prs_without_number_field(mock_session): assert result[1]["sha"] == "def456" assert result[1]["files"][0]["filename"] == "file2.py" + def test_multiple_files_per_commit(mock_session): """Test handling multiple files in a single commit.""" commits_response = Mock() @@ -563,6 +471,7 @@ def test_multiple_files_per_commit(mock_session): assert len(result) == 1 assert len(result[0]["files"]) == 3 + @patch("main.sleep_for_rate_limit") def test_rate_limit_on_commits_list(mock_sleep, mock_session): """Test rate limit handling when fetching commits list.""" @@ -583,6 +492,7 @@ def test_rate_limit_on_commits_list(mock_sleep, mock_session): mock_sleep.assert_called_once() assert result == [] + def test_api_error_on_commits_list(mock_session): """Test API error handling when fetching commits list.""" error_response = Mock() @@ -596,6 +506,7 @@ def test_api_error_on_commits_list(mock_session): assert "GitHub API error 500" in str(exc_info.value) + def test_api_error_on_individual_commit(mock_session): """Test API error when fetching individual commit details.""" commits_response = Mock() @@ -613,6 +524,7 @@ def test_api_error_on_individual_commit(mock_session): assert "GitHub API error 404" in str(exc_info.value) + def test_commit_without_sha_field(mock_session): """Test handling commits without sha field.""" commits_response = Mock() @@ -641,7 +553,8 @@ def test_commit_without_sha_field(mock_session): # Should handle the commit without sha gracefully assert len(result) == 2 -def test_custom_github_api_url(mock_session): + +def test_custom_github_api_url_commits(mock_session): """Test using custom GitHub API URL for commits.""" custom_url = "https://mock-github.example.com" @@ -658,6 +571,7 @@ def test_custom_github_api_url(mock_session): call_args = mock_session.get.call_args assert custom_url in call_args[0][0] + def test_empty_commits_list(mock_session): """Test handling PR with no commits.""" commits_response = Mock() @@ -671,11 +585,13 @@ def test_empty_commits_list(mock_session): assert result == [] - # ============================================================================= # TESTS FOR EXTRACT_REVIEWERS # ============================================================================= + +def test_extract_reviewers_basic(mock_session): + """Test basic extraction of reviewers.""" reviewers_response = Mock() reviewers_response.status_code = 200 reviewers_response.json.return_value = [ @@ -701,6 +617,7 @@ def test_empty_commits_list(mock_session): assert result[0]["state"] == "APPROVED" assert result[1]["state"] == "CHANGES_REQUESTED" + def test_multiple_review_states(mock_session): """Test handling multiple different review states.""" reviewers_response = Mock() @@ -722,6 +639,7 @@ def test_multiple_review_states(mock_session): assert "CHANGES_REQUESTED" in states assert "COMMENTED" in states + def test_empty_reviewers_list(mock_session): """Test handling PR with no reviewers.""" reviewers_response = Mock() @@ -734,6 
+652,7 @@ def test_empty_reviewers_list(mock_session): assert result == [] + @patch("main.sleep_for_rate_limit") def test_rate_limit_handling(mock_sleep, mock_session): """Test rate limit handling when fetching reviewers.""" @@ -752,6 +671,7 @@ def test_rate_limit_handling(mock_sleep, mock_session): mock_sleep.assert_called_once() assert result == [] + def test_api_error(mock_session): """Test API error handling when fetching reviewers.""" error_response = Mock() @@ -765,7 +685,8 @@ def test_api_error(mock_session): assert "GitHub API error 500" in str(exc_info.value) -def test_custom_github_api_url(mock_session): + +def test_custom_github_api_url_reviewers(mock_session): """Test using custom GitHub API URL for reviewers.""" custom_url = "https://mock-github.example.com" @@ -783,11 +704,13 @@ def test_custom_github_api_url(mock_session): assert custom_url in call_args[0][0] - # ============================================================================= # TESTS FOR EXTRACT_COMMENTS # ============================================================================= + +def test_extract_comments_basic(mock_session): + """Test basic extraction of comments.""" comments_response = Mock() comments_response.status_code = 200 comments_response.json.return_value = [ @@ -813,6 +736,7 @@ def test_custom_github_api_url(mock_session): assert result[0]["id"] == 456 assert result[1]["id"] == 457 + def test_uses_issues_endpoint(mock_session): """Test that comments use /issues endpoint not /pulls.""" comments_response = Mock() @@ -828,6 +752,7 @@ def test_uses_issues_endpoint(mock_session): assert "/issues/123/comments" in url assert "/pulls/123/comments" not in url + def test_multiple_comments(mock_session): """Test handling multiple comments.""" comments_response = Mock() @@ -843,6 +768,7 @@ def test_multiple_comments(mock_session): assert len(result) == 10 + def test_empty_comments_list(mock_session): """Test handling PR with no comments.""" comments_response = Mock() @@ -855,8 +781,9 @@ def test_empty_comments_list(mock_session): assert result == [] + @patch("main.sleep_for_rate_limit") -def test_rate_limit_handling(mock_sleep, mock_session): +def test_rate_limit_handling_comments(mock_sleep, mock_session): """Test rate limit handling when fetching comments.""" rate_limit_response = Mock() rate_limit_response.status_code = 403 @@ -873,7 +800,8 @@ def test_rate_limit_handling(mock_sleep, mock_session): mock_sleep.assert_called_once() assert result == [] -def test_api_error(mock_session): + +def test_api_error_comments(mock_session): """Test API error handling when fetching comments.""" error_response = Mock() error_response.status_code = 404 @@ -886,7 +814,8 @@ def test_api_error(mock_session): assert "GitHub API error 404" in str(exc_info.value) -def test_custom_github_api_url(mock_session): + +def test_custom_github_api_url_comments(mock_session): """Test using custom GitHub API URL for comments.""" custom_url = "https://mock-github.example.com" @@ -904,11 +833,13 @@ def test_custom_github_api_url(mock_session): assert custom_url in call_args[0][0] - # ============================================================================= # TESTS FOR TRANSFORM_DATA # ============================================================================= + +def test_transform_data_basic(): + """Test basic transformation of pull request data.""" raw_data = [ { "number": 123, @@ -935,6 +866,7 @@ def test_custom_github_api_url(mock_session): assert pr["date_landed"] == "2024-01-02T12:00:00Z" assert pr["target_repository"] == 
"mozilla/firefox" + def test_bug_id_extraction_basic(): """Test bug ID extraction from PR title.""" test_cases = [ @@ -960,6 +892,7 @@ def test_bug_id_extraction_basic(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] == expected_bug_id + def test_bug_id_extraction_with_hash(): """Test bug ID extraction with # symbol.""" raw_data = [ @@ -977,6 +910,7 @@ def test_bug_id_extraction_with_hash(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] == 1234567 + def test_bug_id_filter_large_numbers(): """Test that bug IDs >= 100000000 are filtered out.""" raw_data = [ @@ -994,6 +928,7 @@ def test_bug_id_filter_large_numbers(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] is None + def test_bug_id_no_match(): """Test PR title with no bug ID.""" raw_data = [ @@ -1011,6 +946,7 @@ def test_bug_id_no_match(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["bug_id"] is None + def test_labels_extraction(): """Test labels array extraction.""" raw_data = [ @@ -1036,6 +972,7 @@ def test_labels_extraction(): assert "priority-high" in labels assert "needs-review" in labels + def test_labels_empty_list(): """Test handling empty labels list.""" raw_data = [ @@ -1053,6 +990,7 @@ def test_labels_empty_list(): result = main.transform_data(raw_data, "mozilla/firefox") assert result["pull_requests"][0]["labels"] == [] + def test_commit_transformation(): """Test commit fields mapping.""" raw_data = [ @@ -1097,6 +1035,7 @@ def test_commit_transformation(): assert commit["lines_added"] == 10 assert commit["lines_removed"] == 5 + def test_commit_file_flattening(): """Test that each file becomes a separate row.""" raw_data = [ @@ -1130,6 +1069,7 @@ def test_commit_file_flattening(): assert "file2.py" in filenames assert "file3.py" in filenames + def test_multiple_commits_with_files(): """Test multiple commits with multiple files per PR.""" raw_data = [ @@ -1168,6 +1108,7 @@ def test_multiple_commits_with_files(): assert result["commits"][1]["commit_sha"] == "commit2" assert result["commits"][2]["commit_sha"] == "commit2" + def test_reviewer_transformation(): """Test reviewer fields mapping.""" raw_data = [ @@ -1199,8 +1140,9 @@ def test_reviewer_transformation(): assert reviewer["status"] == "APPROVED" assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" -def test_multiple_review_states(): - """Test handling multiple review states.""" + +def test_transform_multiple_review_states(): + """Test transforming data with multiple review states.""" raw_data = [ { "number": 123, @@ -1240,6 +1182,7 @@ def test_multiple_review_states(): assert "CHANGES_REQUESTED" in states assert "COMMENTED" in states + def test_date_approved_from_earliest_approval(): """Test that date_approved is set to earliest APPROVED review.""" raw_data = [ @@ -1278,6 +1221,7 @@ def test_date_approved_from_earliest_approval(): pr = result["pull_requests"][0] assert pr["date_approved"] == "2024-01-01T14:00:00Z" + def test_comment_transformation(): """Test comment fields mapping.""" raw_data = [ @@ -1311,6 +1255,7 @@ def test_comment_transformation(): assert comment["date_created"] == "2024-01-01T14:00:00Z" assert comment["character_count"] == 17 + def test_comment_character_count(): """Test character count calculation for comments.""" raw_data = [ @@ -1343,6 +1288,7 @@ def test_comment_character_count(): assert 
result["comments"][0]["character_count"] == 5 assert result["comments"][1]["character_count"] == 44 + def test_comment_status_from_review(): """Test that comment status is mapped from review_id_statuses.""" raw_data = [ @@ -1377,6 +1323,7 @@ def test_comment_status_from_review(): # Comment should have status from the review assert result["comments"][0]["status"] == "APPROVED" + def test_comment_empty_body(): """Test handling comments with empty or None body.""" raw_data = [ @@ -1409,6 +1356,7 @@ def test_comment_empty_body(): assert result["comments"][0]["character_count"] == 0 assert result["comments"][1]["character_count"] == 0 + def test_empty_raw_data(): """Test handling empty input list.""" result = main.transform_data([], "mozilla/firefox") @@ -1418,6 +1366,7 @@ def test_empty_raw_data(): assert result["reviewers"] == [] assert result["comments"] == [] + def test_pr_without_commits_reviewers_comments(): """Test PR with no commits, reviewers, or comments.""" raw_data = [ @@ -1439,6 +1388,7 @@ def test_pr_without_commits_reviewers_comments(): assert len(result["reviewers"]) == 0 assert len(result["comments"]) == 0 + def test_return_structure(): """Test that transform_data returns dict with 4 keys.""" raw_data = [ @@ -1461,6 +1411,7 @@ def test_return_structure(): assert "reviewers" in result assert "comments" in result + def test_all_tables_have_target_repository(): """Test that all tables include target_repository field.""" raw_data = [ @@ -1473,9 +1424,7 @@ def test_all_tables_have_target_repository(): { "sha": "abc", "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "test.py", "additions": 1, "deletions": 0} - ], + "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], } ], "reviewer_data": [ @@ -1505,7 +1454,6 @@ def test_all_tables_have_target_repository(): assert result["comments"][0]["target_repository"] == "mozilla/firefox" - # ============================================================================= # TESTS FOR LOAD_DATA # ============================================================================= @@ -1528,6 +1476,7 @@ def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): # Should call insert_rows_json 4 times (once per table) assert mock_bigquery_client.insert_rows_json.call_count == 4 + @patch("main.datetime") def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): """Test that snapshot_date is added to all rows.""" @@ -1546,6 +1495,7 @@ def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): rows = call_args[0][1] assert all(row["snapshot_date"] == "2024-01-15" for row in rows) + def test_constructs_correct_table_ref(mock_bigquery_client): """Test that table_ref is constructed correctly.""" transformed_data = { @@ -1561,6 +1511,7 @@ def test_constructs_correct_table_ref(mock_bigquery_client): table_ref = call_args[0][0] assert table_ref == "test-project.my_dataset.pull_requests" + def test_empty_transformed_data_skipped(mock_bigquery_client): """Test that empty transformed_data dict is skipped.""" transformed_data = {} @@ -1569,6 +1520,7 @@ def test_empty_transformed_data_skipped(mock_bigquery_client): mock_bigquery_client.insert_rows_json.assert_not_called() + def test_skips_empty_tables_individually(mock_bigquery_client): """Test that empty tables are skipped individually.""" transformed_data = { @@ -1583,6 +1535,7 @@ def test_skips_empty_tables_individually(mock_bigquery_client): # Should only call insert_rows_json twice (for PRs and comments) assert 
mock_bigquery_client.insert_rows_json.call_count == 2 + def test_only_pull_requests_table(mock_bigquery_client): """Test loading only pull_requests table.""" transformed_data = { @@ -1596,6 +1549,7 @@ def test_only_pull_requests_table(mock_bigquery_client): assert mock_bigquery_client.insert_rows_json.call_count == 1 + def test_raises_exception_on_insert_errors(mock_bigquery_client): """Test that Exception is raised on BigQuery insert errors.""" mock_bigquery_client.insert_rows_json.return_value = [ @@ -1614,6 +1568,7 @@ def test_raises_exception_on_insert_errors(mock_bigquery_client): assert "BigQuery insert errors" in str(exc_info.value) + def test_verifies_client_insert_called_correctly(mock_bigquery_client): """Test that client.insert_rows_json is called with correct arguments.""" transformed_data = { @@ -1632,7 +1587,6 @@ def test_verifies_client_insert_called_correctly(mock_bigquery_client): assert len(rows) == 2 - # ============================================================================= # TESTS FOR MAIN # ============================================================================= @@ -1657,7 +1611,9 @@ def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_lo @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_requires_bigquery_project(mock_session_class, mock_bq_client, mock_setup_logging): +def test_requires_bigquery_project( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that BIGQUERY_PROJECT is required.""" with patch.dict( os.environ, @@ -1673,7 +1629,9 @@ def test_requires_bigquery_project(mock_session_class, mock_bq_client, mock_setu @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_requires_bigquery_dataset(mock_session_class, mock_bq_client, mock_setup_logging): +def test_requires_bigquery_dataset( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that BIGQUERY_DATASET is required.""" with patch.dict( os.environ, @@ -1685,10 +1643,13 @@ def test_requires_bigquery_dataset(mock_session_class, mock_bq_client, mock_setu assert "BIGQUERY_DATASET" in str(exc_info.value) + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_github_token_optional_with_warning(mock_session_class, mock_bq_client, mock_setup_logging): +def test_github_token_optional_with_warning( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that GITHUB_TOKEN is optional but warns if missing.""" with ( patch.dict( @@ -1706,10 +1667,13 @@ def test_github_token_optional_with_warning(mock_session_class, mock_bq_client, result = main.main() assert result == 0 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_splits_github_repos_by_comma(mock_session_class, mock_bq_client, mock_setup_logging): +def test_splits_github_repos_by_comma( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that GITHUB_REPOS is split by comma.""" with ( patch.dict( @@ -1729,6 +1693,7 @@ def test_splits_github_repos_by_comma(mock_session_class, mock_bq_client, mock_s # Should be called twice (once per repo) assert mock_extract.call_count == 2 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1753,10 +1718,13 @@ def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_lo call_kwargs = mock_extract.call_args[1] assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + 
@patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_honors_bigquery_emulator_host(mock_session_class, mock_bq_client_class, mock_setup_logging): +def test_honors_bigquery_emulator_host( + mock_session_class, mock_bq_client_class, mock_setup_logging +): """Test that BIGQUERY_EMULATOR_HOST is honored.""" with ( patch.dict( @@ -1777,10 +1745,13 @@ def test_honors_bigquery_emulator_host(mock_session_class, mock_bq_client_class, # Verify BigQuery client was created with emulator settings mock_bq_client_class.assert_called_once() + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_creates_session_with_headers(mock_session_class, mock_bq_client, mock_setup_logging): +def test_creates_session_with_headers( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that session is created with Accept and User-Agent headers.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -1806,10 +1777,13 @@ def test_creates_session_with_headers(mock_session_class, mock_bq_client, mock_s assert "Accept" in call_args assert "User-Agent" in call_args + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_sets_authorization_header_with_token(mock_session_class, mock_bq_client, mock_setup_logging): +def test_sets_authorization_header_with_token( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that Authorization header is set when token provided.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -1832,6 +1806,7 @@ def test_sets_authorization_header_with_token(mock_session_class, mock_bq_client # Verify Authorization header was set assert mock_session.headers.__setitem__.called + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1872,6 +1847,7 @@ def test_single_repo_successful_etl( mock_transform.assert_called_once() mock_load.assert_called_once() + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1911,6 +1887,7 @@ def test_multiple_repos_processing( # Should process 3 repositories assert mock_extract.call_count == 3 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") @@ -1958,10 +1935,13 @@ def test_processes_chunks_iteratively( assert mock_transform.call_count == 3 assert mock_load.call_count == 3 + @patch("main.setup_logging") @patch("main.bigquery.Client") @patch("requests.Session") -def test_returns_zero_on_success(mock_session_class, mock_bq_client, mock_setup_logging): +def test_returns_zero_on_success( + mock_session_class, mock_bq_client, mock_setup_logging +): """Test that main returns 0 on success.""" with ( patch.dict( @@ -1986,7 +1966,9 @@ def test_returns_zero_on_success(mock_session_class, mock_bq_client, mock_setup_ @patch("main.load_data") @patch("main.bigquery.Client") @patch("requests.Session") -def test_full_etl_flow_transforms_data_correctly(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): +def test_full_etl_flow_transforms_data_correctly( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): """Test full ETL flow with mocked GitHub responses.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -2032,11 +2014,14 @@ def test_full_etl_flow_transforms_data_correctly(mock_session_class, mock_bq_cli assert "pull_requests" in transformed_data assert len(transformed_data["pull_requests"]) == 1 + 
@patch("main.setup_logging") @patch("main.load_data") @patch("main.bigquery.Client") @patch("requests.Session") -def test_bug_id_extraction_through_pipeline(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): +def test_bug_id_extraction_through_pipeline( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): """Test bug ID extraction through full pipeline.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -2080,11 +2065,14 @@ def test_bug_id_extraction_through_pipeline(mock_session_class, mock_bq_client, pr = transformed_data["pull_requests"][0] assert pr["bug_id"] == 9876543 + @patch("main.setup_logging") @patch("main.load_data") @patch("main.bigquery.Client") @patch("requests.Session") -def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_load, mock_setup_logging): +def test_pagination_through_full_flow( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): """Test pagination through full ETL flow.""" mock_session = MagicMock() mock_session_class.return_value = mock_session @@ -2092,9 +2080,7 @@ def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_l # First page pr_response_1 = Mock() pr_response_1.status_code = 200 - pr_response_1.json.return_value = [ - {"number": 1, "title": "PR 1", "state": "open"} - ] + pr_response_1.json.return_value = [{"number": 1, "title": "PR 1", "state": "open"}] pr_response_1.links = { "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} } @@ -2102,9 +2088,7 @@ def test_pagination_through_full_flow(mock_session_class, mock_bq_client, mock_l # Second page pr_response_2 = Mock() pr_response_2.status_code = 200 - pr_response_2.json.return_value = [ - {"number": 2, "title": "PR 2", "state": "open"} - ] + pr_response_2.json.return_value = [{"number": 2, "title": "PR 2", "state": "open"}] pr_response_2.links = {} empty_response = Mock() From e3647c4ae2a9f7e28fd5050f9d02df2d26b79c90 Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 6 Feb 2026 16:33:25 -0500 Subject: [PATCH 09/11] - Fixed integration test gitub action to use docker compose properly. - Broke up all of the tests into individual files based on function to make for easier review. 
--- .github/workflows/tests.yml | 6 +- tests/test_extract_comments.py | 137 ++ tests/test_extract_commits.py | 190 +++ tests/test_extract_pull_requests.py | 309 ++++ tests/test_extract_reviewers.py | 127 ++ tests/test_load_data.py | 141 ++ tests/test_logging.py | 25 + tests/test_main.py | 2122 --------------------------- tests/test_main_integration.py | 544 +++++++ tests/test_rate_limit.py | 72 + tests/test_transform_data.py | 625 ++++++++ 11 files changed, 2173 insertions(+), 2125 deletions(-) create mode 100644 tests/test_extract_comments.py create mode 100644 tests/test_extract_commits.py create mode 100644 tests/test_extract_pull_requests.py create mode 100644 tests/test_extract_reviewers.py create mode 100644 tests/test_load_data.py create mode 100644 tests/test_logging.py delete mode 100644 tests/test_main.py create mode 100644 tests/test_main_integration.py create mode 100644 tests/test_rate_limit.py create mode 100644 tests/test_transform_data.py diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index c7b9d39..b4cc85b 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -24,9 +24,9 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - - name: Run integration test with docker-compose + - name: Run integration test with docker compose run: | - docker-compose up --build --abort-on-container-exit --exit-code-from github-etl + docker compose up --build --abort-on-container-exit --exit-code-from github-etl - name: Cleanup if: always() - run: docker-compose down -v + run: docker compose down -v diff --git a/tests/test_extract_comments.py b/tests/test_extract_comments.py new file mode 100644 index 0000000..25232b3 --- /dev/null +++ b/tests/test_extract_comments.py @@ -0,0 +1,137 @@ +#!/usr/bin/env python3 +""" +Tests for extract_comments function. + +Tests comment extraction including endpoint verification, rate limiting, +and error handling. 
+""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_comments_basic(mock_session): + """Test basic extraction of comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks good", + "created_at": "2024-01-01T14:00:00Z", + }, + { + "id": 457, + "user": {"login": "commenter2"}, + "body": "I have concerns", + "created_at": "2024-01-01T15:00:00Z", + }, + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["id"] == 456 + assert result[1]["id"] == 457 + + +def test_uses_issues_endpoint(mock_session): + """Test that comments use /issues endpoint not /pulls.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments(mock_session, "mozilla/firefox", 123) + + call_args = mock_session.get.call_args + url = call_args[0][0] + assert "/issues/123/comments" in url + assert "/pulls/123/comments" not in url + + +def test_multiple_comments(mock_session): + """Test handling multiple comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [ + {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} + for i in range(1, 11) + ] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert len(result) == 10 + + +def test_empty_comments_list(mock_session): + """Test handling PR with no comments.""" + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert result == [] + + +@patch("main.sleep_for_rate_limit") +def test_rate_limit_handling_comments(mock_sleep, mock_session): + """Test rate limit handling when fetching comments.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_comments(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + +def test_api_error_comments(mock_session): + """Test API error handling when fetching comments.""" + error_response = Mock() + error_response.status_code = 404 + error_response.text = "Not Found" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_comments(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + +def test_custom_github_api_url_comments(mock_session): + """Test using custom GitHub API URL for comments.""" + custom_url = "https://mock-github.example.com" + + comments_response = Mock() + comments_response.status_code = 200 + comments_response.json.return_value = [] + + mock_session.get.return_value = comments_response + + main.extract_comments( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert 
custom_url in call_args[0][0] diff --git a/tests/test_extract_commits.py b/tests/test_extract_commits.py new file mode 100644 index 0000000..bccc8b5 --- /dev/null +++ b/tests/test_extract_commits.py @@ -0,0 +1,190 @@ +#!/usr/bin/env python3 +""" +Tests for extract_commits function. + +Tests commit extraction including file details, rate limiting, and error handling. +""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_commits_with_files(mock_session): + """Test extracting commits with file details.""" + # Mock commits list response + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {"sha": "def456"}, + ] + + # Mock individual commit responses + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = { + "sha": "abc123", + "files": [{"filename": "file1.py", "additions": 10}], + } + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = { + "sha": "def456", + "files": [{"filename": "file2.py", "deletions": 5}], + } + + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["sha"] == "abc123" + assert result[0]["files"][0]["filename"] == "file1.py" + assert result[1]["sha"] == "def456" + assert result[1]["files"][0]["filename"] == "file2.py" + + +def test_multiple_files_per_commit(mock_session): + """Test handling multiple files in a single commit.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [{"sha": "abc123"}] + + commit_detail = Mock() + commit_detail.status_code = 200 + commit_detail.json.return_value = { + "sha": "abc123", + "files": [ + {"filename": "file1.py", "additions": 10}, + {"filename": "file2.py", "additions": 20}, + {"filename": "file3.py", "deletions": 5}, + ], + } + + mock_session.get.side_effect = [commits_response, commit_detail] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert len(result) == 1 + assert len(result[0]["files"]) == 3 + + +@patch("main.sleep_for_rate_limit") +def test_rate_limit_on_commits_list(mock_sleep, mock_session): + """Test rate limit handling when fetching commits list.""" + # Rate limit response + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + # Success response + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + +def test_api_error_on_commits_list(mock_session): + """Test API error handling when fetching commits list.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + +def test_api_error_on_individual_commit(mock_session): + """Test API error when fetching individual commit details.""" + commits_response = Mock() + commits_response.status_code = 200 + 
commits_response.json.return_value = [{"sha": "abc123"}] + + commit_error = Mock() + commit_error.status_code = 404 + commit_error.text = "Commit not found" + + mock_session.get.side_effect = [commits_response, commit_error] + + with pytest.raises(SystemExit) as exc_info: + main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 404" in str(exc_info.value) + + +def test_commit_without_sha_field(mock_session): + """Test handling commits without sha field.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [ + {"sha": "abc123"}, + {}, # Missing sha field + ] + + commit_detail_1 = Mock() + commit_detail_1.status_code = 200 + commit_detail_1.json.return_value = {"sha": "abc123", "files": []} + + commit_detail_2 = Mock() + commit_detail_2.status_code = 200 + commit_detail_2.json.return_value = {"files": []} + + mock_session.get.side_effect = [ + commits_response, + commit_detail_1, + commit_detail_2, + ] + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + # Should handle the commit without sha gracefully + assert len(result) == 2 + + +def test_custom_github_api_url_commits(mock_session): + """Test using custom GitHub API URL for commits.""" + custom_url = "https://mock-github.example.com" + + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + main.extract_commits( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +def test_empty_commits_list(mock_session): + """Test handling PR with no commits.""" + commits_response = Mock() + commits_response.status_code = 200 + commits_response.json.return_value = [] + + mock_session.get.return_value = commits_response + + result = main.extract_commits(mock_session, "mozilla/firefox", 123) + + assert result == [] diff --git a/tests/test_extract_pull_requests.py b/tests/test_extract_pull_requests.py new file mode 100644 index 0000000..b6325fb --- /dev/null +++ b/tests/test_extract_pull_requests.py @@ -0,0 +1,309 @@ +#!/usr/bin/env python3 +""" +Tests for extract_pull_requests function. + +Tests pull request extraction including pagination, rate limiting, error handling, +and enrichment with commits, reviewers, and comments. 
+""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_pull_requests_basic(mock_session): + """Test basic extraction of pull requests.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + # Mock the extract functions + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 1 + assert len(result[0]) == 2 + assert result[0][0]["number"] == 1 + assert result[0][1]["number"] == 2 + + +def test_extract_multiple_pages(mock_session): + """Test extracting data across multiple pages with pagination.""" + # First page response + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"number": 2, "title": "PR 2"}, + ] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page response + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert len(result) == 2 + assert len(result[0]) == 2 + assert len(result[1]) == 1 + assert result[0][0]["number"] == 1 + assert result[1][0]["number"] == 3 + + +def test_enriches_prs_with_commit_data(mock_session): + """Test that PRs are enriched with commit data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_commits = [{"sha": "abc123"}] + + with ( + patch( + "main.extract_commits", return_value=mock_commits + ) as mock_extract_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["commit_data"] == mock_commits + mock_extract_commits.assert_called_once() + + +def test_enriches_prs_with_reviewer_data(mock_session): + """Test that PRs are enriched with reviewer data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_reviewers = [{"id": 789, "state": "APPROVED"}] + + with ( + patch("main.extract_commits", return_value=[]), + patch( + "main.extract_reviewers", return_value=mock_reviewers + ) as mock_extract_reviewers, + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["reviewer_data"] == mock_reviewers + mock_extract_reviewers.assert_called_once() + + +def test_enriches_prs_with_comment_data(mock_session): + """Test that PRs are enriched with comment 
data.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + mock_comments = [{"id": 456, "body": "Great work!"}] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch( + "main.extract_comments", return_value=mock_comments + ) as mock_extract_comments, + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert result[0][0]["comment_data"] == mock_comments + mock_extract_comments.assert_called_once() + + +@patch("main.sleep_for_rate_limit") +def test_handles_rate_limit(mock_sleep, mock_session): + """Test that extract_pull_requests handles rate limiting correctly.""" + # Rate limit response + mock_response_rate_limit = Mock() + mock_response_rate_limit.status_code = 403 + mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} + + # Successful response after rate limit + mock_response_success = Mock() + mock_response_success.status_code = 200 + mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] + mock_response_success.links = {} + + mock_session.get.side_effect = [ + mock_response_rate_limit, + mock_response_success, + ] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + mock_sleep.assert_called_once_with(mock_response_rate_limit) + assert len(result) == 1 + + +def test_handles_api_error_404(mock_session): + """Test that extract_pull_requests raises SystemExit on 404.""" + mock_response = Mock() + mock_response.status_code = 404 + mock_response.text = "Not Found" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) + + assert "GitHub API error 404" in str(exc_info.value) + + +def test_handles_api_error_500(mock_session): + """Test that extract_pull_requests raises SystemExit on 500.""" + mock_response = Mock() + mock_response.status_code = 500 + mock_response.text = "Internal Server Error" + + mock_session.get.return_value = mock_response + + with pytest.raises(SystemExit) as exc_info: + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + assert "GitHub API error 500" in str(exc_info.value) + + +def test_stops_on_empty_batch(mock_session): + """Test that extraction stops when an empty batch is returned.""" + # First page with data + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page empty + mock_response_2 = Mock() + mock_response_2.status_code = 200 + mock_response_2.json.return_value = [] + mock_response_2.links = {} + + mock_session.get.side_effect = [mock_response_1, mock_response_2] + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # Should only have 1 chunk from first page + assert len(result) == 1 + assert len(result[0]) == 1 + + +def test_invalid_page_number_handling(mock_session): + 
"""Test handling of invalid page number in pagination.""" + mock_response_1 = Mock() + mock_response_1.status_code = 200 + mock_response_1.json.return_value = [{"number": 1}] + mock_response_1.links = { + "next": { + "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" + } + } + + mock_session.get.return_value = mock_response_1 + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # Should stop pagination on invalid page number + assert len(result) == 1 + + +def test_custom_github_api_url(mock_session): + """Test using custom GitHub API URL.""" + custom_url = "https://mock-github.example.com" + + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [{"number": 1}] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with ( + patch("main.extract_commits", return_value=[]), + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list( + main.extract_pull_requests( + mock_session, "mozilla/firefox", github_api_url=custom_url + ) + ) + + # Verify custom URL was used + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] + + +def test_skips_prs_without_number_field(mock_session): + """Test that PRs without 'number' field are skipped.""" + mock_response = Mock() + mock_response.status_code = 200 + mock_response.json.return_value = [ + {"number": 1, "title": "PR 1"}, + {"title": "PR without number"}, # Missing number field + {"number": 2, "title": "PR 2"}, + ] + mock_response.links = {} + + mock_session.get.return_value = mock_response + + with ( + patch("main.extract_commits", return_value=[]) as mock_commits, + patch("main.extract_reviewers", return_value=[]), + patch("main.extract_comments", return_value=[]), + ): + list(main.extract_pull_requests(mock_session, "mozilla/firefox")) + + # extract_commits should only be called for PRs with number field + assert mock_commits.call_count == 2 diff --git a/tests/test_extract_reviewers.py b/tests/test_extract_reviewers.py new file mode 100644 index 0000000..7df4b43 --- /dev/null +++ b/tests/test_extract_reviewers.py @@ -0,0 +1,127 @@ +#!/usr/bin/env python3 +""" +Tests for extract_reviewers function. + +Tests reviewer extraction including different review states, rate limiting, +and error handling. 
+""" + +from unittest.mock import Mock, patch + +import pytest + +import main + + +def test_extract_reviewers_basic(mock_session): + """Test basic extraction of reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 790, + "user": {"login": "reviewer2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 2 + assert result[0]["state"] == "APPROVED" + assert result[1]["state"] == "CHANGES_REQUESTED" + + +def test_multiple_review_states(mock_session): + """Test handling multiple different review states.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [ + {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, + {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, + {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, + {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, + ] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert len(result) == 4 + states = [r["state"] for r in result] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + +def test_empty_reviewers_list(mock_session): + """Test handling PR with no reviewers.""" + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + mock_session.get.return_value = reviewers_response + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert result == [] + + +@patch("main.sleep_for_rate_limit") +def test_rate_limit_handling(mock_sleep, mock_session): + """Test rate limit handling when fetching reviewers.""" + rate_limit_response = Mock() + rate_limit_response.status_code = 403 + rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} + + success_response = Mock() + success_response.status_code = 200 + success_response.json.return_value = [] + + mock_session.get.side_effect = [rate_limit_response, success_response] + + result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + mock_sleep.assert_called_once() + assert result == [] + + +def test_api_error(mock_session): + """Test API error handling when fetching reviewers.""" + error_response = Mock() + error_response.status_code = 500 + error_response.text = "Internal Server Error" + + mock_session.get.return_value = error_response + + with pytest.raises(SystemExit) as exc_info: + main.extract_reviewers(mock_session, "mozilla/firefox", 123) + + assert "GitHub API error 500" in str(exc_info.value) + + +def test_custom_github_api_url_reviewers(mock_session): + """Test using custom GitHub API URL for reviewers.""" + custom_url = "https://mock-github.example.com" + + reviewers_response = Mock() + reviewers_response.status_code = 200 + reviewers_response.json.return_value = [] + + mock_session.get.return_value = reviewers_response + + main.extract_reviewers( + mock_session, "mozilla/firefox", 123, github_api_url=custom_url + ) + + call_args = mock_session.get.call_args + assert custom_url in call_args[0][0] diff --git a/tests/test_load_data.py b/tests/test_load_data.py new file mode 
100644 index 0000000..0203288 --- /dev/null +++ b/tests/test_load_data.py @@ -0,0 +1,141 @@ +#!/usr/bin/env python3 +""" +Tests for load_data function. + +Tests BigQuery data loading including table insertion, snapshot dates, +and error handling. +""" + +from unittest.mock import patch + +import pytest + +import main + + +@patch("main.datetime") +def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): + """Test that load_data inserts all tables correctly.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [{"commit_sha": "abc"}], + "reviewers": [{"reviewer_username": "user1"}], + "comments": [{"comment_id": 123}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should call insert_rows_json 4 times (once per table) + assert mock_bigquery_client.insert_rows_json.call_count == 4 + + +@patch("main.datetime") +def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): + """Test that snapshot_date is added to all rows.""" + mock_datetime.now.return_value.strftime.return_value = "2024-01-15" + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + rows = call_args[0][1] + assert all(row["snapshot_date"] == "2024-01-15" for row in rows) + + +def test_constructs_correct_table_ref(mock_bigquery_client): + """Test that table_ref is constructed correctly.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "my_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref = call_args[0][0] + assert table_ref == "test-project.my_dataset.pull_requests" + + +def test_empty_transformed_data_skipped(mock_bigquery_client): + """Test that empty transformed_data dict is skipped.""" + transformed_data = {} + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + mock_bigquery_client.insert_rows_json.assert_not_called() + + +def test_skips_empty_tables_individually(mock_bigquery_client): + """Test that empty tables are skipped individually.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], # Empty, should be skipped + "reviewers": [], # Empty, should be skipped + "comments": [{"comment_id": 456}], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + # Should only call insert_rows_json twice (for PRs and comments) + assert mock_bigquery_client.insert_rows_json.call_count == 2 + + +def test_only_pull_requests_table(mock_bigquery_client): + """Test loading only pull_requests table.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert mock_bigquery_client.insert_rows_json.call_count == 1 + + +def test_raises_exception_on_insert_errors(mock_bigquery_client): + """Test that Exception is raised on BigQuery insert errors.""" + mock_bigquery_client.insert_rows_json.return_value = [ + {"index": 0, "errors": ["Insert failed"]} + ] + + transformed_data = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + 
"reviewers": [], + "comments": [], + } + + with pytest.raises(Exception) as exc_info: + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + assert "BigQuery insert errors" in str(exc_info.value) + + +def test_verifies_client_insert_called_correctly(mock_bigquery_client): + """Test that client.insert_rows_json is called with correct arguments.""" + transformed_data = { + "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], + "commits": [], + "reviewers": [], + "comments": [], + } + + main.load_data(mock_bigquery_client, "test_dataset", transformed_data) + + call_args = mock_bigquery_client.insert_rows_json.call_args + table_ref, rows = call_args[0] + + assert "pull_requests" in table_ref + assert len(rows) == 2 diff --git a/tests/test_logging.py b/tests/test_logging.py new file mode 100644 index 0000000..10730d1 --- /dev/null +++ b/tests/test_logging.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +""" +Tests for setup_logging function. + +Tests logging configuration including log level and handler setup. +""" + +import logging + +import main + + +def test_setup_logging(): + """Test that setup_logging configures logging correctly.""" + main.setup_logging() + + root_logger = logging.getLogger() + assert root_logger.level == logging.INFO + assert len(root_logger.handlers) > 0 + + # Check that at least one handler is a StreamHandler + has_stream_handler = any( + isinstance(handler, logging.StreamHandler) for handler in root_logger.handlers + ) + assert has_stream_handler diff --git a/tests/test_main.py b/tests/test_main.py deleted file mode 100644 index 19ba7a4..0000000 --- a/tests/test_main.py +++ /dev/null @@ -1,2122 +0,0 @@ -#!/usr/bin/env python3 -""" -Comprehensive test suite for GitHub ETL main.py - -This test suite provides complete coverage for all functions in main.py, -including extraction, transformation, loading, and orchestration logic. 
-""" - -import logging -import os -from unittest.mock import MagicMock, Mock, patch - -import pytest - -import main - -# ============================================================================= -# TESTS FOR SETUP_LOGGING -# ============================================================================= - - -def test_setup_logging(): - """Test that setup_logging configures logging correctly.""" - main.setup_logging() - - root_logger = logging.getLogger() - assert root_logger.level == logging.INFO - assert len(root_logger.handlers) > 0 - - # Check that at least one handler is a StreamHandler - has_stream_handler = any( - isinstance(handler, logging.StreamHandler) for handler in root_logger.handlers - ) - assert has_stream_handler - - -# ============================================================================= -# TESTS FOR SLEEP_FOR_RATE_LIMIT -# ============================================================================= - - -@patch("time.time") -@patch("time.sleep") -def test_sleep_for_rate_limit_calculates_wait_time(mock_sleep, mock_time): - """Test that sleep_for_rate_limit calculates correct wait time.""" - mock_time.return_value = 1000 - - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1120", # 120 seconds from now - } - - main.sleep_for_rate_limit(mock_response) - - mock_sleep.assert_called_once_with(120) - - -@patch("time.time") -@patch("time.sleep") -def test_sleep_for_rate_limit_when_reset_already_passed(mock_sleep, mock_time): - """Test that sleep_for_rate_limit doesn't sleep negative time.""" - mock_time.return_value = 2000 - - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "0", - "X-RateLimit-Reset": "1500", # Already passed - } - - main.sleep_for_rate_limit(mock_response) - - # Should sleep for 0 seconds (max of 0 and negative value) - mock_sleep.assert_called_once_with(0) - - -@patch("time.sleep") -def test_sleep_for_rate_limit_when_remaining_not_zero(mock_sleep): - """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" - mock_response = Mock() - mock_response.headers = { - "X-RateLimit-Remaining": "5", - "X-RateLimit-Reset": "1500", - } - - main.sleep_for_rate_limit(mock_response) - - # Should not sleep when remaining > 0 - mock_sleep.assert_not_called() - - -@patch("time.sleep") -def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): - """Test sleep_for_rate_limit with missing rate limit headers.""" - mock_response = Mock() - mock_response.headers = {} - - main.sleep_for_rate_limit(mock_response) - - # Should not sleep when headers are missing (defaults to remaining=1) - mock_sleep.assert_not_called() - - -# ============================================================================= -# TESTS FOR EXTRACT_PULL_REQUESTS -# ============================================================================= - - -def test_extract_pull_requests_basic(mock_session): - """Test basic extraction of pull requests.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - # Mock the extract functions - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 1 - assert 
len(result[0]) == 2 - assert result[0][0]["number"] == 1 - assert result[0][1]["number"] == 2 - - -def test_extract_multiple_pages(mock_session): - """Test extracting data across multiple pages with pagination.""" - # First page response - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"number": 2, "title": "PR 2"}, - ] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } - - # Second page response - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [{"number": 3, "title": "PR 3"}] - mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert len(result) == 2 - assert len(result[0]) == 2 - assert len(result[1]) == 1 - assert result[0][0]["number"] == 1 - assert result[1][0]["number"] == 3 - - -def test_enriches_prs_with_commit_data(mock_session): - """Test that PRs are enriched with commit data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_commits = [{"sha": "abc123"}] - - with ( - patch( - "main.extract_commits", return_value=mock_commits - ) as mock_extract_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["commit_data"] == mock_commits - mock_extract_commits.assert_called_once() - - -def test_enriches_prs_with_reviewer_data(mock_session): - """Test that PRs are enriched with reviewer data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_reviewers = [{"id": 789, "state": "APPROVED"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch( - "main.extract_reviewers", return_value=mock_reviewers - ) as mock_extract_reviewers, - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["reviewer_data"] == mock_reviewers - mock_extract_reviewers.assert_called_once() - - -def test_enriches_prs_with_comment_data(mock_session): - """Test that PRs are enriched with comment data.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - mock_comments = [{"id": 456, "body": "Great work!"}] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch( - "main.extract_comments", return_value=mock_comments - ) as mock_extract_comments, - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert result[0][0]["comment_data"] == mock_comments - mock_extract_comments.assert_called_once() - - -@patch("main.sleep_for_rate_limit") -def test_handles_rate_limit(mock_sleep, 
mock_session): - """Test that extract_pull_requests handles rate limiting correctly.""" - # Rate limit response - mock_response_rate_limit = Mock() - mock_response_rate_limit.status_code = 403 - mock_response_rate_limit.headers = {"X-RateLimit-Remaining": "0"} - - # Successful response after rate limit - mock_response_success = Mock() - mock_response_success.status_code = 200 - mock_response_success.json.return_value = [{"number": 1, "title": "PR 1"}] - mock_response_success.links = {} - - mock_session.get.side_effect = [ - mock_response_rate_limit, - mock_response_success, - ] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - mock_sleep.assert_called_once_with(mock_response_rate_limit) - assert len(result) == 1 - - -def test_handles_api_error_404(mock_session): - """Test that extract_pull_requests raises SystemExit on 404.""" - mock_response = Mock() - mock_response.status_code = 404 - mock_response.text = "Not Found" - - mock_session.get.return_value = mock_response - - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/nonexistent")) - - assert "GitHub API error 404" in str(exc_info.value) - - -def test_handles_api_error_500(mock_session): - """Test that extract_pull_requests raises SystemExit on 500.""" - mock_response = Mock() - mock_response.status_code = 500 - mock_response.text = "Internal Server Error" - - mock_session.get.return_value = mock_response - - with pytest.raises(SystemExit) as exc_info: - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - assert "GitHub API error 500" in str(exc_info.value) - - -def test_stops_on_empty_batch(mock_session): - """Test that extraction stops when an empty batch is returned.""" - # First page with data - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } - - # Second page empty - mock_response_2 = Mock() - mock_response_2.status_code = 200 - mock_response_2.json.return_value = [] - mock_response_2.links = {} - - mock_session.get.side_effect = [mock_response_1, mock_response_2] - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should only have 1 chunk from first page - assert len(result) == 1 - assert len(result[0]) == 1 - - -def test_invalid_page_number_handling(mock_session): - """Test handling of invalid page number in pagination.""" - mock_response_1 = Mock() - mock_response_1.status_code = 200 - mock_response_1.json.return_value = [{"number": 1}] - mock_response_1.links = { - "next": { - "url": "https://api.github.com/repos/mozilla/firefox/pulls?page=invalid" - } - } - - mock_session.get.return_value = mock_response_1 - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - result = list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # Should stop pagination on invalid page number - assert len(result) == 1 - - -def test_custom_github_api_url(mock_session): - """Test using 
custom GitHub API URL.""" - custom_url = "https://mock-github.example.com" - - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"number": 1}] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - with ( - patch("main.extract_commits", return_value=[]), - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list( - main.extract_pull_requests( - mock_session, "mozilla/firefox", github_api_url=custom_url - ) - ) - - # Verify custom URL was used - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -def test_skips_prs_without_number_field(mock_session): - """Test that PRs without 'number' field are skipped.""" - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [ - {"number": 1, "title": "PR 1"}, - {"title": "PR without number"}, # Missing number field - {"number": 2, "title": "PR 2"}, - ] - mock_response.links = {} - - mock_session.get.return_value = mock_response - - with ( - patch("main.extract_commits", return_value=[]) as mock_commits, - patch("main.extract_reviewers", return_value=[]), - patch("main.extract_comments", return_value=[]), - ): - list(main.extract_pull_requests(mock_session, "mozilla/firefox")) - - # extract_commits should only be called for PRs with number field - assert mock_commits.call_count == 2 - - -# ============================================================================= -# TESTS FOR EXTRACT_COMMITS -# ============================================================================= - - -def test_extract_commits_with_files(mock_session): - """Test extracting commits with file details.""" - # Mock commits list response - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": "abc123"}, - {"sha": "def456"}, - ] - - # Mock individual commit responses - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = { - "sha": "abc123", - "files": [{"filename": "file1.py", "additions": 10}], - } - - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = { - "sha": "def456", - "files": [{"filename": "file2.py", "deletions": 5}], - } - - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["sha"] == "abc123" - assert result[0]["files"][0]["filename"] == "file1.py" - assert result[1]["sha"] == "def456" - assert result[1]["files"][0]["filename"] == "file2.py" - - -def test_multiple_files_per_commit(mock_session): - """Test handling multiple files in a single commit.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] - - commit_detail = Mock() - commit_detail.status_code = 200 - commit_detail.json.return_value = { - "sha": "abc123", - "files": [ - {"filename": "file1.py", "additions": 10}, - {"filename": "file2.py", "additions": 20}, - {"filename": "file3.py", "deletions": 5}, - ], - } - - mock_session.get.side_effect = [commits_response, commit_detail] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert len(result) == 1 - assert len(result[0]["files"]) == 3 - - -@patch("main.sleep_for_rate_limit") -def test_rate_limit_on_commits_list(mock_sleep, mock_session): - """Test rate 
limit handling when fetching commits list.""" - # Rate limit response - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - - # Success response - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] - - mock_session.get.side_effect = [rate_limit_response, success_response] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - mock_sleep.assert_called_once() - assert result == [] - - -def test_api_error_on_commits_list(mock_session): - """Test API error handling when fetching commits list.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" - - mock_session.get.return_value = error_response - - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 500" in str(exc_info.value) - - -def test_api_error_on_individual_commit(mock_session): - """Test API error when fetching individual commit details.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [{"sha": "abc123"}] - - commit_error = Mock() - commit_error.status_code = 404 - commit_error.text = "Commit not found" - - mock_session.get.side_effect = [commits_response, commit_error] - - with pytest.raises(SystemExit) as exc_info: - main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 404" in str(exc_info.value) - - -def test_commit_without_sha_field(mock_session): - """Test handling commits without sha field.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [ - {"sha": "abc123"}, - {}, # Missing sha field - ] - - commit_detail_1 = Mock() - commit_detail_1.status_code = 200 - commit_detail_1.json.return_value = {"sha": "abc123", "files": []} - - commit_detail_2 = Mock() - commit_detail_2.status_code = 200 - commit_detail_2.json.return_value = {"files": []} - - mock_session.get.side_effect = [ - commits_response, - commit_detail_1, - commit_detail_2, - ] - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - # Should handle the commit without sha gracefully - assert len(result) == 2 - - -def test_custom_github_api_url_commits(mock_session): - """Test using custom GitHub API URL for commits.""" - custom_url = "https://mock-github.example.com" - - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] - - mock_session.get.return_value = commits_response - - main.extract_commits( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) - - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -def test_empty_commits_list(mock_session): - """Test handling PR with no commits.""" - commits_response = Mock() - commits_response.status_code = 200 - commits_response.json.return_value = [] - - mock_session.get.return_value = commits_response - - result = main.extract_commits(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -# ============================================================================= -# TESTS FOR EXTRACT_REVIEWERS -# ============================================================================= - - -def test_extract_reviewers_basic(mock_session): - """Test basic extraction of reviewers.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - 
reviewers_response.json.return_value = [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 790, - "user": {"login": "reviewer2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - ] - - mock_session.get.return_value = reviewers_response - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["state"] == "APPROVED" - assert result[1]["state"] == "CHANGES_REQUESTED" - - -def test_multiple_review_states(mock_session): - """Test handling multiple different review states.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [ - {"id": 1, "state": "APPROVED", "user": {"login": "user1"}}, - {"id": 2, "state": "CHANGES_REQUESTED", "user": {"login": "user2"}}, - {"id": 3, "state": "COMMENTED", "user": {"login": "user3"}}, - {"id": 4, "state": "DISMISSED", "user": {"login": "user4"}}, - ] - - mock_session.get.return_value = reviewers_response - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert len(result) == 4 - states = [r["state"] for r in result] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states - - -def test_empty_reviewers_list(mock_session): - """Test handling PR with no reviewers.""" - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] - - mock_session.get.return_value = reviewers_response - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -@patch("main.sleep_for_rate_limit") -def test_rate_limit_handling(mock_sleep, mock_session): - """Test rate limit handling when fetching reviewers.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] - - mock_session.get.side_effect = [rate_limit_response, success_response] - - result = main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - mock_sleep.assert_called_once() - assert result == [] - - -def test_api_error(mock_session): - """Test API error handling when fetching reviewers.""" - error_response = Mock() - error_response.status_code = 500 - error_response.text = "Internal Server Error" - - mock_session.get.return_value = error_response - - with pytest.raises(SystemExit) as exc_info: - main.extract_reviewers(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 500" in str(exc_info.value) - - -def test_custom_github_api_url_reviewers(mock_session): - """Test using custom GitHub API URL for reviewers.""" - custom_url = "https://mock-github.example.com" - - reviewers_response = Mock() - reviewers_response.status_code = 200 - reviewers_response.json.return_value = [] - - mock_session.get.return_value = reviewers_response - - main.extract_reviewers( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) - - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -# ============================================================================= -# TESTS FOR EXTRACT_COMMENTS -# ============================================================================= - - -def test_extract_comments_basic(mock_session): - """Test basic extraction of comments.""" - comments_response = Mock() - 
comments_response.status_code = 200 - comments_response.json.return_value = [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks good", - "created_at": "2024-01-01T14:00:00Z", - }, - { - "id": 457, - "user": {"login": "commenter2"}, - "body": "I have concerns", - "created_at": "2024-01-01T15:00:00Z", - }, - ] - - mock_session.get.return_value = comments_response - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert len(result) == 2 - assert result[0]["id"] == 456 - assert result[1]["id"] == 457 - - -def test_uses_issues_endpoint(mock_session): - """Test that comments use /issues endpoint not /pulls.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - - mock_session.get.return_value = comments_response - - main.extract_comments(mock_session, "mozilla/firefox", 123) - - call_args = mock_session.get.call_args - url = call_args[0][0] - assert "/issues/123/comments" in url - assert "/pulls/123/comments" not in url - - -def test_multiple_comments(mock_session): - """Test handling multiple comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [ - {"id": i, "user": {"login": f"user{i}"}, "body": f"Comment {i}"} - for i in range(1, 11) - ] - - mock_session.get.return_value = comments_response - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert len(result) == 10 - - -def test_empty_comments_list(mock_session): - """Test handling PR with no comments.""" - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - - mock_session.get.return_value = comments_response - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert result == [] - - -@patch("main.sleep_for_rate_limit") -def test_rate_limit_handling_comments(mock_sleep, mock_session): - """Test rate limit handling when fetching comments.""" - rate_limit_response = Mock() - rate_limit_response.status_code = 403 - rate_limit_response.headers = {"X-RateLimit-Remaining": "0"} - - success_response = Mock() - success_response.status_code = 200 - success_response.json.return_value = [] - - mock_session.get.side_effect = [rate_limit_response, success_response] - - result = main.extract_comments(mock_session, "mozilla/firefox", 123) - - mock_sleep.assert_called_once() - assert result == [] - - -def test_api_error_comments(mock_session): - """Test API error handling when fetching comments.""" - error_response = Mock() - error_response.status_code = 404 - error_response.text = "Not Found" - - mock_session.get.return_value = error_response - - with pytest.raises(SystemExit) as exc_info: - main.extract_comments(mock_session, "mozilla/firefox", 123) - - assert "GitHub API error 404" in str(exc_info.value) - - -def test_custom_github_api_url_comments(mock_session): - """Test using custom GitHub API URL for comments.""" - custom_url = "https://mock-github.example.com" - - comments_response = Mock() - comments_response.status_code = 200 - comments_response.json.return_value = [] - - mock_session.get.return_value = comments_response - - main.extract_comments( - mock_session, "mozilla/firefox", 123, github_api_url=custom_url - ) - - call_args = mock_session.get.call_args - assert custom_url in call_args[0][0] - - -# ============================================================================= -# TESTS FOR TRANSFORM_DATA -# 
============================================================================= - - -def test_transform_data_basic(): - """Test basic transformation of pull request data.""" - raw_data = [ - { - "number": 123, - "title": "Fix login bug", - "state": "closed", - "created_at": "2024-01-01T10:00:00Z", - "updated_at": "2024-01-02T10:00:00Z", - "merged_at": "2024-01-02T12:00:00Z", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["pull_requests"]) == 1 - pr = result["pull_requests"][0] - assert pr["pull_request_id"] == 123 - assert pr["current_status"] == "closed" - assert pr["date_created"] == "2024-01-01T10:00:00Z" - assert pr["date_modified"] == "2024-01-02T10:00:00Z" - assert pr["date_landed"] == "2024-01-02T12:00:00Z" - assert pr["target_repository"] == "mozilla/firefox" - - -def test_bug_id_extraction_basic(): - """Test bug ID extraction from PR title.""" - test_cases = [ - ("Bug 1234567 - Fix issue", 1234567), - ("bug 1234567: Update code", 1234567), - ("Fix for bug 7654321", 7654321), - ("b=9876543 - Change behavior", 9876543), - ] - - for title, expected_bug_id in test_cases: - raw_data = [ - { - "number": 1, - "title": title, - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == expected_bug_id - - -def test_bug_id_extraction_with_hash(): - """Test bug ID extraction with # symbol.""" - raw_data = [ - { - "number": 1, - "title": "Bug #1234567 - Fix issue", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] == 1234567 - - -def test_bug_id_filter_large_numbers(): - """Test that bug IDs >= 100000000 are filtered out.""" - raw_data = [ - { - "number": 1, - "title": "Bug 999999999 - Invalid bug ID", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None - - -def test_bug_id_no_match(): - """Test PR title with no bug ID.""" - raw_data = [ - { - "number": 1, - "title": "Update documentation", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert result["pull_requests"][0]["bug_id"] is None - - -def test_labels_extraction(): - """Test labels array extraction.""" - raw_data = [ - { - "number": 1, - "title": "PR with labels", - "state": "open", - "labels": [ - {"name": "bug"}, - {"name": "priority-high"}, - {"name": "needs-review"}, - ], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - labels = result["pull_requests"][0]["labels"] - assert len(labels) == 3 - assert "bug" in labels - assert "priority-high" in labels - assert "needs-review" in labels - - -def test_labels_empty_list(): - """Test handling empty labels list.""" - raw_data = [ - { - "number": 1, - "title": "PR without labels", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - assert 
result["pull_requests"][0]["labels"] == [] - - -def test_commit_transformation(): - """Test commit fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with commits", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc123", - "commit": { - "author": { - "name": "Test Author", - "date": "2024-01-01T12:00:00Z", - } - }, - "files": [ - { - "filename": "src/main.py", - "additions": 10, - "deletions": 5, - } - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["commits"]) == 1 - commit = result["commits"][0] - assert commit["pull_request_id"] == 123 - assert commit["target_repository"] == "mozilla/firefox" - assert commit["commit_sha"] == "abc123" - assert commit["date_created"] == "2024-01-01T12:00:00Z" - assert commit["author_username"] == "Test Author" - assert commit["filename"] == "src/main.py" - assert commit["lines_added"] == 10 - assert commit["lines_removed"] == 5 - - -def test_commit_file_flattening(): - """Test that each file becomes a separate row.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple files", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc123", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 5}, - {"filename": "file2.py", "additions": 20, "deletions": 2}, - {"filename": "file3.py", "additions": 5, "deletions": 15}, - ], - } - ], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - # Should have 3 rows in commits table (one per file) - assert len(result["commits"]) == 3 - filenames = [c["filename"] for c in result["commits"]] - assert "file1.py" in filenames - assert "file2.py" in filenames - assert "file3.py" in filenames - - -def test_multiple_commits_with_files(): - """Test multiple commits with multiple files per PR.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple commits", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "commit1", - "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, - "files": [ - {"filename": "file1.py", "additions": 10, "deletions": 0} - ], - }, - { - "sha": "commit2", - "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, - "files": [ - {"filename": "file2.py", "additions": 5, "deletions": 2}, - {"filename": "file3.py", "additions": 8, "deletions": 3}, - ], - }, - ], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - # Should have 3 rows total (1 file from commit1, 2 files from commit2) - assert len(result["commits"]) == 3 - assert result["commits"][0]["commit_sha"] == "commit1" - assert result["commits"][1]["commit_sha"] == "commit2" - assert result["commits"][2]["commit_sha"] == "commit2" - - -def test_reviewer_transformation(): - """Test reviewer fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with reviewers", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - } - ], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["reviewers"]) == 1 - reviewer = result["reviewers"][0] - assert reviewer["pull_request_id"] == 123 - assert reviewer["target_repository"] == 
"mozilla/firefox" - assert reviewer["reviewer_username"] == "reviewer1" - assert reviewer["status"] == "APPROVED" - assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" - - -def test_transform_multiple_review_states(): - """Test transforming data with multiple review states.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple reviews", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "CHANGES_REQUESTED", - "submitted_at": "2024-01-01T16:00:00Z", - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "COMMENTED", - "submitted_at": "2024-01-01T17:00:00Z", - }, - ], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["reviewers"]) == 3 - states = [r["status"] for r in result["reviewers"]] - assert "APPROVED" in states - assert "CHANGES_REQUESTED" in states - assert "COMMENTED" in states - - -def test_date_approved_from_earliest_approval(): - """Test that date_approved is set to earliest APPROVED review.""" - raw_data = [ - { - "number": 123, - "title": "PR with multiple approvals", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "state": "APPROVED", - "submitted_at": "2024-01-02T15:00:00Z", - }, - { - "id": 2, - "user": {"login": "user2"}, - "state": "APPROVED", - "submitted_at": "2024-01-01T14:00:00Z", # Earliest - }, - { - "id": 3, - "user": {"login": "user3"}, - "state": "APPROVED", - "submitted_at": "2024-01-03T16:00:00Z", - }, - ], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - pr = result["pull_requests"][0] - assert pr["date_approved"] == "2024-01-01T14:00:00Z" - - -def test_comment_transformation(): - """Test comment fields mapping.""" - raw_data = [ - { - "number": 123, - "title": "PR with comments", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter1"}, - "body": "This looks great!", - "created_at": "2024-01-01T14:00:00Z", - "pull_request_review_id": None, - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["comments"]) == 1 - comment = result["comments"][0] - assert comment["pull_request_id"] == 123 - assert comment["target_repository"] == "mozilla/firefox" - assert comment["comment_id"] == 456 - assert comment["author_username"] == "commenter1" - assert comment["date_created"] == "2024-01-01T14:00:00Z" - assert comment["character_count"] == 17 - - -def test_comment_character_count(): - """Test character count calculation for comments.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": "Short", - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "This is a much longer comment with more text", - "created_at": "2024-01-01", - }, - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert result["comments"][0]["character_count"] == 5 - assert result["comments"][1]["character_count"] == 44 - - -def test_comment_status_from_review(): - """Test that comment status is mapped from review_id_statuses.""" - raw_data = [ - { 
- "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [ - { - "id": 789, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 456, - "user": {"login": "commenter"}, - "body": "LGTM", - "created_at": "2024-01-01", - "pull_request_review_id": 789, - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - # Comment should have status from the review - assert result["comments"][0]["status"] == "APPROVED" - - -def test_comment_empty_body(): - """Test handling comments with empty or None body.""" - raw_data = [ - { - "number": 123, - "title": "PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [ - { - "id": 1, - "user": {"login": "user1"}, - "body": None, - "created_at": "2024-01-01", - }, - { - "id": 2, - "user": {"login": "user2"}, - "body": "", - "created_at": "2024-01-01", - }, - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert result["comments"][0]["character_count"] == 0 - assert result["comments"][1]["character_count"] == 0 - - -def test_empty_raw_data(): - """Test handling empty input list.""" - result = main.transform_data([], "mozilla/firefox") - - assert result["pull_requests"] == [] - assert result["commits"] == [] - assert result["reviewers"] == [] - assert result["comments"] == [] - - -def test_pr_without_commits_reviewers_comments(): - """Test PR with no commits, reviewers, or comments.""" - raw_data = [ - { - "number": 123, - "title": "Minimal PR", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert len(result["pull_requests"]) == 1 - assert len(result["commits"]) == 0 - assert len(result["reviewers"]) == 0 - assert len(result["comments"]) == 0 - - -def test_return_structure(): - """Test that transform_data returns dict with 4 keys.""" - raw_data = [ - { - "number": 1, - "title": "Test", - "state": "open", - "labels": [], - "commit_data": [], - "reviewer_data": [], - "comment_data": [], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert isinstance(result, dict) - assert "pull_requests" in result - assert "commits" in result - assert "reviewers" in result - assert "comments" in result - - -def test_all_tables_have_target_repository(): - """Test that all tables include target_repository field.""" - raw_data = [ - { - "number": 123, - "title": "Test PR", - "state": "open", - "labels": [], - "commit_data": [ - { - "sha": "abc", - "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, - "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], - } - ], - "reviewer_data": [ - { - "id": 1, - "user": {"login": "reviewer"}, - "state": "APPROVED", - "submitted_at": "2024-01-01", - } - ], - "comment_data": [ - { - "id": 2, - "user": {"login": "commenter"}, - "body": "Test", - "created_at": "2024-01-01", - } - ], - } - ] - - result = main.transform_data(raw_data, "mozilla/firefox") - - assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" - assert result["commits"][0]["target_repository"] == "mozilla/firefox" - assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" - assert result["comments"][0]["target_repository"] == "mozilla/firefox" - - -# ============================================================================= -# TESTS FOR 
LOAD_DATA -# ============================================================================= - - -@patch("main.datetime") -def test_load_data_inserts_all_tables(mock_datetime, mock_bigquery_client): - """Test that load_data inserts all tables correctly.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" - - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [{"commit_sha": "abc"}], - "reviewers": [{"reviewer_username": "user1"}], - "comments": [{"comment_id": 123}], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - # Should call insert_rows_json 4 times (once per table) - assert mock_bigquery_client.insert_rows_json.call_count == 4 - - -@patch("main.datetime") -def test_adds_snapshot_date(mock_datetime, mock_bigquery_client): - """Test that snapshot_date is added to all rows.""" - mock_datetime.now.return_value.strftime.return_value = "2024-01-15" - - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - call_args = mock_bigquery_client.insert_rows_json.call_args - rows = call_args[0][1] - assert all(row["snapshot_date"] == "2024-01-15" for row in rows) - - -def test_constructs_correct_table_ref(mock_bigquery_client): - """Test that table_ref is constructed correctly.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "my_dataset", transformed_data) - - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref = call_args[0][0] - assert table_ref == "test-project.my_dataset.pull_requests" - - -def test_empty_transformed_data_skipped(mock_bigquery_client): - """Test that empty transformed_data dict is skipped.""" - transformed_data = {} - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - mock_bigquery_client.insert_rows_json.assert_not_called() - - -def test_skips_empty_tables_individually(mock_bigquery_client): - """Test that empty tables are skipped individually.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], # Empty, should be skipped - "reviewers": [], # Empty, should be skipped - "comments": [{"comment_id": 456}], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - # Should only call insert_rows_json twice (for PRs and comments) - assert mock_bigquery_client.insert_rows_json.call_count == 2 - - -def test_only_pull_requests_table(mock_bigquery_client): - """Test loading only pull_requests table.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - assert mock_bigquery_client.insert_rows_json.call_count == 1 - - -def test_raises_exception_on_insert_errors(mock_bigquery_client): - """Test that Exception is raised on BigQuery insert errors.""" - mock_bigquery_client.insert_rows_json.return_value = [ - {"index": 0, "errors": ["Insert failed"]} - ] - - transformed_data = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with pytest.raises(Exception) as exc_info: - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - assert "BigQuery insert errors" in str(exc_info.value) - - -def 
test_verifies_client_insert_called_correctly(mock_bigquery_client): - """Test that client.insert_rows_json is called with correct arguments.""" - transformed_data = { - "pull_requests": [{"pull_request_id": 1}, {"pull_request_id": 2}], - "commits": [], - "reviewers": [], - "comments": [], - } - - main.load_data(mock_bigquery_client, "test_dataset", transformed_data) - - call_args = mock_bigquery_client.insert_rows_json.call_args - table_ref, rows = call_args[0] - - assert "pull_requests" in table_ref - assert len(rows) == 2 - - -# ============================================================================= -# TESTS FOR MAIN -# ============================================================================= - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_logging): - """Test that GITHUB_REPOS is required.""" - with patch.dict( - os.environ, - {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - - assert "GITHUB_REPOS" in str(exc_info.value) - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_requires_bigquery_project( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that BIGQUERY_PROJECT is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - - assert "BIGQUERY_PROJECT" in str(exc_info.value) - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_requires_bigquery_dataset( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that BIGQUERY_DATASET is required.""" - with patch.dict( - os.environ, - {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, - clear=True, - ): - with pytest.raises(SystemExit) as exc_info: - main.main() - - assert "BIGQUERY_DATASET" in str(exc_info.value) - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_github_token_optional_with_warning( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that GITHUB_TOKEN is optional but warns if missing.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - # Should not raise, but should log warning - result = main.main() - assert result == 0 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_splits_github_repos_by_comma( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that GITHUB_REPOS is split by comma.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): - main.main() - - # Should be called twice (once per repo) - assert mock_extract.call_count == 2 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_logging): - """Test that GITHUB_API_URL is honored.""" - 
with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "GITHUB_API_URL": "https://custom-api.example.com", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, - ): - main.main() - - call_kwargs = mock_extract.call_args[1] - assert call_kwargs["github_api_url"] == "https://custom-api.example.com" - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_honors_bigquery_emulator_host( - mock_session_class, mock_bq_client_class, mock_setup_logging -): - """Test that BIGQUERY_EMULATOR_HOST is honored.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - - # Verify BigQuery client was created with emulator settings - mock_bq_client_class.assert_called_once() - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_creates_session_with_headers( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that session is created with Accept and User-Agent headers.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - - # Verify session headers were set - assert mock_session.headers.update.called - call_args = mock_session.headers.update.call_args[0][0] - assert "Accept" in call_args - assert "User-Agent" in call_args - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_sets_authorization_header_with_token( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that Authorization header is set when token provided.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "test-token-123", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - main.main() - - # Verify Authorization header was set - assert mock_session.headers.__setitem__.called - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -@patch("main.extract_pull_requests") -@patch("main.transform_data") -@patch("main.load_data") -def test_single_repo_successful_etl( - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, -): - """Test successful ETL for single repository.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - mock_extract.assert_called_once() - 
mock_transform.assert_called_once() - mock_load.assert_called_once() - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -@patch("main.extract_pull_requests") -@patch("main.transform_data") -@patch("main.load_data") -def test_multiple_repos_processing( - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, -): - """Test processing multiple repositories.""" - mock_extract.return_value = iter([[{"number": 1}]]) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - # Should process 3 repositories - assert mock_extract.call_count == 3 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -@patch("main.extract_pull_requests") -@patch("main.transform_data") -@patch("main.load_data") -def test_processes_chunks_iteratively( - mock_load, - mock_transform, - mock_extract, - mock_session_class, - mock_bq_client, - mock_setup_logging, -): - """Test that chunks are processed iteratively from generator.""" - # Return 3 chunks - mock_extract.return_value = iter( - [ - [{"number": 1}], - [{"number": 2}], - [{"number": 3}], - ] - ) - mock_transform.return_value = { - "pull_requests": [{"pull_request_id": 1}], - "commits": [], - "reviewers": [], - "comments": [], - } - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - # Transform and load should be called 3 times (once per chunk) - assert mock_transform.call_count == 3 - assert mock_load.call_count == 3 - - -@patch("main.setup_logging") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_returns_zero_on_success( - mock_session_class, mock_bq_client, mock_setup_logging -): - """Test that main returns 0 on success.""" - with ( - patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ), - patch("main.extract_pull_requests", return_value=iter([])), - ): - result = main.main() - - assert result == 0 - - -@pytest.mark.integration -@patch("main.setup_logging") -@patch("main.load_data") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_full_etl_flow_transforms_data_correctly( - mock_session_class, mock_bq_client, mock_load, mock_setup_logging -): - """Test full ETL flow with mocked GitHub responses.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # Mock PR response - pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} - ] - pr_response.links = {} - - # Mock commits, reviewers, comments responses - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - 
"BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - result = main.main() - - assert result == 0 - mock_load.assert_called_once() - - # Verify transformed data structure - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - assert "pull_requests" in transformed_data - assert len(transformed_data["pull_requests"]) == 1 - - -@patch("main.setup_logging") -@patch("main.load_data") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_bug_id_extraction_through_pipeline( - mock_session_class, mock_bq_client, mock_load, mock_setup_logging -): - """Test bug ID extraction through full pipeline.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - pr_response = Mock() - pr_response.status_code = 200 - pr_response.json.return_value = [ - { - "number": 1, - "title": "Bug 9876543 - Fix critical issue", - "state": "closed", - } - ] - pr_response.links = {} - - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - main.main() - - call_args = mock_load.call_args[0] - transformed_data = call_args[2] - pr = transformed_data["pull_requests"][0] - assert pr["bug_id"] == 9876543 - - -@patch("main.setup_logging") -@patch("main.load_data") -@patch("main.bigquery.Client") -@patch("requests.Session") -def test_pagination_through_full_flow( - mock_session_class, mock_bq_client, mock_load, mock_setup_logging -): - """Test pagination through full ETL flow.""" - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - # First page - pr_response_1 = Mock() - pr_response_1.status_code = 200 - pr_response_1.json.return_value = [{"number": 1, "title": "PR 1", "state": "open"}] - pr_response_1.links = { - "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} - } - - # Second page - pr_response_2 = Mock() - pr_response_2.status_code = 200 - pr_response_2.json.return_value = [{"number": 2, "title": "PR 2", "state": "open"}] - pr_response_2.links = {} - - empty_response = Mock() - empty_response.status_code = 200 - empty_response.json.return_value = [] - - mock_session.get.side_effect = [ - pr_response_1, - empty_response, - empty_response, - empty_response, - pr_response_2, - empty_response, - empty_response, - empty_response, - ] - - with patch.dict( - os.environ, - { - "GITHUB_REPOS": "mozilla/firefox", - "BIGQUERY_PROJECT": "test", - "BIGQUERY_DATASET": "test", - "GITHUB_TOKEN": "token", - }, - clear=True, - ): - main.main() - - # Should be called twice (once per chunk/page) - assert mock_load.call_count == 2 diff --git a/tests/test_main_integration.py b/tests/test_main_integration.py new file mode 100644 index 0000000..e09d940 --- /dev/null +++ b/tests/test_main_integration.py @@ -0,0 +1,544 @@ +#!/usr/bin/env python3 +""" +Tests for main function and full ETL integration. + +Tests main orchestration including environment variables, session setup, +repository processing, chunked ETL flow, and end-to-end integration tests. 
+""" + +import os +from unittest.mock import MagicMock, Mock, patch + +import pytest + +import main + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_github_repos(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_REPOS is required.""" + with patch.dict( + os.environ, + {"BIGQUERY_PROJECT": "test", "BIGQUERY_DATASET": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "GITHUB_REPOS" in str(exc_info.value) + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_project( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that BIGQUERY_PROJECT is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_DATASET": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "BIGQUERY_PROJECT" in str(exc_info.value) + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_requires_bigquery_dataset( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that BIGQUERY_DATASET is required.""" + with patch.dict( + os.environ, + {"GITHUB_REPOS": "mozilla/firefox", "BIGQUERY_PROJECT": "test"}, + clear=True, + ): + with pytest.raises(SystemExit) as exc_info: + main.main() + + assert "BIGQUERY_DATASET" in str(exc_info.value) + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_github_token_optional_with_warning( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that GITHUB_TOKEN is optional but warns if missing.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + # Should not raise, but should log warning + result = main.main() + assert result == 0 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_splits_github_repos_by_comma( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that GITHUB_REPOS is split by comma.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): + main.main() + + # Should be called twice (once per repo) + assert mock_extract.call_count == 2 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_honors_github_api_url(mock_session_class, mock_bq_client, mock_setup_logging): + """Test that GITHUB_API_URL is honored.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "GITHUB_API_URL": "https://custom-api.example.com", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])) as mock_extract, + ): + main.main() + + call_kwargs = mock_extract.call_args[1] + assert call_kwargs["github_api_url"] == "https://custom-api.example.com" + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_honors_bigquery_emulator_host( + mock_session_class, 
mock_bq_client_class, mock_setup_logging +): + """Test that BIGQUERY_EMULATOR_HOST is honored.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + "BIGQUERY_EMULATOR_HOST": "http://localhost:9050", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify BigQuery client was created with emulator settings + mock_bq_client_class.assert_called_once() + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_creates_session_with_headers( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that session is created with Accept and User-Agent headers.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify session headers were set + assert mock_session.headers.update.called + call_args = mock_session.headers.update.call_args[0][0] + assert "Accept" in call_args + assert "User-Agent" in call_args + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_sets_authorization_header_with_token( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that Authorization header is set when token provided.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "test-token-123", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + main.main() + + # Verify Authorization header was set + assert mock_session.headers.__setitem__.called + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_single_repo_successful_etl( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test successful ETL for single repository.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_extract.assert_called_once() + mock_transform.assert_called_once() + mock_load.assert_called_once() + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_multiple_repos_processing( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test processing multiple repositories.""" + mock_extract.return_value = iter([[{"number": 1}]]) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + 
"comments": [], + } + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox,mozilla/gecko-dev,mozilla/addons", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Should process 3 repositories + assert mock_extract.call_count == 3 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +@patch("main.extract_pull_requests") +@patch("main.transform_data") +@patch("main.load_data") +def test_processes_chunks_iteratively( + mock_load, + mock_transform, + mock_extract, + mock_session_class, + mock_bq_client, + mock_setup_logging, +): + """Test that chunks are processed iteratively from generator.""" + # Return 3 chunks + mock_extract.return_value = iter( + [ + [{"number": 1}], + [{"number": 2}], + [{"number": 3}], + ] + ) + mock_transform.return_value = { + "pull_requests": [{"pull_request_id": 1}], + "commits": [], + "reviewers": [], + "comments": [], + } + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + # Transform and load should be called 3 times (once per chunk) + assert mock_transform.call_count == 3 + assert mock_load.call_count == 3 + + +@patch("main.setup_logging") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_returns_zero_on_success( + mock_session_class, mock_bq_client, mock_setup_logging +): + """Test that main returns 0 on success.""" + with ( + patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ), + patch("main.extract_pull_requests", return_value=iter([])), + ): + result = main.main() + + assert result == 0 + + +@pytest.mark.integration +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_full_etl_flow_transforms_data_correctly( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): + """Test full ETL flow with mocked GitHub responses.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # Mock PR response + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + {"number": 1, "title": "Bug 1234567 - Test PR", "state": "open"} + ] + pr_response.links = {} + + # Mock commits, reviewers, comments responses + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + result = main.main() + + assert result == 0 + mock_load.assert_called_once() + + # Verify transformed data structure + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + assert "pull_requests" in transformed_data + assert len(transformed_data["pull_requests"]) == 1 + + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_bug_id_extraction_through_pipeline( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): + """Test bug ID extraction 
through full pipeline.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + pr_response = Mock() + pr_response.status_code = 200 + pr_response.json.return_value = [ + { + "number": 1, + "title": "Bug 9876543 - Fix critical issue", + "state": "closed", + } + ] + pr_response.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + call_args = mock_load.call_args[0] + transformed_data = call_args[2] + pr = transformed_data["pull_requests"][0] + assert pr["bug_id"] == 9876543 + + +@patch("main.setup_logging") +@patch("main.load_data") +@patch("main.bigquery.Client") +@patch("requests.Session") +def test_pagination_through_full_flow( + mock_session_class, mock_bq_client, mock_load, mock_setup_logging +): + """Test pagination through full ETL flow.""" + mock_session = MagicMock() + mock_session_class.return_value = mock_session + + # First page + pr_response_1 = Mock() + pr_response_1.status_code = 200 + pr_response_1.json.return_value = [{"number": 1, "title": "PR 1", "state": "open"}] + pr_response_1.links = { + "next": {"url": "https://api.github.com/repos/mozilla/firefox/pulls?page=2"} + } + + # Second page + pr_response_2 = Mock() + pr_response_2.status_code = 200 + pr_response_2.json.return_value = [{"number": 2, "title": "PR 2", "state": "open"}] + pr_response_2.links = {} + + empty_response = Mock() + empty_response.status_code = 200 + empty_response.json.return_value = [] + + mock_session.get.side_effect = [ + pr_response_1, + empty_response, + empty_response, + empty_response, + pr_response_2, + empty_response, + empty_response, + empty_response, + ] + + with patch.dict( + os.environ, + { + "GITHUB_REPOS": "mozilla/firefox", + "BIGQUERY_PROJECT": "test", + "BIGQUERY_DATASET": "test", + "GITHUB_TOKEN": "token", + }, + clear=True, + ): + main.main() + + # Should be called twice (once per chunk/page) + assert mock_load.call_count == 2 diff --git a/tests/test_rate_limit.py b/tests/test_rate_limit.py new file mode 100644 index 0000000..9d32961 --- /dev/null +++ b/tests/test_rate_limit.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python3 +""" +Tests for sleep_for_rate_limit function. + +Tests rate limit handling including wait time calculation and edge cases. 
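+
+The behaviour asserted below corresponds roughly to the following logic
+(a sketch inferred from the assertions, not a copy of main.py):
+
+    remaining = int(response.headers.get("X-RateLimit-Remaining", 1))
+    if remaining > 0:
+        return                                   # quota left: no sleep needed
+    reset = int(response.headers.get("X-RateLimit-Reset", 0))
+    time.sleep(max(0, reset - time.time()))      # never sleep for a negative duration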
+""" + +from unittest.mock import Mock, patch + +import main + + +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_calculates_wait_time(mock_sleep, mock_time): + """Test that sleep_for_rate_limit calculates correct wait time.""" + mock_time.return_value = 1000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1120", # 120 seconds from now + } + + main.sleep_for_rate_limit(mock_response) + + mock_sleep.assert_called_once_with(120) + + +@patch("time.time") +@patch("time.sleep") +def test_sleep_for_rate_limit_when_reset_already_passed(mock_sleep, mock_time): + """Test that sleep_for_rate_limit doesn't sleep negative time.""" + mock_time.return_value = 2000 + + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "0", + "X-RateLimit-Reset": "1500", # Already passed + } + + main.sleep_for_rate_limit(mock_response) + + # Should sleep for 0 seconds (max of 0 and negative value) + mock_sleep.assert_called_once_with(0) + + +@patch("time.sleep") +def test_sleep_for_rate_limit_when_remaining_not_zero(mock_sleep): + """Test that sleep_for_rate_limit doesn't sleep when remaining > 0.""" + mock_response = Mock() + mock_response.headers = { + "X-RateLimit-Remaining": "5", + "X-RateLimit-Reset": "1500", + } + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when remaining > 0 + mock_sleep.assert_not_called() + + +@patch("time.sleep") +def test_sleep_for_rate_limit_with_missing_headers(mock_sleep): + """Test sleep_for_rate_limit with missing rate limit headers.""" + mock_response = Mock() + mock_response.headers = {} + + main.sleep_for_rate_limit(mock_response) + + # Should not sleep when headers are missing (defaults to remaining=1) + mock_sleep.assert_not_called() diff --git a/tests/test_transform_data.py b/tests/test_transform_data.py new file mode 100644 index 0000000..2b8353b --- /dev/null +++ b/tests/test_transform_data.py @@ -0,0 +1,625 @@ +#!/usr/bin/env python3 +""" +Tests for transform_data function. + +Tests data transformation including bug ID extraction, label processing, +commit/reviewer/comment flattening, and field mapping. 
+""" + +import main + + +def test_transform_data_basic(): + """Test basic transformation of pull request data.""" + raw_data = [ + { + "number": 123, + "title": "Fix login bug", + "state": "closed", + "created_at": "2024-01-01T10:00:00Z", + "updated_at": "2024-01-02T10:00:00Z", + "merged_at": "2024-01-02T12:00:00Z", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + pr = result["pull_requests"][0] + assert pr["pull_request_id"] == 123 + assert pr["current_status"] == "closed" + assert pr["date_created"] == "2024-01-01T10:00:00Z" + assert pr["date_modified"] == "2024-01-02T10:00:00Z" + assert pr["date_landed"] == "2024-01-02T12:00:00Z" + assert pr["target_repository"] == "mozilla/firefox" + + +def test_bug_id_extraction_basic(): + """Test bug ID extraction from PR title.""" + test_cases = [ + ("Bug 1234567 - Fix issue", 1234567), + ("bug 1234567: Update code", 1234567), + ("Fix for bug 7654321", 7654321), + ("b=9876543 - Change behavior", 9876543), + ] + + for title, expected_bug_id in test_cases: + raw_data = [ + { + "number": 1, + "title": title, + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == expected_bug_id + + +def test_bug_id_extraction_with_hash(): + """Test bug ID extraction with # symbol.""" + raw_data = [ + { + "number": 1, + "title": "Bug #1234567 - Fix issue", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] == 1234567 + + +def test_bug_id_filter_large_numbers(): + """Test that bug IDs >= 100000000 are filtered out.""" + raw_data = [ + { + "number": 1, + "title": "Bug 999999999 - Invalid bug ID", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + +def test_bug_id_no_match(): + """Test PR title with no bug ID.""" + raw_data = [ + { + "number": 1, + "title": "Update documentation", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["bug_id"] is None + + +def test_labels_extraction(): + """Test labels array extraction.""" + raw_data = [ + { + "number": 1, + "title": "PR with labels", + "state": "open", + "labels": [ + {"name": "bug"}, + {"name": "priority-high"}, + {"name": "needs-review"}, + ], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + labels = result["pull_requests"][0]["labels"] + assert len(labels) == 3 + assert "bug" in labels + assert "priority-high" in labels + assert "needs-review" in labels + + +def test_labels_empty_list(): + """Test handling empty labels list.""" + raw_data = [ + { + "number": 1, + "title": "PR without labels", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + assert result["pull_requests"][0]["labels"] == [] + + +def 
test_commit_transformation(): + """Test commit fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": { + "author": { + "name": "Test Author", + "date": "2024-01-01T12:00:00Z", + } + }, + "files": [ + { + "filename": "src/main.py", + "additions": 10, + "deletions": 5, + } + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["commits"]) == 1 + commit = result["commits"][0] + assert commit["pull_request_id"] == 123 + assert commit["target_repository"] == "mozilla/firefox" + assert commit["commit_sha"] == "abc123" + assert commit["date_created"] == "2024-01-01T12:00:00Z" + assert commit["author_username"] == "Test Author" + assert commit["filename"] == "src/main.py" + assert commit["lines_added"] == 10 + assert commit["lines_removed"] == 5 + + +def test_commit_file_flattening(): + """Test that each file becomes a separate row.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple files", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc123", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 5}, + {"filename": "file2.py", "additions": 20, "deletions": 2}, + {"filename": "file3.py", "additions": 5, "deletions": 15}, + ], + } + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows in commits table (one per file) + assert len(result["commits"]) == 3 + filenames = [c["filename"] for c in result["commits"]] + assert "file1.py" in filenames + assert "file2.py" in filenames + assert "file3.py" in filenames + + +def test_multiple_commits_with_files(): + """Test multiple commits with multiple files per PR.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple commits", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "commit1", + "commit": {"author": {"name": "Author1", "date": "2024-01-01"}}, + "files": [ + {"filename": "file1.py", "additions": 10, "deletions": 0} + ], + }, + { + "sha": "commit2", + "commit": {"author": {"name": "Author2", "date": "2024-01-02"}}, + "files": [ + {"filename": "file2.py", "additions": 5, "deletions": 2}, + {"filename": "file3.py", "additions": 8, "deletions": 3}, + ], + }, + ], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Should have 3 rows total (1 file from commit1, 2 files from commit2) + assert len(result["commits"]) == 3 + assert result["commits"][0]["commit_sha"] == "commit1" + assert result["commits"][1]["commit_sha"] == "commit2" + assert result["commits"][2]["commit_sha"] == "commit2" + + +def test_reviewer_transformation(): + """Test reviewer fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with reviewers", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + } + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 1 + reviewer = result["reviewers"][0] + assert reviewer["pull_request_id"] == 123 + assert reviewer["target_repository"] == "mozilla/firefox" + assert reviewer["reviewer_username"] == 
"reviewer1" + assert reviewer["status"] == "APPROVED" + assert reviewer["date_reviewed"] == "2024-01-01T15:00:00Z" + + +def test_transform_multiple_review_states(): + """Test transforming data with multiple review states.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple reviews", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "CHANGES_REQUESTED", + "submitted_at": "2024-01-01T16:00:00Z", + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "COMMENTED", + "submitted_at": "2024-01-01T17:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["reviewers"]) == 3 + states = [r["status"] for r in result["reviewers"]] + assert "APPROVED" in states + assert "CHANGES_REQUESTED" in states + assert "COMMENTED" in states + + +def test_date_approved_from_earliest_approval(): + """Test that date_approved is set to earliest APPROVED review.""" + raw_data = [ + { + "number": 123, + "title": "PR with multiple approvals", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "state": "APPROVED", + "submitted_at": "2024-01-02T15:00:00Z", + }, + { + "id": 2, + "user": {"login": "user2"}, + "state": "APPROVED", + "submitted_at": "2024-01-01T14:00:00Z", # Earliest + }, + { + "id": 3, + "user": {"login": "user3"}, + "state": "APPROVED", + "submitted_at": "2024-01-03T16:00:00Z", + }, + ], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + pr = result["pull_requests"][0] + assert pr["date_approved"] == "2024-01-01T14:00:00Z" + + +def test_comment_transformation(): + """Test comment fields mapping.""" + raw_data = [ + { + "number": 123, + "title": "PR with comments", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter1"}, + "body": "This looks great!", + "created_at": "2024-01-01T14:00:00Z", + "pull_request_review_id": None, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["comments"]) == 1 + comment = result["comments"][0] + assert comment["pull_request_id"] == 123 + assert comment["target_repository"] == "mozilla/firefox" + assert comment["comment_id"] == 456 + assert comment["author_username"] == "commenter1" + assert comment["date_created"] == "2024-01-01T14:00:00Z" + assert comment["character_count"] == 17 + + +def test_comment_character_count(): + """Test character count calculation for comments.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": "Short", + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "This is a much longer comment with more text", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 5 + assert result["comments"][1]["character_count"] == 44 + + +def test_comment_status_from_review(): + """Test that comment status is mapped from review_id_statuses.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + 
"labels": [], + "commit_data": [], + "reviewer_data": [ + { + "id": 789, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 456, + "user": {"login": "commenter"}, + "body": "LGTM", + "created_at": "2024-01-01", + "pull_request_review_id": 789, + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + # Comment should have status from the review + assert result["comments"][0]["status"] == "APPROVED" + + +def test_comment_empty_body(): + """Test handling comments with empty or None body.""" + raw_data = [ + { + "number": 123, + "title": "PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [ + { + "id": 1, + "user": {"login": "user1"}, + "body": None, + "created_at": "2024-01-01", + }, + { + "id": 2, + "user": {"login": "user2"}, + "body": "", + "created_at": "2024-01-01", + }, + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["comments"][0]["character_count"] == 0 + assert result["comments"][1]["character_count"] == 0 + + +def test_empty_raw_data(): + """Test handling empty input list.""" + result = main.transform_data([], "mozilla/firefox") + + assert result["pull_requests"] == [] + assert result["commits"] == [] + assert result["reviewers"] == [] + assert result["comments"] == [] + + +def test_pr_without_commits_reviewers_comments(): + """Test PR with no commits, reviewers, or comments.""" + raw_data = [ + { + "number": 123, + "title": "Minimal PR", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert len(result["pull_requests"]) == 1 + assert len(result["commits"]) == 0 + assert len(result["reviewers"]) == 0 + assert len(result["comments"]) == 0 + + +def test_return_structure(): + """Test that transform_data returns dict with 4 keys.""" + raw_data = [ + { + "number": 1, + "title": "Test", + "state": "open", + "labels": [], + "commit_data": [], + "reviewer_data": [], + "comment_data": [], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert isinstance(result, dict) + assert "pull_requests" in result + assert "commits" in result + assert "reviewers" in result + assert "comments" in result + + +def test_all_tables_have_target_repository(): + """Test that all tables include target_repository field.""" + raw_data = [ + { + "number": 123, + "title": "Test PR", + "state": "open", + "labels": [], + "commit_data": [ + { + "sha": "abc", + "commit": {"author": {"name": "Author", "date": "2024-01-01"}}, + "files": [{"filename": "test.py", "additions": 1, "deletions": 0}], + } + ], + "reviewer_data": [ + { + "id": 1, + "user": {"login": "reviewer"}, + "state": "APPROVED", + "submitted_at": "2024-01-01", + } + ], + "comment_data": [ + { + "id": 2, + "user": {"login": "commenter"}, + "body": "Test", + "created_at": "2024-01-01", + } + ], + } + ] + + result = main.transform_data(raw_data, "mozilla/firefox") + + assert result["pull_requests"][0]["target_repository"] == "mozilla/firefox" + assert result["commits"][0]["target_repository"] == "mozilla/firefox" + assert result["reviewers"][0]["target_repository"] == "mozilla/firefox" + assert result["comments"][0]["target_repository"] == "mozilla/firefox" From c4dd862308206ade8cfc980e672c2ec9696e16af Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 6 Feb 2026 16:43:13 -0500 Subject: [PATCH 10/11] Separate 
TESTING.md not necessary. Added testing section to README.md --- README.md | 91 ++++++++ TESTING.md | 621 ----------------------------------------------------- 2 files changed, 91 insertions(+), 621 deletions(-) delete mode 100644 TESTING.md diff --git a/README.md b/README.md index 570bacb..ae10820 100644 --- a/README.md +++ b/README.md @@ -157,6 +157,97 @@ This setup includes: - **BigQuery Emulator**: Local BigQuery instance for testing - **ETL Service**: Configured to use both mock services +### Running Tests + +The project includes a comprehensive test suite using pytest. Tests are organized in the `test/` directory and include both unit and integration tests. + +#### Setting Up the Development Environment + +1. **Install Python 3.14** (or your compatible Python version) + +2. **Install development dependencies**: + + ```bash + # Install the package with dev dependencies + pip install -e ".[dev]" + ``` + + This installs: + - `pytest` - Testing framework + - `pytest-mock` - Mocking utilities for tests + - `ruff` - Linter + - `black` - Code formatter + +3. **Verify installation**: + + ```bash + pytest --version + ``` + +#### Running the Tests + +Run all tests: + +```bash +pytest +``` + +Run tests with verbose output: + +```bash +pytest -v +``` + +Run specific test files: + +```bash +pytest test/test_extract_pull_requests.py +pytest test/test_transform_data.py +``` + +Run tests by marker: + +```bash +# Run only unit tests +pytest -m unit + +# Run only integration tests +pytest -m integration + +# Skip slow tests +pytest -m "not slow" +``` + +Run tests with coverage reporting: + +```bash +pytest --cov=. --cov-report=html +``` + +#### Test Organization + +The test suite is organized into the following files: + +- `test/conftest.py` - Shared pytest fixtures and test configuration +- `test/test_extract_pull_requests.py` - Tests for PR extraction logic +- `test/test_extract_commits.py` - Tests for commit extraction +- `test/test_extract_comments.py` - Tests for comment extraction +- `test/test_extract_reviewers.py` - Tests for reviewer extraction +- `test/test_transform_data.py` - Tests for data transformation +- `test/test_load_data.py` - Tests for BigQuery loading +- `test/test_rate_limit.py` - Tests for rate limit handling +- `test/test_main_integration.py` - End-to-end integration tests +- `test/test_logging.py` - Tests for logging setup +- `test/test_formatting.py` - Code formatting tests + +#### Test Markers + +Tests are marked with the following pytest markers: + +- `@pytest.mark.unit` - Unit tests for individual functions +- `@pytest.mark.integration` - Integration tests across multiple components +- `@pytest.mark.slow` - Tests that take longer to run + ### Adding Dependencies Add new Python packages to `requirements.txt` and rebuild the Docker image. diff --git a/TESTING.md b/TESTING.md deleted file mode 100644 index 6901d2f..0000000 --- a/TESTING.md +++ /dev/null @@ -1,621 +0,0 @@ -# Testing Guide for GitHub ETL - -This document describes comprehensive testing for the GitHub ETL pipeline, including -unit tests, integration tests, Docker testing, linting, and CI/CD workflows. - -## Table of Contents - -1. [Unit Testing](#unit-testing) -2. [Test Organization](#test-organization) -3. [Running Tests](#running-tests) -4. [Code Coverage](#code-coverage) -5. [Linting and Code Quality](#linting-and-code-quality) -6. [CI/CD Integration](#cicd-integration) -7. [Docker Testing](#docker-testing) -8. 
[Adding New Tests](#adding-new-tests) - ---- - -## Unit Testing - -The test suite in `test_main.py` provides comprehensive coverage for all functions in `main.py`. -We have unit tests covering 9 functions with 80%+ code coverage requirement. - -### Test Structure - -Tests are organized into 10 test classes: - -1. **TestSetupLogging** - Logging configuration -2. **TestSleepForRateLimit** - Rate limit handling -3. **TestExtractPullRequests** - PR extraction with pagination and enrichment -4. **TestExtractCommits** - Commit and file extraction -5. **TestExtractReviewers** - Reviewer extraction -6. **TestExtractComments** - Comment extraction (uses /issues endpoint) -7. **TestTransformData** - Data transformation for all 4 BigQuery tables -8. **TestLoadData** - BigQuery data loading -9. **TestMain** - Main ETL orchestration -10. **TestIntegration** - End-to-end integration tests (marked with `@pytest.mark.integration`) - -### Fixtures - -Reusable fixtures are defined at the top of `test_main.py`: - -- `mock_session` - Mocked `requests.Session` -- `mock_bigquery_client` - Mocked BigQuery client -- `mock_pr_response` - Realistic pull request response -- `mock_commit_response` - Realistic commit with files -- `mock_reviewer_response` - Realistic reviewer response -- `mock_comment_response` - Realistic comment response - -## Test Organization - -### Function Coverage - -| Function | Coverage Target | Key Test Areas | -|----------|------------------|----------------| -| `setup_logging()` | 100% | Logger configuration | -| `sleep_for_rate_limit()` | 100% | Rate limit sleep logic, edge cases | -| `extract_pull_requests()` | 90%+ | Pagination, rate limits, enrichment, error handling | -| `extract_commits()` | 85%+ | Commit/file fetching, rate limits, errors | -| `extract_reviewers()` | 85%+ | Reviewer states, rate limits, errors | -| `extract_comments()` | 85%+ | Comment fetching (via /issues), rate limits | -| `transform_data()` | 95%+ | Bug ID extraction, 4 tables, field mapping | -| `load_data()` | 90%+ | BigQuery insertion, snapshot dates, errors | -| `main()` | 85%+ | Env vars, orchestration, chunking | - -**Overall Target: 85-90% coverage** (80% minimum enforced in CI) - -### Critical Test Cases - -#### Bug ID Extraction -Tests verify the regex pattern matches: -- `Bug 1234567 - Fix` → 1234567 -- `bug 1234567` → 1234567 -- `b=1234567` → 1234567 -- `Bug #1234567` → 1234567 -- Filters out IDs >= 100000000 - -#### Data Transformation -Tests ensure correct transformation for all 4 BigQuery tables: -- **pull_requests**: PR metadata, bug IDs, labels, date_approved -- **commits**: Flattened files (one row per file), commit metadata -- **reviewers**: Review states, date_approved calculation -- **comments**: Character count, status mapping from reviews - -#### Rate Limiting -Tests verify rate limit handling at all API levels: -- Pull requests pagination -- Commit fetching -- Reviewer fetching -- Comment fetching - -## Running Tests - -### All Tests with Coverage - -```bash -pytest -``` - -This runs all tests with coverage reporting (configured in `pytest.ini`). - -### Fast Unit Tests Only (Skip Integration) - -```bash -pytest -m "not integration and not slow" -``` - -Use this for fast feedback during development. 
- -### Specific Test Class - -```bash -pytest test_main.py::TestTransformData -``` - -### Specific Test Function - -```bash -pytest test_main.py::TestTransformData::test_bug_id_extraction_basic -v -``` - -### With Verbose Output - -```bash -pytest -v -``` - -### With Coverage Report - -```bash -# Terminal report -pytest --cov=main --cov-report=term-missing - -# HTML report -pytest --cov=main --cov-report=html -open htmlcov/index.html -``` - -### Integration Tests Only - -```bash -pytest -m integration -``` - -## Code Coverage - -### Coverage Requirements - -- **Minimum**: 80% (enforced in CI via `--cov-fail-under=80`) -- **Target**: 85-90% -- **Current**: Run `pytest --cov=main` to see current coverage - -### Coverage Configuration - -Coverage settings are in `pytest.ini`: - -```ini -[pytest] -addopts = - --cov=main - --cov-report=term-missing - --cov-report=html - --cov-branch - --cov-fail-under=80 -``` - -### Viewing Coverage - -```bash -# Generate HTML coverage report -pytest --cov=main --cov-report=html - -# Open in browser -xdg-open htmlcov/index.html # Linux -open htmlcov/index.html # macOS -``` - -The HTML report shows: -- Line-by-line coverage -- Branch coverage -- Missing lines highlighted -- Per-file coverage percentages - -## Linting and Code Quality - -### Available Linters - -The project uses these linting tools (defined in `requirements.txt`): - -- **black** - Code formatting -- **isort** - Import sorting -- **flake8** - Style and syntax checking -- **mypy** - Static type checking - -### Running Linters - -```bash -# Run black (auto-format) -black main.py test_main.py - -# Check formatting without changes -black --check main.py test_main.py - -# Sort imports -isort main.py test_main.py - -# Check import sorting -isort --check-only main.py test_main.py - -# Run flake8 -flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 - -# Run mypy -mypy main.py --no-strict-optional --ignore-missing-imports -``` - -### All Linting Checks - -```bash -# Run all linters in sequence -black --check main.py test_main.py && \ -isort --check-only main.py test_main.py && \ -flake8 main.py test_main.py --max-line-length=100 --extend-ignore=E203,W503 && \ -mypy main.py --no-strict-optional --ignore-missing-imports -``` - -## CI/CD Integration - -### GitHub Actions Workflow - -The `.github/workflows/tests.yml` workflow runs on every pull request: - -**Lint Job:** -1. Runs black (format check) -2. Runs isort (import check) -3. Runs flake8 (style check) -4. Runs mypy (type check) - -**Test Job:** -1. Runs fast unit tests with 80% coverage threshold -2. Runs all tests (including integration) -3. Uploads coverage reports as artifacts - -### Workflow Triggers - -- Pull requests to `main` branch - -### Viewing Results - -- Check the Actions tab in GitHub -- Coverage artifacts are uploaded for each run -- Failed linting or tests will block merges - -## Docker Testing - -## Overview - -The `docker-compose.yml` configuration provides a complete local testing environment with: - -1. **Mock GitHub API** - A Flask-based mock service that simulates the GitHub Pull Requests API -2. **BigQuery Emulator** - A local BigQuery instance for testing data loads -3. 
**ETL Service** - The main GitHub ETL application configured to use the mock services - -## Quick Start - -### Start all services - -```bash -docker-compose up --build -``` - -This will: - -- Build and start the mock GitHub API (port 5000) -- Start the BigQuery emulator (ports 9050, 9060) -- Build and run the ETL service - -The ETL service will automatically: - -- Fetch 250 mock pull requests from the mock GitHub API -- Transform the data -- Load it into the BigQuery emulator - -### View logs - -```bash -# All services -docker-compose logs -f - -# Specific service -docker-compose logs -f github-etl -docker-compose logs -f bigquery-emulator -docker-compose logs -f mock-github-api -``` - -### Stop services - -```bash -docker-compose down -``` - -## Architecture - -### Mock GitHub API Service - -- **Port**: 5000 -- **Endpoint**: `http://localhost:5000/repos/{owner}/{repo}/pulls` -- **Mock data**: Generates 250 sample pull requests with realistic data -- **Features**: - - Pagination support (per_page, page parameters) - - Realistic PR data (numbers, titles, states, timestamps, users, etc.) - - Mock rate limit headers - - No authentication required - -### BigQuery Emulator Service - -- **Ports**: - - 9050 (BigQuery API) - - 9060 (Discovery/Admin API) -- **Configuration**: Uses `data.yml` to define the schema -- **Project**: test -- **Dataset**: github_etl -- **Table**: pull_requests - -### ETL Service - -The ETL service is configured via environment variables in `docker-compose.yml`: - -```yaml -environment: - GITHUB_REPOS: "mozilla-firefox/firefox" - GITHUB_TOKEN: "" # Not needed for mock API - GITHUB_API_URL: "http://mock-github-api:5000" - BIGQUERY_PROJECT: "test" - BIGQUERY_DATASET: "github_etl" - BIGQUERY_EMULATOR_HOST: "http://bigquery-emulator:9050" -``` - -## Customization - -### Using Real GitHub API - -To test with the real GitHub API instead of the mock: - -1. Set `GITHUB_TOKEN` environment variable -2. Remove or comment out `GITHUB_API_URL` in docker-compose.yml -3. Update `depends_on` to not require mock-github-api - -```bash -export GITHUB_TOKEN="your_github_token" -docker-compose up github-etl bigquery-emulator -``` - -### Adjusting Mock Data - -Edit `mock_github_api.py` to customize: - -- Total number of PRs (default: 250) -- PR field values -- Pagination behavior - -### Modifying BigQuery Schema - -Edit `data.yml` to change the table schema. The schema matches the fields -extracted in `main.py`'s `transform_data()` function. - -## Querying the BigQuery Emulator - -You can query the BigQuery emulator using the BigQuery Python client: - -```python -from google.cloud import bigquery -from google.api_core.client_options import ClientOptions - -client = bigquery.Client( - project="test-project", - client_options=ClientOptions(api_endpoint="http://localhost:9050") -) - -query = """ -SELECT pr_number, title, state, user_login -FROM `test-project.test_dataset.pull_requests` -LIMIT 10 -""" - -for row in client.query(query): - print(f"PR #{row.pr_number}: {row.title} - {row.state}") -``` - -Or use the `bq` command-line tool with the emulator endpoint. 
- -## Troubleshooting - -### Services not starting - -Check if ports are already in use: - -```bash -lsof -i :5000 # Mock GitHub API -lsof -i :9050 # BigQuery emulator -``` - -### ETL fails to connect - -Ensure services are healthy: - -```bash -docker-compose ps -``` - -Check service logs: - -```bash -docker-compose logs bigquery-emulator -docker-compose logs mock-github-api -``` - -### Schema mismatch errors - -Verify `data.yml` schema matches fields in `main.py:transform_data()`. - -## Development Workflow - -1. Make changes to `main.py` -2. Restart the ETL service: `docker-compose restart github-etl` -3. View logs: `docker-compose logs -f github-etl` - -The `main.py` file is mounted as a volume, so changes are reflected without rebuilding. - -## Cleanup - -Remove all containers and volumes: - -```bash -docker-compose down -v -``` - -Remove built images: - -```bash -docker-compose down --rmi all -``` - ---- - -## Adding New Tests - -### Testing Patterns - -#### 1. Mock External Dependencies - -Always mock external API calls and BigQuery operations: - -```python -@patch("requests.Session") -def test_api_call(mock_session_class): - mock_session = MagicMock() - mock_session_class.return_value = mock_session - - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = [{"id": 1}] - - mock_session.get.return_value = mock_response - # Test code here -``` - -#### 2. Use Fixtures - -Leverage existing fixtures for common test data: - -```python -def test_with_fixtures(mock_session, mock_pr_response): - # Use mock_session and mock_pr_response - pass -``` - -#### 3. Test Edge Cases - -Always test: -- Empty inputs -- None values -- Missing fields -- Rate limits -- API errors (404, 500, etc.) -- Boundary conditions - -#### 4. Verify Call Arguments - -Check that functions are called with correct parameters: - -```python -mock_extract.assert_called_once_with( - session=mock_session, - repo="mozilla/firefox", - github_api_url="https://api.github.com" -) -``` - -### Example: Adding a New Test - -```python -class TestNewFunction: - """Tests for new_function.""" - - def test_basic_functionality(self, mock_session): - """Test basic happy path.""" - # Arrange - mock_response = Mock() - mock_response.status_code = 200 - mock_response.json.return_value = {"result": "success"} - mock_session.get.return_value = mock_response - - # Act - result = main.new_function(mock_session, "arg1") - - # Assert - assert result == {"result": "success"} - mock_session.get.assert_called_once() - - def test_error_handling(self, mock_session): - """Test error handling.""" - mock_response = Mock() - mock_response.status_code = 500 - mock_response.text = "Internal Error" - mock_session.get.return_value = mock_response - - with pytest.raises(SystemExit) as exc_info: - main.new_function(mock_session, "arg1") - - assert "500" in str(exc_info.value) -``` - -### Test Organization Guidelines - -1. **Group related tests** in test classes -2. **Use descriptive names** like `test_handles_rate_limit_on_commits` -3. **One assertion concept per test** - Test one thing at a time -4. **Arrange-Act-Assert pattern** - Structure tests clearly -5. 
**Add docstrings** to explain what each test verifies - -### Mocking Patterns - -#### Mocking Time - -```python -@patch("time.time") -@patch("time.sleep") -def test_with_time(mock_sleep, mock_time): - mock_time.return_value = 1000 - # Test code -``` - -#### Mocking Environment Variables - -```python -with patch.dict(os.environ, {"VAR_NAME": "value"}, clear=True): - # Test code -``` - -#### Mocking Generators - -```python -mock_extract.return_value = iter([[{"id": 1}], [{"id": 2}]]) -``` - -### Running Tests During Development - -```bash -# Auto-run tests on file changes (requires pytest-watch) -pip install pytest-watch -ptw -- --cov=main -m "not integration" -``` - -### Debugging Tests - -```bash -# Drop into debugger on failures -pytest --pdb - -# Show print statements -pytest -s - -# Verbose with full diff -pytest -vv -``` - -### Coverage Tips - -If coverage is below 80%: - -1. Run `pytest --cov=main --cov-report=term-missing` to see missing lines -2. Look for untested branches (if/else paths) -3. Check error handling paths -4. Verify edge cases are covered - -## Resources - -- [pytest documentation](https://docs.pytest.org/) -- [pytest-cov documentation](https://pytest-cov.readthedocs.io/) -- [unittest.mock documentation](https://docs.python.org/3/library/unittest.mock.html) - -## Troubleshooting - -### Tests Pass Locally But Fail in CI - -- Check Python version (must be 3.14) -- Verify all dependencies are in `requirements.txt` -- Look for environment-specific issues - -### Coverage Dropped Below 80% - -- Run locally: `pytest --cov=main --cov-report=html` -- Open `htmlcov/index.html` to see uncovered lines -- Add tests for missing coverage - -### Import Errors - -- Ensure `PYTHONPATH` includes project root -- Check that `__init__.py` files exist if needed -- Verify module names match file names From 48c1c46e1a2b139752ab97d6d5dd8157dcb13c6c Mon Sep 17 00:00:00 2001 From: David Lawrence Date: Fri, 6 Feb 2026 17:32:26 -0500 Subject: [PATCH 11/11] Copoilot suggested fixes --- .github/workflows/tests.yml | 13 ++++++------- README.md | 26 +++++++++++++------------- test_formatting.py | 16 ---------------- 3 files changed, 19 insertions(+), 36 deletions(-) delete mode 100644 test_formatting.py diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index b4cc85b..4025084 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -23,10 +23,9 @@ jobs: integration-test: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 - - name: Run integration test with docker compose - run: | - docker compose up --build --abort-on-container-exit --exit-code-from github-etl - - name: Cleanup - if: always() - run: docker compose down -v + - uses: actions/checkout@v4 + - name: Run integration test with docker compose + run: | + docker compose up --build --abort-on-container-exit --exit-code-from github-etl + - name: Cleanup + run: docker compose down -v diff --git a/README.md b/README.md index ae10820..d27188b 100644 --- a/README.md +++ b/README.md @@ -201,8 +201,8 @@ pytest -v Run specific test files: ```bash -pytest test/test_extract_pull_requests.py -pytest test/test_transform_data.py +pytest tests/test_extract_pull_requests.py +pytest tests/test_transform_data.py ``` Run tests by marker: @@ -228,17 +228,17 @@ pytest --cov=. 
--cov-report=html The test suite is organized into the following files: -- `test/conftest.py` - Shared pytest fixtures and test configuration -- `test/test_extract_pull_requests.py` - Tests for PR extraction logic -- `test/test_extract_commits.py` - Tests for commit extraction -- `test/test_extract_comments.py` - Tests for comment extraction -- `test/test_extract_reviewers.py` - Tests for reviewer extraction -- `test/test_transform_data.py` - Tests for data transformation -- `test/test_load_data.py` - Tests for BigQuery loading -- `test/test_rate_limit.py` - Tests for rate limit handling -- `test/test_main_integration.py` - End-to-end integration tests -- `test/test_logging.py` - Tests for logging setup -- `test/test_formatting.py` - Code formatting tests +- `tests/conftest.py` - Shared pytest fixtures and test configuration +- `tests/test_extract_pull_requests.py` - Tests for PR extraction logic +- `tests/test_extract_commits.py` - Tests for commit extraction +- `tests/test_extract_comments.py` - Tests for comment extraction +- `tests/test_extract_reviewers.py` - Tests for reviewer extraction +- `tests/test_transform_data.py` - Tests for data transformation +- `tests/test_load_data.py` - Tests for BigQuery loading +- `tests/test_rate_limit.py` - Tests for rate limit handling +- `tests/test_main_integration.py` - End-to-end integration tests +- `tests/test_logging.py` - Tests for logging setup +- `tests/test_formatting.py` - Code formatting tests #### Test Markers diff --git a/test_formatting.py b/test_formatting.py deleted file mode 100644 index c92e534..0000000 --- a/test_formatting.py +++ /dev/null @@ -1,16 +0,0 @@ -""" -Code Style Tests. -""" - -import subprocess - - -def test_black(): - cmd = ("black", "--diff", "main.py") - output = subprocess.check_output(cmd) - assert not output, "The python code does not adhere to the project style." - - -def test_ruff(): - passed = subprocess.call(("ruff", "check", "main.py", "--target-version", "py314")) - assert not passed, "ruff did not run cleanly."