feat: Added comprehensive unit testing and github action to run tests on new pull requests #8

Open
dklawren wants to merge 11 commits into main from unit-tests

Conversation

@dklawren
Contributor

No description provided.

Copilot AI left a comment

Pull request overview

Adds a comprehensive pytest-based test suite and GitHub Actions workflow to automatically run linting, unit tests, and integration tests for the GitHub ETL pipeline.

Changes:

  • Added a large pytest test suite covering main.py functions (extraction, transformation, loading, orchestration).
  • Added pytest/coverage configuration and a testing guide documenting local + CI workflows.
  • Added a GitHub Actions workflow to run linting/tests on pull requests; expanded dependencies to include lint/format tools.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| test_main.py | New comprehensive unit/integration test suite for main.py. |
| requirements.txt | Adds dev tooling dependencies (black/flake8/mypy/isort) alongside existing test deps. |
| pytest.ini | Configures pytest discovery, verbosity, and coverage reporting. |
| TESTING.md | Documents how to run tests/linting locally and in CI, plus docker-based integration testing. |
| .github/workflows/tests.yml | Adds CI workflow for linting, pytest runs (unit + all), coverage artifacts, and docker-compose integration job. |


Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.



Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

main.py:536

  • GITHUB_REPOS is split on commas but entries aren’t stripped/validated. Values like "owner/repo, owner/repo" (note the space) or a trailing comma will produce repo strings with leading whitespace or empty entries, which will break API URLs. Consider stripping whitespace and filtering out empty repo names before iterating.
    github_repos = []
    github_repos_str = os.getenv("GITHUB_REPOS")
    if github_repos_str:
        github_repos = github_repos_str.split(",")
    else:
        raise SystemExit(
            "Environment variable GITHUB_REPOS is required (format: 'owner/repo,owner/repo')"
        )
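A minimal sketch of that fix, reusing the names from the snippet above (a sketch of the suggestion, not the exact patch):

```python
import os

github_repos_str = os.getenv("GITHUB_REPOS")
if not github_repos_str:
    raise SystemExit(
        "Environment variable GITHUB_REPOS is required (format: 'owner/repo,owner/repo')"
    )

# Strip surrounding whitespace from each entry and drop empties left by
# stray or trailing commas, so values like "owner/a, owner/b," parse cleanly.
github_repos = [repo.strip() for repo in github_repos_str.split(",") if repo.strip()]
```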


Member

@cgsheeh left a comment

I haven't looked over the tests in full yet, but wanted to leave comments about the higher-level items I noticed off the bat.

@dklawren requested a review from cgsheeh on January 23, 2026 at 23:22
Member

@cgsheeh left a comment

There is a lot of content in this PR; it might take a few iterations to review it fully.

We could land some of this code faster if we split it into smaller chunks. For example, we could add linting support and tests in one PR, and deal with the tests themselves in another.

@dklawren
Contributor Author

dklawren commented Feb 2, 2026

> There is a lot of content in this PR; it might take a few iterations to review it fully.
>
> We could land some of this code faster if we split it into smaller chunks. For example, we could add linting support and tests in one PR, and deal with the tests themselves in another.

Good idea, and sorry for putting so much in one PR. I created #9 just now, which only includes the GitHub Action and formatting parts of this PR; we can land it first and then add the unit tests in a separate PR.

dklawren added a commit that referenced this pull request Feb 2, 2026
- Broke up all of the tests into individual files based on function to make for easier review.
@dklawren requested review from cgsheeh and Copilot on February 6, 2026 at 21:50
Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.



Comment on lines 26 to 32
  - uses: actions/checkout@v4
  - name: Run integration test with docker compose
    run: |
      docker compose up --build --abort-on-container-exit --exit-code-from github-etl
  - name: Cleanup
    if: always()
    run: docker compose down -v
Copilot AI Feb 6, 2026

integration-test job YAML is invalid: the list items under steps: are not indented, so GitHub Actions will treat them as siblings of steps (workflow parse error). Indent the - uses: / - name: entries under steps: by two spaces.

Suggested change
-  - uses: actions/checkout@v4
-  - name: Run integration test with docker compose
-    run: |
-      docker compose up --build --abort-on-container-exit --exit-code-from github-etl
-  - name: Cleanup
-    if: always()
-    run: docker compose down -v
+    - uses: actions/checkout@v4
+    - name: Run integration test with docker compose
+      run: |
+        docker compose up --build --abort-on-container-exit --exit-code-from github-etl
+    - name: Cleanup
+      if: always()
+      run: docker compose down -v

Comment on lines 160 to 207
### Running Tests

The project includes a comprehensive test suite using pytest. Tests are organized in the `test/` directory and include both unit and integration tests.

#### Setting Up the Development Environment

1. **Install Python 3.14** (or your compatible Python version)

2. **Install development dependencies**:

```bash
# Install the package with dev dependencies
pip install -e ".[dev]"
```

This installs:
- `pytest` - Testing framework
- `pytest-mock` - Mocking utilities for tests
- `ruff` - Linter
- `black` - Code formatter

3. **Verify installation**:

```bash
pytest --version
```

#### Running the Tests

Run all tests:

```bash
pytest
```

Run tests with verbose output:

```bash
pytest -v
```

Run specific test files:

```bash
pytest test/test_extract_pull_requests.py
pytest test/test_transform_data.py
```

Copilot AI Feb 6, 2026

This section refers to a test/ directory and paths like pytest test/test_extract_pull_requests.py, but the repository’s tests live under tests/ (and pytest is configured with testpaths = ["tests"]). Update the docs and example commands/paths to tests/... so they work as written.

Comment on lines 8 to 14
def test_black():
    cmd = ("black", "--diff", "main.py")
    output = subprocess.check_output(cmd)
    assert not output, "The python code does not adhere to the project style."


def test_ruff():
Copilot AI Feb 6, 2026

This file won’t be collected by pytest because pyproject.toml sets testpaths = ["tests"], so top-level test_formatting.py is ignored. Either move/delete it, or adjust pytest discovery; also consider de-duplicating with tests/test_formatting.py to avoid maintaining two different formatting-test implementations.

Suggested change
-def test_black():
+def check_black():
     cmd = ("black", "--diff", "main.py")
     output = subprocess.check_output(cmd)
     assert not output, "The python code does not adhere to the project style."


-def test_ruff():
+def check_ruff():

Comment on lines +24 to +31
runs-on: ubuntu-latest
steps:
  - uses: actions/checkout@v4
  - name: Run integration test with docker compose
    run: |
      docker compose up --build --abort-on-container-exit --exit-code-from github-etl
  - name: Cleanup
    run: docker compose down -v

Check warning — Code scanning / CodeQL

Workflow does not contain permissions (Medium, job: test)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}
Member

@cgsheeh left a comment

This PR has the right intent, but it has execution issues that come down to common pitfalls when using AI tools.

The main issue is tests which test nothing. Many of the tests in this PR follow this pattern:

 mock_session.get.return_value.json.return_value = [{"id": 456}]
 result = main.extract_comments(mock_session, ...)
 assert result[0]["id"] == 456  # just reading back our own mock value

Since the method we're testing (extract_comments in this example) simply returns the value from the mock, the assertion will never catch a regression in our code. We need to ensure that mocking the method causes the function to go through behaviour we wish to test, and that we assert an appropriate value such that we would catch a regression.
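By contrast, a test that would catch a regression exercises logic the code itself owns. A minimal sketch of the idea, using a hypothetical pure helper (the function name and fields are illustrative, not from this PR):

```python
# Hypothetical pure helper: flattens a raw GitHub comment payload.
def transform_comment(raw: dict) -> dict:
    return {"id": raw["id"], "author": raw["user"]["login"]}


def test_transform_comment_flattens_author():
    raw = {"id": 456, "user": {"login": "octocat"}}
    # The expected value is derived independently of any mock, so this
    # assertion fails if the mapping logic regresses.
    assert transform_comment(raw) == {"id": 456, "author": "octocat"}
```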

There are also references to things which don't exist, overly verbose generated docs, dead code, etc.

Some suggestions I would make for the future:

  • Use Claude/Codex in the terminal instead of the GitHub Copilot integration; local terminal-based agents provide much better results.
  • Go slowly with the AI agent, ask it to "make as few changes as possible", and verify changes it makes.
  • Ask the agent to do a code review before submitting. In this case you could ask it to verify if there were missing test coverage of important functionality, if the tests could be reduced or simplified in any way, or if the tests were correct.

@@ -1,5 +1,5 @@
 #
-# This file is autogenerated by pip-compile with Python 3.14
+# This file is autogenerated by pip-compile with Python 3.10
Member

The Dockerfile indicates we're using Python 3.14, but these requirements were generated with Python 3.10. We'll probably need to re-generate requirements.


The project includes a comprehensive test suite using pytest. Tests are organized in the `test/` directory and include both unit and integration tests.

#### Setting Up the Development Environment
Member

The Dockerfile should do all of this for us. We should update these docs to demonstrate how to set up the environment and run the tests using Docker.



def test_uses_issues_endpoint(mock_session):
"""Test that comments use /issues endpoint not /pulls."""
Member

Why do we need to test that a specific endpoint is used here?


mock_session.get.return_value = comments_response

result = main.extract_comments(mock_session, "mozilla/firefox", 123)
Member

extract_comments does nothing but make a request to Github and return the parsed JSON. Mocking out the return values on session.get means we are effectively testing the mock.

for i in range(1, 11)
]

mock_session.get.return_value = comments_response
Member

Same issue. This test is useless as the method is supposed to make an API call and return the result, and we're just mocking that out and asserting our mocked value was returned.

import main


def test_setup_logging():
Member

This is a low-value test; you can remove this file entirely.

As an aside, looking at the code while reviewing this I noticed we define the logger at the function level in many places. We should instead have that line only once at the top of the file, like this.
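That is, a single module-level logger that every function reuses. A minimal sketch of the standard pattern (the function body is illustrative):

```python
import logging

# Defined once at module scope; functions below reuse it instead of
# calling logging.getLogger() locally inside each function.
logger = logging.getLogger(__name__)


def extract_comments(session, repo, number):
    logger.info("Fetching comments for %s#%s", repo, number)
    ...
```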

Comment on lines +167 to +168
# Verify BigQuery client was created with emulator settings
mock_bq_client_class.assert_called_once()
Member

This just tests that the mock was called; we should either check something more specific or remove this.
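For example, the test could inspect the arguments the code actually passed to bigquery.Client rather than only the call count. A sketch reusing the test's mock, assuming the ETL passes project and client_options keyword arguments (the values shown are hypothetical):

```python
# Pull the kwargs the code under test passed to the mocked bigquery.Client.
args, kwargs = mock_bq_client_class.call_args
assert kwargs["project"] == "test-project"  # hypothetical project id
assert kwargs["client_options"].api_endpoint == "http://localhost:9050"  # hypothetical emulator endpoint
```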

@patch("main.setup_logging")
@patch("main.bigquery.Client")
@patch("requests.Session")
def test_creates_session_with_headers(
Member

This looks like it tests the mock; we should either confirm it tests something functional or remove it.

@patch("main.setup_logging")
@patch("main.bigquery.Client")
@patch("requests.Session")
def test_sets_authorization_header_with_token(
Member

Same here, testing mock interactions.


### Running Tests

The project includes a comprehensive test suite using pytest. Tests are organized in the `test/` directory and include both unit and integration tests.
Member

Suggested change
The project includes a comprehensive test suite using pytest. Tests are organized in the `test/` directory and include both unit and integration tests.
The project includes a comprehensive test suite using pytest. Tests are organized in the `tests/` directory and include both unit and integration tests.

@dklawren
Contributor Author

> This PR has the right intent, but it has execution issues that come down to common pitfalls when using AI tools.
>
> The main issue is tests which test nothing. Many of the tests in this PR follow this pattern:
>
>     mock_session.get.return_value.json.return_value = [{"id": 456}]
>     result = main.extract_comments(mock_session, ...)
>     assert result[0]["id"] == 456  # just reading back our own mock value
>
> Since the method we're testing (extract_comments in this example) simply returns the value from the mock, the assertion will never catch a regression in our code. We need to ensure that mocking the method causes the function to go through behaviour we wish to test, and that we assert an appropriate value such that we would catch a regression.
>
> There are also references to things which don't exist, overly verbose generated docs, dead code, etc.
>
> Some suggestions I would make for the future:
>
> * Use Claude/Codex in the terminal instead of the GitHub Copilot integration; local terminal-based agents provide much better results.
> * Go slowly with the AI agent, ask it to "make as few changes as possible", and verify changes it makes.
> * Ask the agent to do a code review before submitting. In this case you could ask it to verify if there were missing test coverage of important functionality, if the tests could be reduced or simplified in any way, or if the tests were correct.

Thanks for taking the time for this review. I am still trying to get my head around using AI to help with some of the more mundane tasks such as generating tests. In particular, I need to learn more about how much to depend on the tool and when not to go too far with it. I will go through the tests more thoroughly and also address your change suggestions. For full disclosure, this was all done (as with my other pull requests) using our Mozilla account with Claude Code. I only use GitHub Copilot for first-round reviews to catch more of my glaring mistakes before bringing you in as a reviewer. Thanks again, and I will get this fixed up in the next day or so.

dkl
