fix(task_database): Use stable SHA1 hash for task ID generation by alebmorais · Pull Request #71 · alebmorais/pi_productivity

alebmorais · 2025-10-28T14:55:54Z

Replaced the use of Python's built-in hash() with hashlib.sha1 to generate a deterministic task_id for tasks without an explicit ID.

The built-in hash() is not stable across different Python processes, which could lead to duplicate tasks being created when the application is restarted. This change ensures that the same task payload will always produce the same task_id, preventing data duplication.

A new test case has been added to tests/test_task_database.py to verify that the task ID generation is stable and predictable.
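The instability described above can be reproduced with a short sketch. It assumes the old fallback hashed a string, which the PR text does not show, and the names `PAYLOAD`, `builtin_hash_in_new_process`, and `sha1_id` are illustrative only. `PYTHONHASHSEED` is pinned to two different values to make the demo repeatable; by default each interpreter start picks a random seed.

```python
import hashlib
import os
import subprocess
import sys

PAYLOAD = "example task payload"  # illustrative value, not taken from the PR


def builtin_hash_in_new_process(seed: str) -> str:
    """Return hash(PAYLOAD) as computed by a fresh interpreter.

    PYTHONHASHSEED pins the per-process seed so the demo is repeatable;
    without it, each interpreter start picks a random seed.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({PAYLOAD!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def sha1_id(text: str) -> str:
    """The digest depends only on the input bytes, never on the process."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(builtin_hash_in_new_process("1"))
    print(builtin_hash_in_new_process("2"))  # differs from the line above
    print(sha1_id(PAYLOAD))                  # identical on every run
```

If the fallback ID came from `hash()`, two runs of the application would compute different IDs for the same task, so an insert keyed on that ID would not match the existing row.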

Summary by CodeRabbit

Bug Fixes
- Task IDs are now generated consistently and deterministically for improved stability.
Tests
- Added unit tests to validate stable task ID generation behavior.


gemini-code-assist · 2025-10-28T14:56:07Z

Summary of Changes

Hello @alebmorais, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where task IDs were not consistently generated across different Python processes, potentially leading to data duplication. By switching to a stable SHA1 hashing mechanism for task ID generation, the system now ensures that the same task payload will always produce the same identifier. A new test has been introduced to validate this improved determinism.

Highlights

- Deterministic Task ID Generation: Replaced Python's built-in hash() function with hashlib.sha1 to generate stable and deterministic task_ids for tasks without an explicit ID. This prevents duplicate tasks from being created when the application restarts.
- New Test Case: Added a new test file, tests/test_task_database.py, with a test case to verify that the task ID generation is now stable and predictable using the SHA1 hash.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai · 2025-10-28T14:56:08Z

Walkthrough

The _normalise_task function in task_database.py is updated to use deterministic SHA1 hashing for generating task IDs when none exists in the payload, replacing the previous built-in hash function. A new unit test validates that this hashing produces stable, repeatable IDs.

Changes

Cohort / File(s)	Summary
Core Implementation `task_database.py`	Modified `_normalise_task` to compute SHA1 hash of sorted JSON payload as fallback task_id instead of using Python's built-in hash function; ensures stable, repeatable IDs.
Test Coverage `tests/test_task_database.py`	Added new test file with `TestTaskDatabase` class; includes `test_normalise_task_generates_stable_id` test that validates SHA1-based task_id generation produces expected stable hash ("a5b0c8d49a4c19a78200530eccd3caf4f859e5b5").

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

- Verify SHA1 import is correctly included and accessible
- Validate hash computation logic against the precomputed expected value in the test
- Confirm JSON payload serialization (with sort_keys=True) produces consistent results
- Check that the string conversion of the hash digest is properly formatted
- Ensure test assertion covers the documented contract for stable ID generation
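Two of the checks above (consistent serialization with `sort_keys=True` and the format of the digest string) can be tried in isolation. The sketch below mirrors the two lines added in the diff; the wrapper name `fallback_task_id` and the sample payloads are hypothetical, not code from the repository.

```python
import hashlib
import json
import string


def fallback_task_id(payload: dict) -> str:
    """Hash the key-sorted JSON form of the payload, as the diff does."""
    payload_bytes = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload_bytes).hexdigest()


# Same content, different insertion order of the keys.
a = {"name": "Write report", "status": "pending"}
b = {"status": "pending", "name": "Write report"}

# Without sort_keys the serialized text follows insertion order...
assert json.dumps(a) != json.dumps(b)
# ...but the sorted form, and therefore the ID, is identical.
assert fallback_task_id(a) == fallback_task_id(b)

# hexdigest() returns 40 lowercase hexadecimal characters for SHA1.
task_id = fallback_task_id(a)
assert len(task_id) == 40
assert set(task_id) <= set(string.hexdigits.lower())
```

One caveat worth checking during review: `json.dumps` raises `TypeError` for values it cannot serialize (a `datetime`, for example), so this fallback only works when the payload holds plain JSON types.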

Poem

🐰 A hash so stable, a puzzle so fine,
SHA1 ensures the IDs align,
No more chaos from Python's old way,
Our tasks now sleep soundly, hooray, hooray! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "fix(task_database): Use stable SHA1 hash for task ID generation" directly and accurately summarizes the main change in this changeset. The title correctly identifies the module (task_database), the nature of the change (replacing hash with SHA1), and the primary objective (ensuring stable task ID generation). The title is concise, specific, and uses clear language without vague terminology or unnecessary noise. A teammate reviewing the commit history would immediately understand that this change addresses task ID stability by switching to a deterministic hashing approach.


gemini-code-assist

Code Review

This pull request correctly addresses a potential data duplication issue by replacing Python's non-deterministic built-in hash() with hashlib.sha1 for task ID generation. This ensures that task IDs are stable across application restarts. The inclusion of a new test case to validate this behavior is a good practice. My review includes a suggestion to use a more modern hashing algorithm for better future-proofing and a minor code cleanup in the new test file.

gemini-code-assist · 2025-10-28T14:57:20Z

task_database.py

+            # Bug fix: Use a stable SHA1 hash of the JSON payload
+            # to ensure the ID is deterministic.
+            payload_bytes = json.dumps(payload, sort_keys=True).encode("utf-8")
+            task_id = hashlib.sha1(payload_bytes).hexdigest()


While SHA1 is sufficient for generating a unique ID in this context, it is considered a legacy algorithm with known theoretical weaknesses. For better future-proofing and to align with current cryptographic best practices, consider using a more modern hash function from the SHA-2 family, such as SHA-256. This change would also require updating the expected hash value in the corresponding test.

Suggested change

-            task_id = hashlib.sha1(payload_bytes).hexdigest()
+            task_id = hashlib.sha256(payload_bytes).hexdigest()
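A small sketch of what the suggested swap changes in practice. The helper `fallback_task_id` and its `algorithm` parameter are hypothetical and exist only to compare the two digests side by side. The point to weigh is that the two algorithms yield different IDs of different lengths for the same payload, so any fallback IDs already stored under one algorithm would not match IDs computed under the other.

```python
import hashlib
import json


def fallback_task_id(payload: dict, algorithm: str = "sha1") -> str:
    """Hash the key-sorted JSON payload with the named hashlib algorithm."""
    payload_bytes = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.new(algorithm, payload_bytes).hexdigest()


payload = {"name": "Task A", "status": "pending"}  # illustrative payload

sha1_id = fallback_task_id(payload, "sha1")
sha256_id = fallback_task_id(payload, "sha256")

print(len(sha1_id))    # 40 hex characters
print(len(sha256_id))  # 64 hex characters

# Same payload, different algorithm: the IDs do not match, so rows written
# with one algorithm cannot be found by recomputing with the other.
assert sha1_id != sha256_id
```

Whether this matters depends on whether SHA1-derived IDs have already been persisted and on whether the `task_id` column can hold 64 characters; neither is stated in the PR.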

gemini-code-assist · 2025-10-28T14:57:21Z

tests/test_task_database.py

@@ -0,0 +1,29 @@
+import unittest
+import json


The json module is imported but not used in this file. Removing unused imports is recommended by PEP 8 to keep the code clean and avoid confusion.¹

Footnotes

¹ PEP 8 recommends removing unused imports to improve code readability and maintainability. This is often flagged by linters with codes like F401.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/test_task_database.py (1)

10-26: Good test validating stable ID generation.

The test correctly verifies that _normalise_task produces a deterministic SHA1-based task_id for payloads without an explicit ID. The expected hash value appears to have been verified against the actual implementation.

Consider adding a few more test cases to strengthen coverage:

def test_normalise_task_idempotency(self):
    """Verify that the same payload always produces the same ID."""
    payload = {"name": "Consistent Task", "status": "pending"}
    
    result1 = self.db._normalise_task(payload, "2023-10-27T10:00:00Z")
    result2 = self.db._normalise_task(payload, "2023-10-27T11:00:00Z")
    
    # Same payload should produce same ID regardless of timestamp
    self.assertEqual(result1["task_id"], result2["task_id"])

def test_normalise_task_different_payloads_different_ids(self):
    """Verify that different payloads produce different IDs."""
    payload1 = {"name": "Task A", "status": "pending"}
    payload2 = {"name": "Task B", "status": "pending"}
    
    result1 = self.db._normalise_task(payload1, "2023-10-27T10:00:00Z")
    result2 = self.db._normalise_task(payload2, "2023-10-27T10:00:00Z")
    
    self.assertNotEqual(result1["task_id"], result2["task_id"])
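The suggested tests rely on `self.db`, which is set up elsewhere in the test file. For trying the same assertions without the repository, here is a self-contained sketch that uses a hypothetical stand-in, `normalise_task`, in place of the real `_normalise_task`. The stand-in's return shape (including the `fetched_at` key) is an assumption; only the hashing lines follow the diff.

```python
import hashlib
import json
import unittest


def normalise_task(payload: dict, fetched_at: str) -> dict:
    """Hypothetical stand-in: the ID is derived from the payload only."""
    payload_bytes = json.dumps(payload, sort_keys=True).encode("utf-8")
    return {
        "task_id": hashlib.sha1(payload_bytes).hexdigest(),
        "fetched_at": fetched_at,  # assumed field; not part of the hash
    }


class StableIdTests(unittest.TestCase):
    def test_same_payload_same_id(self):
        payload = {"name": "Consistent Task", "status": "pending"}
        first = normalise_task(payload, "2023-10-27T10:00:00Z")
        second = normalise_task(payload, "2023-10-27T11:00:00Z")
        # The timestamp is not hashed, so the ID must not change.
        self.assertEqual(first["task_id"], second["task_id"])

    def test_different_payloads_different_ids(self):
        a = normalise_task({"name": "Task A"}, "2023-10-27T10:00:00Z")
        b = normalise_task({"name": "Task B"}, "2023-10-27T10:00:00Z")
        self.assertNotEqual(a["task_id"], b["task_id"])

    def test_key_order_does_not_matter(self):
        a = normalise_task({"name": "T", "status": "x"}, "t")
        b = normalise_task({"status": "x", "name": "T"}, "t")
        self.assertEqual(a["task_id"], b["task_id"])
```

Saved as a module, this runs with `python -m unittest <module name>`. The third test is an addition beyond the two suggested above and covers the `sort_keys=True` behaviour.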

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21cc09b and 1539ba4.

📒 Files selected for processing (2)

task_database.py (2 hunks)
tests/test_task_database.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/test_task_database.py (1)

task_database.py (1)

_normalise_task (101-146)

🪛 Ruff (0.14.1)

task_database.py

108-108: Probable use of insecure hash functions in hashlib: sha1

(S324)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Codacy Static Code Analysis

🔇 Additional comments (3)

task_database.py (2)

11-11: LGTM: Import added correctly.

The hashlib import is properly positioned and necessary for the SHA1-based ID generation.

105-108: Excellent fix for deterministic task IDs.

The SHA1-based implementation correctly addresses the non-deterministic hash() problem. The use of sort_keys=True ensures consistent JSON serialization, making the generated IDs stable across process restarts.

Regarding the static analysis warning about SHA1 being insecure: this is a false positive. SHA1 is appropriate here since you're generating stable identifiers, not performing cryptographic operations like signing or password hashing. The collision resistance of SHA1 is sufficient for this use case.

If you want to future-proof, consider SHA256 (hashlib.sha256) which is more modern, though SHA1 is perfectly acceptable for ID generation.
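If SHA1 is kept, one way to record that the hash is not used for security is the `usedforsecurity` keyword that `hashlib` constructors accept from Python 3.9 onward. The sketch below shows that the flag does not change the digest. To my understanding, linters that implement the S324 check skip calls that pass `usedforsecurity=False`, but that is worth confirming against the Ruff version in use. The sample bytes are illustrative.

```python
import hashlib

payload_bytes = b'{"name": "Task A", "status": "pending"}'  # illustrative

# Default call, as in the diff.
plain = hashlib.sha1(payload_bytes).hexdigest()

# Same digest, with the intent made explicit (requires Python 3.9+).
flagged = hashlib.sha1(payload_bytes, usedforsecurity=False).hexdigest()

# The flag documents intent; it does not alter the computed hash,
# so existing task IDs stay valid.
assert plain == flagged
```

Because the digest is unchanged, adopting the flag would not require updating the expected value in the test.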

tests/test_task_database.py (1)

5-8: LGTM: Test setup is clean.

Using an in-memory database for unit tests is the right approach—fast, isolated, and no cleanup required.

alebmorais wants to merge 1 commit into main from fix-stable-task-id