Limit episodes by days stored in database by env var #96
base: main
Conversation
This PR implements episode retention in three parts:

1. **Environment variable filtering** (`datastore.py:192-208`): `save_feed_and_episodes()` now filters out episodes whose `userUpdatedDate` is older than `now - OVERCAST_LIMIT_DAYS` during insertion.
2. **Cleanup method** (`datastore.py:461-478`): a new `cleanup_old_episodes()` method deletes old episodes from the database, but only if there are more than 100 episode rows.
3. **CLI integration** (`cli.py:119`): the `save` command now calls `db.cleanup_old_episodes()` at the end.

How it works:
- Set `OVERCAST_LIMIT_DAYS=30` (or any number) to keep only episodes from the last 30 days.
- During save, episodes older than the limit won't be inserted.
- After save completes, if there are more than 100 episodes in total, any remaining old episodes are deleted.
- If the environment variable is not set, the behavior remains unchanged.
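As a minimal sketch of the filtering step described above (the column name follows the PR; the helper itself is illustrative, not the PR's exact code):

```python
import datetime
import os

USER_UPDATED_DATE = "userUpdatedDate"  # column name used by the PR


def filter_recent(episodes: list[dict]) -> list[dict]:
    """Drop episodes older than OVERCAST_LIMIT_DAYS; no-op if the var is unset."""
    limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
    if not limit_days:
        return episodes
    cutoff = datetime.datetime.now(tz=datetime.timezone.utc) - datetime.timedelta(
        days=int(limit_days)
    )
    return [
        e
        for e in episodes
        if e.get(USER_UPDATED_DATE)
        # Normalize a trailing 'Z' so fromisoformat accepts it on older Pythons
        and datetime.datetime.fromisoformat(
            e[USER_UPDATED_DATE].replace("Z", "+00:00")
        )
        >= cutoff
    ]
```

Episodes with no `userUpdatedDate` at all are dropped by this sketch, which matches the PR's comprehension but is itself a policy choice worth noting.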
Reviewer's Guide: This PR adds configurable date-based episode retention by introducing an `OVERCAST_LIMIT_DAYS` env var: episodes older than the cutoff are skipped on insert and, if total episodes exceed 100, cleaned up after saving. It maintains existing behavior when the variable is unset.

Sequence diagram for episode save and cleanup with `OVERCAST_LIMIT_DAYS`:

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI (save command)
    participant DB as Datastore
    User->>CLI: Run save command
    CLI->>DB: save_feed_and_episodes(feed, episodes)
    alt OVERCAST_LIMIT_DAYS is set
        DB->>DB: Filter episodes by userUpdatedDate >= cutoff
    end
    DB->>DB: Upsert feed and filtered episodes
    CLI->>DB: cleanup_old_episodes()
    alt OVERCAST_LIMIT_DAYS is set and episode count > 100
        DB->>DB: Delete episodes older than cutoff
    end
    DB-->>CLI: Save and cleanup complete
```
Class diagram for Datastore changes (episode filtering and cleanup):

```mermaid
classDiagram
    class Datastore {
        save_feed_and_episodes(feed, episodes)
        cleanup_old_episodes()
    }
    Datastore : +save_feed_and_episodes now filters episodes by OVERCAST_LIMIT_DAYS
    Datastore : +cleanup_old_episodes deletes old episodes if count > 100
```
Walkthrough: A time-based retention policy for podcast episodes was introduced.
Sequence diagram:

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Datastore
    User->>CLI: Run save command
    CLI->>Datastore: save_feed_and_episodes(feed, episodes)
    Datastore->>Datastore: Filter episodes by USER_UPDATED_DATE (OVERCAST_LIMIT_DAYS)
    Datastore->>Datastore: Upsert filtered episodes
    CLI->>Datastore: mark_feed_removed_if_missing(ingested_feed_ids)
    CLI->>Datastore: cleanup_old_episodes()
    Datastore->>Datastore: If count > 100, delete episodes older than cutoff
```
Summary of Changes
Hello @hbmartin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a robust data retention policy for episodes, allowing users to automatically manage the size of their episode database. It achieves this by filtering out old episodes during the initial data ingestion and by periodically cleaning up existing old entries, all configurable via an environment variable. This significantly helps in controlling database growth and keeping only relevant, recent data.
Highlights
- **Episode Filtering on Insertion**: Implemented a new mechanism in `save_feed_and_episodes` to prevent episodes older than a specified `OVERCAST_LIMIT_DAYS` (set via environment variable) from being inserted into the database. This ensures that only recent episodes are initially stored.
- **Database Cleanup Utility**: Introduced a `cleanup_old_episodes` method that proactively removes episodes from the database that are older than the `OVERCAST_LIMIT_DAYS` threshold. This cleanup only occurs if the total number of episodes in the database exceeds 100, preventing unnecessary deletions for small datasets.
- **CLI Integration for Cleanup**: Integrated the new `cleanup_old_episodes` method into the `save` command of the CLI. This ensures that the database cleanup process runs automatically after new data is ingested, maintaining the desired data retention policy.
Hey @hbmartin - I've reviewed your changes - here's some feedback:
Blocking issues:
- Avoid SQL string concatenation (reported at two locations): untrusted input concatenated into a raw SQL query can result in SQL injection. To execute raw queries safely, use prepared statements. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters; for complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option. (link)
General comments:
- Add validation (and fallback) around OVERCAST_LIMIT_DAYS parsing to avoid crashing on non-integer or negative values.
- Ensure the date format stored in USER_UPDATED_DATE matches the cutoff_iso string or leverage SQLite’s datetime functions instead of lexicographical string comparison.
- Consider making the hard-coded 100-episode threshold configurable (e.g. via env var) rather than fixed in cleanup_old_episodes.
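The second point can be made concrete: ISO-8601 strings only compare correctly as raw text when every row uses the same format. A hedged sketch (table and column names are illustrative, not the project's schema) of normalizing both sides with SQLite's `datetime()` function:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE episodes (userUpdatedDate TEXT)")
# Mixed formats: a 'Z' suffix and an explicit negative offset.
db.executemany(
    "INSERT INTO episodes VALUES (?)",
    [
        ("2024-01-01T00:00:00Z",),          # genuinely old
        ("2024-02-29T23:00:00-02:00",),     # actually 2024-03-01 01:00 UTC
    ],
)

cutoff = "2024-03-01T00:00:00+00:00"

# Raw string comparison: both rows sort before the cutoff lexicographically,
# even though the second row is *after* the cutoff once offsets are applied.
raw = db.execute(
    "SELECT COUNT(*) FROM episodes WHERE userUpdatedDate < ?", (cutoff,)
).fetchone()[0]

# datetime() converts offset-bearing timestamps to UTC before comparing.
normalized = db.execute(
    "SELECT COUNT(*) FROM episodes WHERE datetime(userUpdatedDate) < datetime(?)",
    (cutoff,),
).fetchone()[0]
```

Here `raw` counts both rows as old, while `normalized` correctly counts only the January row, which is why the reviewer suggests either a single canonical stored format or SQLite's date functions.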
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Add validation (and fallback) around OVERCAST_LIMIT_DAYS parsing to avoid crashing on non-integer or negative values.
- Ensure the date format stored in USER_UPDATED_DATE matches the cutoff_iso string or leverage SQLite’s datetime functions instead of lexicographical string comparison.
- Consider making the hard-coded 100-episode threshold configurable (e.g. via env var) rather than fixed in cleanup_old_episodes.
## Individual Comments
### Comment 1
<location> `overcast_to_sqlite/datastore.py:198` </location>
<code_context>
episodes: list[dict],
) -> None:
"""Upsert feed and episodes into database."""
+ limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
+ if limit_days:
+ cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
+ episodes = [
+ episode for episode in episodes
+ if episode.get(USER_UPDATED_DATE) and
+ datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
+ ]
+
</code_context>
<issue_to_address>
Repeated logic for episode filtering and deletion could be centralized.
Extract the cutoff date and episode age logic into a shared helper to prevent duplication and ensure consistency if changes are needed.
Suggested implementation:
```python
episodes = filter_episodes_by_limit_days(episodes)
```
```python
import datetime
import os
import sqlite3
from collections.abc import Iterable
USER_UPDATED_DATE = "user_updated_date" # Make sure this is defined at module level if not already
def filter_episodes_by_limit_days(episodes: list[dict]) -> list[dict]:
"""Filter episodes by OVERCAST_LIMIT_DAYS environment variable."""
limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
if not limit_days:
return episodes
cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
filtered = []
for episode in episodes:
user_updated = episode.get(USER_UPDATED_DATE)
if user_updated:
# Handle possible 'Z' at end of ISO string
dt = datetime.datetime.fromisoformat(user_updated.replace('Z', '+00:00'))
if dt >= cutoff_date:
filtered.append(episode)
return filtered
```
</issue_to_address>
## Security Issues
### Issue 1
<location> `overcast_to_sqlite/datastore.py:467` </location>
<issue_to_address>
**security (opengrep-rules.python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoid SQL string concatenation: untrusted input concatenated into a raw SQL query can result in SQL injection. To execute raw queries safely, use prepared statements. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters; for complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>
### Issue 2
<location> `overcast_to_sqlite/datastore.py:474` </location>
<issue_to_address>
**security (opengrep-rules.python.sqlalchemy.security.sqlalchemy-execute-raw-query):** Avoid SQL string concatenation: untrusted input concatenated into a raw SQL query can result in SQL injection. To execute raw queries safely, use prepared statements. SQLAlchemy provides TextualSQL to easily use prepared statements with named parameters; for complex SQL composition, use the SQL Expression Language or Schema Definition Language. In most cases, the SQLAlchemy ORM will be a better option.
*Source: opengrep*
</issue_to_address>

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
overcast_to_sqlite/datastore.py (Outdated)

```python
limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
if limit_days:
    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
    episodes = [
        episode for episode in episodes
        if episode.get(USER_UPDATED_DATE) and
        datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
    ]
```
Caution
Changes requested ❌
Reviewed everything up to 9f89be4 in 1 minute and 35 seconds. Click for details.
- Reviewed 63 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. overcast_to_sqlite/cli.py:119
   - Draft comment: Consider checking if OVERCAST_LIMIT_DAYS is set (or logging the cleanup action) before calling cleanup_old_episodes to avoid unnecessary DB operations.
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
2. overcast_to_sqlite/datastore.py:202
   - Draft comment: Handle potential parsing errors in datetime.fromisoformat for USER_UPDATED_DATE (e.g. via try/except) to avoid crashes on bad date formats.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. The code is already defensive by checking if USER_UPDATED_DATE exists before parsing. The date format appears to be standardized ISO format with a Z timezone marker that gets converted to +00:00, and fromisoformat() should reliably handle this format. Adding try/except would make the code more complex without clear benefit since the format is standardized. While crashes are possible on malformed date strings, they would indicate data corruption or API changes, and it's better to fail fast in those cases than to silently skip records; the existing .get() check handles the common case of missing dates. The comment should be deleted.
3. overcast_to_sqlite/datastore.py:200
   - Draft comment: Prefer using datetime.timezone.utc instead of datetime.UTC for clarity and standard compliance.
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
Workflow ID: wflow_BeFAazh78wR1Qgwn
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
overcast_to_sqlite/datastore.py (Outdated)

```python
    """Upsert feed and episodes into database."""
    limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
    if limit_days:
        cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
```
Validate OVERCAST_LIMIT_DAYS before converting to int—wrap int(limit_days) in a try/except to handle non-numeric values.
overcast_to_sqlite/datastore.py

```python
            for result in results
        ]

    def cleanup_old_episodes(self) -> None:
```
In cleanup_old_episodes, consider validating OVERCAST_LIMIT_DAYS and document the rationale for only running cleanup when episode count exceeds 100.
Code Review
The changes introduce episode retention management via an environment variable. The implementation lacks input validation for the environment variable, which could lead to application crashes if an invalid value is provided. Suggestions were made to add validation and improve maintainability by removing a magic number.
overcast_to_sqlite/datastore.py (Outdated)

```python
limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
if limit_days:
    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
    episodes = [
        episode for episode in episodes
        if episode.get(USER_UPDATED_DATE) and
        datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
    ]
```
The code is vulnerable to a ValueError if the OVERCAST_LIMIT_DAYS environment variable is a non-numeric string. The int(limit_days) call on line 200 will crash the application. Validate the input before casting to an integer.
```diff
-limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
-if limit_days:
-    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
-    episodes = [
-        episode for episode in episodes
-        if episode.get(USER_UPDATED_DATE) and
-        datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
-    ]
+limit_days_str = os.getenv("OVERCAST_LIMIT_DAYS")
+if limit_days_str and limit_days_str.isdigit():
+    limit_days = int(limit_days_str)
+    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=limit_days)
+    episodes = [
+        episode for episode in episodes
+        if episode.get(USER_UPDATED_DATE) and
+        datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
+    ]
```
overcast_to_sqlite/datastore.py (Outdated)

```python
limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
if not limit_days:
    return

episode_count = self.db.execute(f"SELECT COUNT(*) FROM {EPISODES}").fetchone()[0]
if episode_count <= 100:
    return

cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
cutoff_iso = cutoff_date.isoformat()

self.db.execute(
    f"DELETE FROM {EPISODES} WHERE {USER_UPDATED_DATE} < ?",
    [cutoff_iso]
)
self.db.conn.commit()
```
The int(limit_days) call on line 471 is unsafe. If OVERCAST_LIMIT_DAYS is set to a non-numeric value, the program will crash. Input from environment variables should always be validated. Also, the value 100 on line 468 is a magic number. Define it as a named constant for readability.
```diff
-limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
-if not limit_days:
-    return
-episode_count = self.db.execute(f"SELECT COUNT(*) FROM {EPISODES}").fetchone()[0]
-if episode_count <= 100:
-    return
-cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
-cutoff_iso = cutoff_date.isoformat()
-self.db.execute(
-    f"DELETE FROM {EPISODES} WHERE {USER_UPDATED_DATE} < ?",
-    [cutoff_iso]
-)
-self.db.conn.commit()
+limit_days_str = os.getenv("OVERCAST_LIMIT_DAYS")
+if not limit_days_str or not limit_days_str.isdigit():
+    return
+EPISODE_CLEANUP_THRESHOLD = 100
+episode_count = self.db.execute(f"SELECT COUNT(*) FROM {EPISODES}").fetchone()[0]
+if episode_count <= EPISODE_CLEANUP_THRESHOLD:
+    return
+limit_days = int(limit_days_str)
+cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=limit_days)
+cutoff_iso = cutoff_date.isoformat()
+self.db.execute(
+    f"DELETE FROM {EPISODES} WHERE {USER_UPDATED_DATE} < ?",
+    [cutoff_iso],
+)
+self.db.conn.commit()
```
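Pulling those review points together, a self-contained sketch of the cleanup (in-memory database and illustrative schema; the threshold constant and function signature are assumptions, not the PR's code):

```python
import datetime
import sqlite3

EPISODE_CLEANUP_THRESHOLD = 100  # named constant instead of a magic number


def cleanup_old_episodes(conn: sqlite3.Connection, limit_days: int) -> int:
    """Delete episodes older than limit_days; skip small tables. Returns rows deleted."""
    count = conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]
    if count <= EPISODE_CLEANUP_THRESHOLD:
        return 0
    cutoff = (
        datetime.datetime.now(tz=datetime.timezone.utc)
        - datetime.timedelta(days=limit_days)
    ).isoformat()
    # Parameterized query: the cutoff is bound, never concatenated into the SQL.
    cur = conn.execute("DELETE FROM episodes WHERE userUpdatedDate < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Because every timestamp here is written by the same `isoformat()` call, string comparison is safe within this sketch; with mixed date formats, SQLite's `datetime()` would be the safer comparison, as noted elsewhere in the review.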
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- overcast_to_sqlite/cli.py (1 hunks)
- overcast_to_sqlite/datastore.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
overcast_to_sqlite/cli.py (1)
overcast_to_sqlite/datastore.py (1)
cleanup_old_episodes(461-478)
🪛 Ruff (0.11.9)
overcast_to_sqlite/datastore.py
200-200: Line too long (107 > 88)
(E501)
203-203: Trailing whitespace
Remove trailing whitespace
(W291)
204-204: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
204-204: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
204-204: Line too long (113 > 88)
(E501)
206-206: Blank line contains whitespace
Remove whitespace from blank line
(W293)
462-462: Line too long (93 > 88)
(E501)
466-466: Blank line contains whitespace
Remove whitespace from blank line
(W293)
467-467: Line too long (89 > 88)
(E501)
468-468: Magic value used in comparison, consider replacing 100 with a constant variable
(PLR2004)
470-470: Blank line contains whitespace
Remove whitespace from blank line
(W293)
471-471: Line too long (103 > 88)
(E501)
473-473: Blank line contains whitespace
Remove whitespace from blank line
(W293)
476-476: Trailing comma missing
Add trailing comma
(COM812)
🪛 GitHub Actions: Lint
overcast_to_sqlite/datastore.py
[error] 200-200: Ruff E501: Line too long (107 > 88).
[error] 203-203: Ruff W291: Trailing whitespace detected.
[error] 204-204: Ruff FURB162: Unnecessary timezone replacement with zero offset. Remove .replace() call.
[error] 204-204: Ruff Q000: Single quotes found but double quotes preferred (two occurrences).
[error] 204-204: Ruff E501: Line too long (113 > 88).
[error] 206-206: Ruff W293: Blank line contains whitespace.
[error] 462-462: Ruff E501: Line too long (93 > 88).
[error] 466-466: Ruff W293: Blank line contains whitespace.
[error] 467-467: Ruff E501: Line too long (89 > 88).
[error] 468-468: Ruff PLR2004: Magic value used in comparison, consider replacing 100 with a constant variable.
[error] 470-470: Ruff W293: Blank line contains whitespace.
[error] 471-471: Ruff E501: Line too long (103 > 88).
[error] 473-473: Ruff W293: Blank line contains whitespace.
[error] 476-476: Ruff COM812: Trailing comma missing.
🔇 Additional comments (3)
overcast_to_sqlite/datastore.py (2)
4-4: LGTM! The `os` import is correctly added to support the new environment variable functionality.
198-206: Fix timezone handling and add error handling for date parsing. The filtering logic has several issues that need to be addressed:

- The `.replace('Z', '+00:00')` is unnecessary in Python 3.11+ (FURB162)
- Missing error handling for invalid date formats or missing `USER_UPDATED_DATE`
- No validation that `limit_days` is a valid integer

Apply this diff to improve robustness:

```diff
 limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
 if limit_days:
-    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
+    try:
+        cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
+            days=int(limit_days)
+        )
+    except ValueError:
+        return  # Invalid limit_days value, skip filtering
+
     episodes = [
         episode for episode in episodes
-        if episode.get(USER_UPDATED_DATE) and
-        datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
+        if episode.get(USER_UPDATED_DATE) and
+        _parse_episode_date(episode[USER_UPDATED_DATE]) >= cutoff_date
     ]
+
+def _parse_episode_date(date_str: str) -> datetime.datetime | None:
+    """Parse episode date string, handling various formats safely."""
+    try:
+        return datetime.datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+    except (ValueError, AttributeError):
+        return datetime.datetime.min  # Fallback for invalid dates
```

Likely an incorrect or invalid review comment.
overcast_to_sqlite/cli.py (1)

119-119: LGTM! Clean integration of cleanup functionality. The placement of `db.cleanup_old_episodes()` after `db.mark_feed_removed_if_missing()` is logical and ensures the cleanup happens at the appropriate time in the save workflow.
overcast_to_sqlite/datastore.py (Outdated)

```python
limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
if limit_days:
    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
    episodes = [
        episode for episode in episodes
        if episode.get(USER_UPDATED_DATE) and
        datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
    ]
```
Fix formatting issues to resolve pipeline failures.
The linting pipeline is failing due to multiple formatting issues that need to be addressed.
Apply this comprehensive formatting fix:
```diff
 def save_feed_and_episodes(
     self,
     feed: dict,
     episodes: list[dict],
 ) -> None:
     """Upsert feed and episodes into database."""
     limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
     if limit_days:
-        cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
+        cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
+            days=int(limit_days)
+        )
         episodes = [
             episode for episode in episodes
-            if episode.get(USER_UPDATED_DATE) and
-            datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE].replace('Z', '+00:00')) >= cutoff_date
+            if episode.get(USER_UPDATED_DATE) and
+            datetime.datetime.fromisoformat(episode[USER_UPDATED_DATE]) >= cutoff_date
         ]
-
+
     self.db[FEEDS].upsert(feed, pk=OVERCAST_ID)
     self.db[EPISODES].upsert_all(episodes, pk=OVERCAST_ID)

 def cleanup_old_episodes(self) -> None:
-    """Delete episodes older than OVERCAST_LIMIT_DAYS if more than 100 episodes exist."""
+    """Delete old episodes if more than 100 exist."""
     limit_days = os.getenv("OVERCAST_LIMIT_DAYS")
     if not limit_days:
         return
-
-    episode_count = self.db.execute(f"SELECT COUNT(*) FROM {EPISODES}").fetchone()[0]
+
+    episode_count = self.db.execute(
+        f"SELECT COUNT(*) FROM {EPISODES}"
+    ).fetchone()[0]
     if episode_count <= 100:
         return
-
-    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(days=int(limit_days))
+
+    cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
+        days=int(limit_days)
+    )
     cutoff_iso = cutoff_date.isoformat()
-
+
     self.db.execute(
         f"DELETE FROM {EPISODES} WHERE {USER_UPDATED_DATE} < ?",
-        [cutoff_iso]
+        [cutoff_iso],
     )
     self.db.conn.commit()
```

Also applies to: 461-478
🤖 Prompt for AI Agents
In overcast_to_sqlite/datastore.py around lines 198 to 206 and also lines 461 to
478, there are multiple formatting issues causing pipeline failures. Review the
code for consistent indentation, proper spacing around operators, and line
breaks according to the project's style guide. Fix these formatting problems by
aligning the code blocks correctly, ensuring spaces after commas and operators,
and maintaining consistent line breaks to pass linting checks.
Caution
Changes requested ❌
Reviewed f768106 in 1 minute and 22 seconds. Click for details.
- Reviewed 99 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_mXb4WVT9RnCUojU8

You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
```python
    """Upsert feed and episodes into database."""
    limit_days = None
    try:
        if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
```
Incorrect usage of walrus operator. Use parentheses to assign the env var value, e.g.:
if (env_limit := os.getenv("OVERCAST_LIMIT_DAYS")) is not None:
```diff
-if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
+if (env_limit := os.getenv("OVERCAST_LIMIT_DAYS")) is not None:
```
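The precedence pitfall can be demonstrated directly: `:=` binds more loosely than `is not None`, so without parentheses the name captures the boolean comparison result rather than the environment value (a small illustrative sketch):

```python
import os

os.environ["OVERCAST_LIMIT_DAYS"] = "30"

# Buggy form: parsed as env_limit := (os.getenv(...) is not None),
# so env_limit is the boolean True, and int(env_limit) would yield 1.
if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
    buggy = env_limit

# Fixed form: parentheses make env_limit capture the string value.
if (env_limit := os.getenv("OVERCAST_LIMIT_DAYS")) is not None:
    fixed = env_limit
```

In the PR's code this bug would silently turn any configured limit into a one-day retention window (`int(True) == 1`) rather than crashing, which is why both bots flag it.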
```python
    """
    limit_days = None
    try:
        if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
```
Incorrect use of walrus operator here as well. Wrap the assignment in parentheses:
if (env_limit := os.getenv("OVERCAST_LIMIT_DAYS")) is not None:
```diff
-if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
+if (env_limit := os.getenv("OVERCAST_LIMIT_DAYS")) is not None:
```
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
overcast_to_sqlite/datastore.py (5 hunks)
🧰 Additional context used
🪛 GitHub Actions: Lint
overcast_to_sqlite/datastore.py
[error] 216-216: Ruff: Unnecessary timezone replacement with zero offset (FURB162). Remove .replace() call.
🔇 Additional comments (3)
overcast_to_sqlite/datastore.py (3)
4-4: LGTM: Import statement correctly added. The `os` module import is appropriately added to support the new environment variable functionality.

44-45: LGTM: Magic number replaced with named constant. Good practice to replace the hardcoded `100` with a named constant for better maintainability and consistency across the codebase.

461-461: LGTM: Hardcoded value replaced with constant. Good use of the `_DEFAULT_EPISODE_LIMIT` constant instead of the magic number.
```python
        limit_days = None
        try:
            if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
                limit_days = int(env_limit)
        except ValueError:
            pass

        if limit_days:
            cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
                days=int(limit_days),
            )
            episodes = [
                episode
                for episode in episodes
                if episode.get(USER_UPDATED_DATE)
                and datetime.datetime.fromisoformat(
                    episode[USER_UPDATED_DATE].replace("Z", "+00:00"),
                )
                >= cutoff_date
            ]
```
Fix critical logic error with walrus operator.
The walrus operator assignment on line 202 has a critical logic error that will prevent the episode filtering from working correctly.
Apply this diff to fix the logic error and address the pipeline failure:
```diff
-        limit_days = None
-        try:
-            if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
-                limit_days = int(env_limit)
-        except ValueError:
-            pass
+        limit_days = None
+        if env_limit := os.getenv("OVERCAST_LIMIT_DAYS"):
+            try:
+                limit_days = int(env_limit)
+            except ValueError:
+                pass
         if limit_days:
             cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
-                days=int(limit_days),
+                days=limit_days,
             )
             episodes = [
                 episode
                 for episode in episodes
                 if episode.get(USER_UPDATED_DATE)
                 and datetime.datetime.fromisoformat(
-                    episode[USER_UPDATED_DATE].replace("Z", "+00:00"),
+                    episode[USER_UPDATED_DATE],
                 )
                 >= cutoff_date
             ]
```

Issues fixed:
- Critical logic error: `env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None` assigns `True`/`False` to `env_limit`, not the actual value
- Pipeline failure: Removed unnecessary `.replace("Z", "+00:00")` call (FURB162)
- Redundant conversion: Removed `int(limit_days)` since `limit_days` is already an integer
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        limit_days = None
        if env_limit := os.getenv("OVERCAST_LIMIT_DAYS"):
            try:
                limit_days = int(env_limit)
            except ValueError:
                pass
        if limit_days:
            cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
                days=limit_days,
            )
            episodes = [
                episode
                for episode in episodes
                if episode.get(USER_UPDATED_DATE)
                and datetime.datetime.fromisoformat(
                    episode[USER_UPDATED_DATE],
                )
                >= cutoff_date
            ]
```
🧰 Tools
🪛 GitHub Actions: Lint
[error] 216-216: Ruff: Unnecessary timezone replacement with zero offset (FURB162). Remove .replace() call.
🤖 Prompt for AI Agents
In overcast_to_sqlite/datastore.py around lines 200 to 220, fix the walrus
operator usage by assigning the environment variable value directly to env_limit
instead of a boolean check; then convert env_limit to int for limit_days. Remove
the unnecessary .replace("Z", "+00:00") call when parsing the ISO date string,
and avoid redundant int conversions by using limit_days directly as an integer
in the timedelta calculation.
```python
    def cleanup_old_episodes(self) -> None:
        """Delete episodes older than OVERCAST_LIMIT_DAYS.

        Only deletes if more than 100 episodes exist.
        """
        limit_days = None
        try:
            if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
                limit_days = int(env_limit)
        except ValueError:
            pass

        if not limit_days:
            return

        episode_count = self.db.execute(f"SELECT COUNT(*) FROM {EPISODES}").fetchone()[
            0
        ]
        if episode_count <= _DEFAULT_EPISODE_LIMIT:
            return

        cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
            days=int(limit_days),
        )
        cutoff_iso = cutoff_date.isoformat()

        self.db.execute(
            f"DELETE FROM {EPISODES} WHERE {USER_UPDATED_DATE} < ?",
            [cutoff_iso],
        )
        self.db.conn.commit()
```
Fix the same logic error in cleanup method.
The cleanup method has the same walrus operator logic error as in the save_feed_and_episodes method.
Apply this diff to fix the logic error:
"""Delete episodes older than OVERCAST_LIMIT_DAYS.
Only deletes if more than 100 episodes exist.
"""
- limit_days = None
- try:
- if env_limit := os.getenv("OVERCAST_LIMIT_DAYS") is not None:
- limit_days = int(env_limit)
- except ValueError:
- pass
+ limit_days = None
+ if env_limit := os.getenv("OVERCAST_LIMIT_DAYS"):
+ try:
+ limit_days = int(env_limit)
+ except ValueError:
+ pass
if not limit_days:
return
episode_count = self.db.execute(f"SELECT COUNT(*) FROM {EPISODES}").fetchone()[
0
]
if episode_count <= _DEFAULT_EPISODE_LIMIT:
return
cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
- days=int(limit_days),
+ days=limit_days,
)
cutoff_iso = cutoff_date.isoformat()
self.db.execute(
f"DELETE FROM {EPISODES} WHERE {USER_UPDATED_DATE} < ?",
[cutoff_iso],
)
self.db.conn.commit()Issues fixed:
- Critical logic error: Same walrus operator issue as in the other method
- Redundant conversion: Removed
int(limit_days)sincelimit_daysis already an integer
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def cleanup_old_episodes(self) -> None:
        """Delete episodes older than OVERCAST_LIMIT_DAYS.

        Only deletes if more than 100 episodes exist.
        """
        limit_days = None
        if env_limit := os.getenv("OVERCAST_LIMIT_DAYS"):
            try:
                limit_days = int(env_limit)
            except ValueError:
                pass
        if not limit_days:
            return
        episode_count = self.db.execute(f"SELECT COUNT(*) FROM {EPISODES}").fetchone()[
            0
        ]
        if episode_count <= _DEFAULT_EPISODE_LIMIT:
            return
        cutoff_date = datetime.datetime.now(tz=datetime.UTC) - datetime.timedelta(
            days=limit_days,
        )
        cutoff_iso = cutoff_date.isoformat()
        self.db.execute(
            f"DELETE FROM {EPISODES} WHERE {USER_UPDATED_DATE} < ?",
            [cutoff_iso],
        )
        self.db.conn.commit()
```
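One detail worth noting about the `DELETE ... WHERE {USER_UPDATED_DATE} < ?` query: it compares ISO-8601 timestamps as strings, which works in SQLite because such timestamps sort lexicographically when they share an offset format. A toy sketch (the table and column names here are illustrative stand-ins, not the project's real schema):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Toy schema standing in for the real episodes table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE episodes (title TEXT, userUpdatedDate TEXT)")
db.executemany(
    "INSERT INTO episodes VALUES (?, ?)",
    [
        ("old", "2020-01-01T00:00:00+00:00"),
        ("new", datetime.now(tz=timezone.utc).isoformat()),
    ],
)

# ISO-8601 timestamps compare correctly as plain strings, so "<" suffices.
cutoff = (datetime.now(tz=timezone.utc) - timedelta(days=30)).isoformat()
db.execute("DELETE FROM episodes WHERE userUpdatedDate < ?", [cutoff])
remaining = [row[0] for row in db.execute("SELECT title FROM episodes")]
print(remaining)  # ['new']
```

If the stored timestamps mix `Z` suffixes and `+00:00` offsets, the lexicographic comparison can misorder rows, which is another reason to normalize the format on insert.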
🤖 Prompt for AI Agents
In overcast_to_sqlite/datastore.py around lines 475 to 505, the
cleanup_old_episodes method incorrectly uses the walrus operator with os.getenv,
causing a logic error when reading OVERCAST_LIMIT_DAYS. Fix this by first
retrieving the environment variable into a temporary variable, then checking if
it is not None before converting it to int and assigning to limit_days. Also,
remove the redundant int() conversion of limit_days when calculating cutoff_date
since limit_days is already an integer.
User description
How it works:
PR Type
Enhancement
Description
- Add environment variable `OVERCAST_LIMIT_DAYS` to control episode retention
- Filter episodes during save to exclude old episodes
- Implement cleanup method to delete old episodes from database
- Integrate cleanup into CLI save command workflow

Changes diagram

Changes walkthrough 📝

cli.py: Integrate episode cleanup into save command (overcast_to_sqlite/cli.py)
- Call `db.cleanup_old_episodes()` after feed processing

datastore.py: Implement episode filtering and cleanup functionality (overcast_to_sqlite/datastore.py)
- Add `os` import for environment variable access
- Filter episodes by `OVERCAST_LIMIT_DAYS` in `save_feed_and_episodes()`
- Add `cleanup_old_episodes()` method to delete old episodes

Summary by CodeRabbit
- Episodes older than the configured retention window (`OVERCAST_LIMIT_DAYS` environment variable) are now removed if the total episode count exceeds 100.

Important
Add environment variable `OVERCAST_LIMIT_DAYS` to limit stored episodes by age, with filtering and cleanup in `datastore.py` and CLI integration in `cli.py`.
- Add environment variable `OVERCAST_LIMIT_DAYS` to control episode retention.
- `save_feed_and_episodes()` in `datastore.py` filters episodes older than the limit during insertion.
- `cleanup_old_episodes()` in `datastore.py` deletes episodes older than the limit if more than 100 episodes exist.
- In `cli.py`, `db.cleanup_old_episodes()` is called at the end of the `save` command.
- `save()` in `cli.py` now includes a call to `cleanup_old_episodes()` after saving episodes.

This description was created by for f768106. You can customize this summary. It will automatically update as commits are pushed.