-
Notifications
You must be signed in to change notification settings - Fork 30
feat: attempt to rate limit on code that is within json body #821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/rate_limit_status_code_in_json_body#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/rate_limit_status_code_in_json_bodyHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
|
/autofix
|
📝 WalkthroughWalkthroughThis PR introduces path-based rate-limit status code detection to HTTPAPIBudget. A new Changes
Sequence Diagram(s)sequenceDiagram
participant Config as Configuration
participant Factory as Factory
participant Budget as HttpAPIBudget
participant Response as HTTP Response
Config->>Factory: HTTPAPIBudgetModel with path_for_status_code
Factory->>Budget: create_http_api_budget() passes path_for_status_code
Budget->>Budget: __init__ stores _path_for_status_code
Response->>Budget: get_calls_left_from_response(response)
alt path_for_status_code provided
Budget->>Response: Extract nested status code from response.json()
Response-->>Budget: status_code value
Budget->>Budget: Compare against status_codes_for_ratelimit_hit
alt Match found
Budget-->>Response: return 0 (rate limit hit)
else No match
Budget->>Budget: Fall back to HTTP status code check
Budget-->>Response: return calls_left or None
end
else No path provided
Budget->>Budget: Use standard HTTP status code
Budget-->>Response: return calls_left or None
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte_cdk/sources/streams/call_rate.py (1)
638-652: Avoid mutable default arg for status codes and update docstring.Using a mutable default list (
[429]) risks shared-state bugs; can we switch toNoneand materialize an immutable/frozenset inside? Also, the docstring doesn’t mentionpath_for_status_code. Proposed minimal diff:-def __init__( - self, - ratelimit_reset_header: str = "ratelimit-reset", - ratelimit_remaining_header: str = "ratelimit-remaining", - status_codes_for_ratelimit_hit: List[int] = [429], - path_for_status_code: Optional[List[str]] = None, - **kwargs: Any, -): +def __init__( + self, + ratelimit_reset_header: str = "ratelimit-reset", + ratelimit_remaining_header: str = "ratelimit-remaining", + status_codes_for_ratelimit_hit: Optional[List[int]] = None, + path_for_status_code: Optional[List[str]] = None, + **kwargs: Any, +): @@ - :param status_codes_for_ratelimit_hit: list of HTTP status codes that signal about rate limit being hit + :param status_codes_for_ratelimit_hit: list of HTTP status codes that signal about rate limit being hit + :param path_for_status_code: JSON body path to a non-HTTP “status code” to also treat as rate-limited when matched @@ - self._status_codes_for_ratelimit_hit = status_codes_for_ratelimit_hit + # use an immutable set for safe, fast membership + self._status_codes_for_ratelimit_hit = frozenset(status_codes_for_ratelimit_hit or [429])This keeps behavior the same but removes the mutable default risk and documents the new parameter. Wdyt?
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1672-1677: Consider adding examples and clarifying the description.The new
path_for_status_codefield is structurally sound, but could benefit from improved documentation:
The description "When the status code is not a HTTP status code but a code in the HTTP response" is a bit circular. How about rephrasing to something like: "Path to the status code field within the HTTP response body (for nested status codes)"?
Similar path-based fields in the schema (e.g.,
session_token_path,schema_pointer) include examples. Would adding examples like["error", "status"]or["response", "code"]help users understand the expected format?Should this field support interpolation_context (e.g., to reference config values), like other optional fields in the schema?
What do you think—would clarifying the description and adding examples improve usability? wdyt?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
2216-2220: Clarify path semantics and align naming with existing “*_path” fields?
- Should this follow the repo’s existing naming pattern like
status_code_path(cf.session_token_path), for consistency, wdyt?- The current type
List[str]doesn’t cover array indices or wildcards; do we want to mirror the Dpath semantics (allow “*” or numeric indices as strings) and document it to avoid surprises, wdyt?- Minor: “a HTTP” → “an HTTP”.
Also, since this file is generated, can you confirm this change originates from the YAML (
declarative_component_schema.yaml) so regeneration remains deterministic? Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(1 hunks)airbyte_cdk/sources/streams/call_rate.py(4 hunks)pyproject.toml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
4143-4157: LGTM! Field propagation is clean and consistent.The addition of
path_for_status_codeto theHttpAPIBudgetconstructor follows the same pattern as other optional fields in this function. The implementation correctly passes the field from the model without transformation, which is appropriate for this factory method.One optional consideration: Should there be validation that
path_for_status_code, if provided, contains valid path segments (e.g., non-empty list, valid dpath syntax)? Currently, validation is deferred to theHttpAPIBudgetclass, which is fine if that's where the validation logic lives, wdyt?#!/bin/bash # Verify that HttpAPIBudget handles path_for_status_code correctly and performs appropriate validation echo "=== Searching for HttpAPIBudget class definition and validation ===" ast-grep --pattern $'class HttpAPIBudget { $$$ __init__($$$, path_for_status_code$$$) { $$$ } $$$ }' echo -e "\n=== Checking for validation of path_for_status_code ===" rg -A10 -B2 'path_for_status_code' --type py -g 'call_rate.py'pyproject.toml (1)
121-121: dagger-io 0.19.3 is safe—but is the update necessary for this feature?Good news: dagger-io 0.19.3 exists and has no known security vulnerabilities. However, the main question remains: is this dependency update required for the path-based rate-limit feature? If not, would you consider separating it into a dedicated PR to keep changes focused and easier to review? If it is needed, could you clarify the dependency in the PR description?
airbyte_cdk/sources/streams/call_rate.py (2)
8-8: Import looks fine.New
functoolsimport is appropriate for the reduce usage. Nothing to change.
14-14: Typing import addition is fine.Adding
Listis consistent with the new annotations.
| if self._path_for_status_code: | ||
| try: | ||
| if ( | ||
| functools.reduce(lambda a, b: a[b], self._path_for_status_code, response.json()) | ||
| in self._status_codes_for_ratelimit_hit | ||
| ): | ||
| return 0 | ||
| except KeyError: | ||
| # the status is not present in the response so we will assume that we're not being rate limited | ||
| pass | ||
| elif response.status_code in self._status_codes_for_ratelimit_hit: | ||
| return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t skip HTTP-status fallback when a JSON path is configured; harden JSON/path handling.
With elif, a configured path_for_status_code suppresses the HTTP status check; a missing/unevaluable path could cause us to ignore a 429. Also, only KeyError is handled; JSON decoding or type/index errors may surface. Proposed minimal fix:
- if self._path_for_status_code:
- try:
- if (
- functools.reduce(lambda a, b: a[b], self._path_for_status_code, response.json())
- in self._status_codes_for_ratelimit_hit
- ):
- return 0
- except KeyError:
- # the status is not present in the response so we will assume that we're not being rate limited
- pass
- elif response.status_code in self._status_codes_for_ratelimit_hit:
+ if self._path_for_status_code:
+ try:
+ body = response.json()
+ value = functools.reduce(lambda a, b: a[b], self._path_for_status_code, body)
+ # normalize common stringified codes, e.g. "429"
+ if isinstance(value, str) and value.isdigit():
+ value = int(value)
+ if value in self._status_codes_for_ratelimit_hit:
+ return 0
+ except (KeyError, TypeError, ValueError, json.JSONDecodeError):
+ # path missing or body not JSON; fall back to HTTP status check below
+ pass
+ if response.status_code in self._status_codes_for_ratelimit_hit:
return 0And add the import near the top:
+import jsonOptionally, do you want to support numeric list indices in the path (e.g., ["errors", "0", "code"]) by treating numeric tokens as indexes? Easy to add if helpful. Wdyt?
🤖 Prompt for AI Agents
In airbyte_cdk/sources/streams/call_rate.py around lines 677-688, the current
code uses an elif so when a path_for_status_code is configured the plain HTTP
status_code check is skipped, and it only catches KeyError which ignores JSON
decode/type/index problems; change the logic so the HTTP-status check is always
evaluated (use a separate if for the JSON-path branch, not elif), broaden the
exception handling around path evaluation to catch JSONDecodeError, TypeError,
IndexError (import JSONDecodeError from json at the top), and on any exception
fall back to evaluating response.status_code against
_status_codes_for_ratelimit_hit; optionally, if you want numeric path tokens to
select list indices, convert tokens that are numeric strings to ints before
indexing.
PyTest Results (Full)3 820 tests 3 808 ✅ 11m 1s ⏱️ Results for commit 03352a3. |
See https://airbytehq-team.slack.com/archives/C09P8G83G93
Summary by CodeRabbit
New Features
Chores