

@maxi297
Contributor

@maxi297 maxi297 commented Oct 29, 2025

See https://airbytehq-team.slack.com/archives/C09P8G83G93

Summary by CodeRabbit

  • New Features

    • Added support for extracting status codes from custom paths within response bodies, enabling rate-limit detection when the status code is embedded in the JSON response body rather than returned as the HTTP response status.
  • Chores

    • Updated development dependencies.

@github-actions github-actions bot added the enhancement New feature or request label Oct 29, 2025
@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/rate_limit_status_code_in_json_body#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/rate_limit_status_code_in_json_body

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment


@maxi297
Contributor Author

maxi297 commented Oct 29, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

@coderabbitai
Contributor

coderabbitai bot commented Oct 29, 2025

📝 Walkthrough


This PR introduces path-based rate-limit status code detection to HTTPAPIBudget. A new path_for_status_code field enables locating status codes nested within HTTP response bodies, as an alternative to the HTTP response status code. The field is added to the schema, model, factory, and call rate implementation layers. A minor dependency update to dagger-io (0.19.0 → 0.19.3) is also included.
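A minimal sketch of the nested lookup, assuming the body has already been parsed into Python dicts and that the path is a list of string keys (matching the new `path_for_status_code: Optional[List[str]]` field). The helper name `lookup_status_code` and the example body are hypothetical; the `functools.reduce` walk mirrors the expression used in the PR's `call_rate.py` change.

```python
import functools
from typing import Any, List


def lookup_status_code(body: Any, path: List[str]) -> Any:
    """Walk `path` through a parsed JSON body, one key per step.

    Raises KeyError if a key is missing, exactly as a plain dict lookup would.
    """
    return functools.reduce(lambda node, key: node[key], path, body)


# Hypothetical body: the HTTP status line says 200, but the payload reports 429.
body = {"error": {"status": 429, "message": "Too many requests"}}
status = lookup_status_code(body, ["error", "status"])
print(status in [429])  # True
```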

Changes

Schema and Model Definitions
  • Files: airbyte_cdk/sources/declarative/declarative_component_schema.yaml, airbyte_cdk/sources/declarative/models/declarative_component_schema.py
  • Summary: Added an optional path_for_status_code: Optional[List[str]] field to the HTTPAPIBudget definition and model, with a title and description indicating that it locates non-standard status codes within HTTP response bodies.
Factory Integration
  • Files: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
  • Summary: Propagates path_for_status_code from HTTPAPIBudgetModel to the HttpAPIBudget instantiation in the create_http_api_budget method.
Core Implementation
  • Files: airbyte_cdk/sources/streams/call_rate.py
  • Summary: Updated HttpAPIBudget.__init__ to accept a path_for_status_code: Optional[List[str]] parameter; introduced path-based extraction logic in get_calls_left_from_response that looks up a nested status code in the response JSON and compares it against status_codes_for_ratelimit_hit (because the plain HTTP status-code check sits in an elif branch, it is skipped whenever a path is configured); refined the type hint for status_codes_for_ratelimit_hit to List[int]; added a functools import.
Dependencies
  • Files: pyproject.toml
  • Summary: Bumped the dagger-io dev dependency from 0.19.0 to 0.19.3.

Sequence Diagram(s)

sequenceDiagram
    participant Config as Configuration
    participant Factory as Factory
    participant Budget as HttpAPIBudget
    participant Response as HTTP Response

    Config->>Factory: HTTPAPIBudgetModel with path_for_status_code
    Factory->>Budget: create_http_api_budget() passes path_for_status_code
    Budget->>Budget: __init__ stores _path_for_status_code

    Response->>Budget: get_calls_left_from_response(response)
    alt path_for_status_code provided
        Budget->>Response: Extract nested status code from response.json()
        Response-->>Budget: status_code value
        Budget->>Budget: Compare against status_codes_for_ratelimit_hit
        alt Match found
            Budget-->>Response: return 0 (rate limit hit)
        else No match
            Budget->>Budget: HTTP status code check skipped (elif branch)
            Budget-->>Response: return calls_left or None
        end
    else No path provided
        Budget->>Budget: Use standard HTTP status code
        Budget-->>Response: return calls_left or None
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • call_rate.py: The path extraction and fallback logic warrants careful review—particularly the KeyError handling when the path doesn't exist in the response, and how it interacts with the existing status code comparison logic. Wdyt about whether the fallback chain (path → HTTP status → None) is the right precedence?
  • Type consistency: Verify that List[int] for status_codes_for_ratelimit_hit is consistently applied across all call sites and that existing code won't break due to the type change.
  • Integration flow: Confirm that the factory correctly wires the path_for_status_code through all instantiation paths and that no edge cases are missed where the field might be undefined.
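To make the precedence question concrete, here is a hypothetical, simplified model of the branch as written in this PR (the `elif` form quoted later in the review), reduced to a pure function over an integer status and an already-parsed body. `is_rate_limited_as_written` is an illustrative name, not a CDK function; it returns True where the real method returns 0, and False where the real method would move on to its other checks.

```python
import functools
from typing import Any, Iterable, List, Optional


def is_rate_limited_as_written(
    http_status: int,
    body: Any,
    codes: Iterable[int],
    path: Optional[List[str]],
) -> bool:
    codes = list(codes)
    if path:
        try:
            return functools.reduce(lambda a, b: a[b], path, body) in codes
        except KeyError:
            # Path missing: treated as "not rate limited"
            return False
    # Mirrors the elif: only reached when no path is configured.
    return http_status in codes


# An HTTP 429 whose body lacks the configured path is NOT flagged:
print(is_rate_limited_as_written(429, {}, [429], ["error", "status"]))  # False
print(is_rate_limited_as_written(429, {}, [429], None))  # True
```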

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 20.00%, which is insufficient; the required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title "feat: attempt to rate limit on code that is within json body" is clearly related to the main change in the changeset. The PR's core purpose is to enable rate-limit detection based on status codes extracted from within the JSON response body via a configurable path, and the title accurately captures this functionality. The title is specific enough that a teammate reviewing git history would understand the primary change: it references both the rate-limiting feature and the JSON body aspect that distinguishes this PR's new capability. While the phrasing is somewhat informal, it successfully conveys the intended change without being vague or generic.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

PyTest Results (Fast)

3 817 tests ±0 · 3 805 passed ✅ ±0 · 12 skipped 💤 ±0 · 0 failed ❌ ±0
1 suite ±0 · 1 file ±0 · duration 6m 27s ⏱️ (-16s)

Results for commit 03352a3. ± Comparison against base commit 6504148.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
airbyte_cdk/sources/streams/call_rate.py (1)

638-652: Avoid mutable default arg for status codes and update docstring.

Using a mutable default list ([429]) risks shared-state bugs; can we switch to None and materialize an immutable frozenset inside? Also, the docstring doesn’t mention path_for_status_code. Proposed minimal diff:

-def __init__(
-    self,
-    ratelimit_reset_header: str = "ratelimit-reset",
-    ratelimit_remaining_header: str = "ratelimit-remaining",
-    status_codes_for_ratelimit_hit: List[int] = [429],
-    path_for_status_code: Optional[List[str]] = None,
-    **kwargs: Any,
-):
+def __init__(
+    self,
+    ratelimit_reset_header: str = "ratelimit-reset",
+    ratelimit_remaining_header: str = "ratelimit-remaining",
+    status_codes_for_ratelimit_hit: Optional[List[int]] = None,
+    path_for_status_code: Optional[List[str]] = None,
+    **kwargs: Any,
+):
@@
-    :param status_codes_for_ratelimit_hit: list of HTTP status codes that signal about rate limit being hit
+    :param status_codes_for_ratelimit_hit: list of HTTP status codes that signal about rate limit being hit
+    :param path_for_status_code: JSON body path to a non-HTTP “status code” to also treat as rate-limited when matched
@@
-    self._status_codes_for_ratelimit_hit = status_codes_for_ratelimit_hit
+    # use an immutable set for safe, fast membership
+    self._status_codes_for_ratelimit_hit = frozenset(
+        [429] if status_codes_for_ratelimit_hit is None else status_codes_for_ratelimit_hit
+    )

This keeps behavior the same but removes the mutable default risk and documents the new parameter. Wdyt?
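For context on the mutable-default concern: Python evaluates a default argument once, when the function is defined, so every call that omits the argument shares the same list object. The sketch below uses hypothetical classes (not the CDK's `HttpAPIBudget`) to contrast a shared list default with the `None`-sentinel plus `frozenset` pattern proposed above.

```python
from typing import List, Optional


class SharedDefault:
    def __init__(self, codes: List[int] = [429]):  # evaluated once, shared by all instances
        self.codes = codes


class SentinelDefault:
    def __init__(self, codes: Optional[List[int]] = None):
        # Build a fresh, immutable set per instance; an explicit [] stays empty.
        self.codes = frozenset([429] if codes is None else codes)


a, b = SharedDefault(), SharedDefault()
a.codes.append(503)  # mutates the shared default list
print(b.codes)  # [429, 503]: b sees a's change

c = SentinelDefault()
print(429 in c.codes)  # True
```

A frozenset also gives constant-time membership checks and cannot be mutated after construction, so no caller can alter another instance's configuration.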

🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1672-1677: Consider adding examples and clarifying the description.

The new path_for_status_code field is structurally sound, but could benefit from improved documentation:

  1. The description "When the status code is not a HTTP status code but a code in the HTTP response" is a bit circular. How about rephrasing to something like: "Path to the status code field within the HTTP response body (for nested status codes)"?

  2. Similar path-based fields in the schema (e.g., session_token_path, schema_pointer) include examples. Would adding examples like ["error", "status"] or ["response", "code"] help users understand the expected format?

  3. Should this field support interpolation_context (e.g., to reference config values), like other optional fields in the schema?

Would clarifying the description and adding examples improve usability, wdyt?

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

2216-2220: Clarify path semantics and align naming with existing “*_path” fields?

  • Should this follow the repo’s existing naming pattern like status_code_path (cf. session_token_path), for consistency, wdyt?
  • The current type List[str] doesn’t cover array indices or wildcards; do we want to mirror the Dpath semantics (allow “*” or numeric indices as strings) and document it to avoid surprises, wdyt?
  • Minor: “a HTTP” → “an HTTP”.

Also, since this file is generated, can you confirm this change originates from the YAML (declarative_component_schema.yaml) so regeneration remains deterministic? Based on learnings.
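To illustrate what mirroring Dpath-style semantics could mean, here is a stdlib-only sketch. It is not the `dpath` library and not CDK code: numeric string tokens index into lists, and `*` fans out over every child. `resolve_all` is a hypothetical helper; because a wildcard path can match several values, a caller would need a rule such as "rate limited if any match".

```python
from typing import Any, List


def resolve_all(node: Any, path: List[str]) -> List[Any]:
    """Return every value reachable via `path`; missing branches yield nothing."""
    if not path:
        return [node]
    head, rest = path[0], path[1:]
    if head == "*":
        # Wildcard: descend into every child of a dict or list.
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            children = []
        return [v for child in children for v in resolve_all(child, rest)]
    if isinstance(node, list) and head.isdigit():
        # Numeric token on a list: treat it as an index.
        index = int(head)
        return resolve_all(node[index], rest) if index < len(node) else []
    if isinstance(node, dict) and head in node:
        return resolve_all(node[head], rest)
    return []


body = {"errors": [{"code": 400}, {"code": 429}]}
print(resolve_all(body, ["errors", "1", "code"]))  # [429]
print(resolve_all(body, ["errors", "*", "code"]))  # [400, 429]
```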

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6504148 and 03352a3.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • airbyte_cdk/sources/streams/call_rate.py (4 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.

Applied to files:

  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

4143-4157: LGTM! Field propagation is clean and consistent.

The addition of path_for_status_code to the HttpAPIBudget constructor follows the same pattern as other optional fields in this function. The implementation correctly passes the field from the model without transformation, which is appropriate for this factory method.

One optional consideration: Should there be validation that path_for_status_code, if provided, contains valid path segments (e.g., non-empty list, valid dpath syntax)? Currently, validation is deferred to the HttpAPIBudget class, which is fine if that's where the validation logic lives, wdyt?
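If validation were added, one possible shape is a small check at construction time. This is a hypothetical sketch; the PR as reviewed does not include it, and `validate_status_code_path` is an illustrative name.

```python
from typing import List, Optional


def validate_status_code_path(path: Optional[List[str]]) -> Optional[List[str]]:
    """Reject empty paths or empty/non-string segments; None means "feature off"."""
    if path is None:
        return None
    if not isinstance(path, list) or not path:
        raise ValueError("path_for_status_code must be a non-empty list of strings")
    for segment in path:
        if not isinstance(segment, str) or not segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    return path
```

One design choice here: an empty list is rejected instead of silently disabling the feature, which is how `if self._path_for_status_code:` currently treats it.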

#!/bin/bash
# Verify that HttpAPIBudget handles path_for_status_code correctly and performs appropriate validation

echo "=== Searching for HttpAPIBudget class definition and validation ==="
rg -n -A25 'class HttpAPIBudget' --type py

echo -e "\n=== Checking for validation of path_for_status_code ==="
rg -A10 -B2 'path_for_status_code' --type py -g 'call_rate.py'
pyproject.toml (1)

121-121: dagger-io 0.19.3 is safe—but is the update necessary for this feature?

Good news: dagger-io 0.19.3 exists and has no known security vulnerabilities. However, the main question remains: is this dependency update required for the path-based rate-limit feature? If not, would you consider separating it into a dedicated PR to keep changes focused and easier to review? If it is needed, could you clarify the dependency in the PR description?

airbyte_cdk/sources/streams/call_rate.py (2)

8-8: Import looks fine.

New functools import is appropriate for the reduce usage. Nothing to change.


14-14: Typing import addition is fine.

Adding List is consistent with the new annotations.

Comment on lines +677 to 688
        if self._path_for_status_code:
            try:
                if (
                    functools.reduce(lambda a, b: a[b], self._path_for_status_code, response.json())
                    in self._status_codes_for_ratelimit_hit
                ):
                    return 0
            except KeyError:
                # the status is not present in the response so we will assume that we're not being rate limited
                pass
        elif response.status_code in self._status_codes_for_ratelimit_hit:
            return 0
Contributor


⚠️ Potential issue | 🔴 Critical

Don’t skip HTTP-status fallback when a JSON path is configured; harden JSON/path handling.

With elif, a configured path_for_status_code suppresses the HTTP status check; a missing/unevaluable path could cause us to ignore a 429. Also, only KeyError is handled; JSON decoding or type/index errors may surface. Proposed minimal fix:

-        if self._path_for_status_code:
-            try:
-                if (
-                    functools.reduce(lambda a, b: a[b], self._path_for_status_code, response.json())
-                    in self._status_codes_for_ratelimit_hit
-                ):
-                    return 0
-            except KeyError:
-                # the status is not present in the response so we will assume that we're not being rate limited
-                pass
-        elif response.status_code in self._status_codes_for_ratelimit_hit:
+        if self._path_for_status_code:
+            try:
+                body = response.json()
+                value = functools.reduce(lambda a, b: a[b], self._path_for_status_code, body)
+                # normalize common stringified codes, e.g. "429"
+                if isinstance(value, str) and value.isdigit():
+                    value = int(value)
+                if value in self._status_codes_for_ratelimit_hit:
+                    return 0
+            except (KeyError, IndexError, TypeError, ValueError, json.JSONDecodeError):
+                # path missing or body not JSON; fall back to HTTP status check below
+                pass
+        if response.status_code in self._status_codes_for_ratelimit_hit:
             return 0

And add the import near the top:

+import json

Optionally, do you want to support numeric list indices in the path (e.g., ["errors", "0", "code"]) by treating numeric tokens as indexes? Easy to add if helpful. Wdyt?
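A self-contained sketch of the suggested control flow, with the optional numeric-index handling folded in. It works on an already-decoded body and an integer status rather than a `requests.Response`, so JSON decoding errors are out of scope here. `rate_limit_hit` and `_step` are hypothetical names, and the function returns True where the real method would return 0.

```python
import functools
from typing import Any, Iterable, List, Optional


def _step(node: Any, token: str) -> Any:
    # Treat numeric tokens as list indices; everything else is a mapping key.
    if isinstance(node, list) and token.isdigit():
        return node[int(token)]
    return node[token]


def rate_limit_hit(
    http_status: int,
    body: Any,
    codes: Iterable[int],
    path: Optional[List[str]] = None,
) -> bool:
    codes = list(codes)
    if path:
        try:
            value = functools.reduce(_step, path, body)
            # Normalize stringified codes such as "429".
            if isinstance(value, str) and value.isdigit():
                value = int(value)
            if value in codes:
                return True
        except (KeyError, IndexError, TypeError):
            pass  # path missing or wrong shape: fall through to the HTTP check
    # Always evaluated, so a real HTTP 429 is never ignored.
    return http_status in codes


print(rate_limit_hit(200, {"errors": [{"code": "429"}]}, [429], ["errors", "0", "code"]))  # True
print(rate_limit_hit(429, {}, [429], ["errors", "0", "code"]))  # True
print(rate_limit_hit(200, {"status": "ok"}, [429], ["errors", "0", "code"]))  # False
```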

🤖 Prompt for AI Agents
In airbyte_cdk/sources/streams/call_rate.py around lines 677-688, the current
code uses an elif so when a path_for_status_code is configured the plain HTTP
status_code check is skipped, and it only catches KeyError which ignores JSON
decode/type/index problems; change the logic so the HTTP-status check is always
evaluated (use a separate if for the JSON-path branch, not elif), broaden the
exception handling around path evaluation to catch JSONDecodeError, TypeError,
IndexError (import JSONDecodeError from json at the top), and on any exception
fall back to evaluating response.status_code against
_status_codes_for_ratelimit_hit; optionally, if you want numeric path tokens to
select list indices, convert tokens that are numeric strings to ints before
indexing.

@github-actions

PyTest Results (Full)

3 820 tests · 3 808 passed ✅ · 12 skipped 💤 · 0 failed ❌
1 suite · 1 file · duration 11m 1s ⏱️

Results for commit 03352a3.
