[ISSUE] TypeError in _unknown_error() when API returns unparseable error on streaming request #1264

@davwil

Description

_unknown_error() in sdk/errors/parser.py crashes with TypeError: object of type '_io.BytesIO' has no len() when the API returns an unparseable error response on a request that had a data parameter.

The bug is self-contained within the SDK: _BaseClient.do() wraps all data in BytesIO for retry/seek support (lines 164-166 of _base_client.py), but _unknown_error() creates RoundTrip(raw=False), which calls _redacted_dump() → len(body) on the BytesIO request body. The resulting TypeError masks the actual API error, which makes production issues very hard to diagnose.
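The root cause can be shown without the SDK at all. The snippet below is a minimal standard-library sketch, not SDK code: it only demonstrates that a BytesIO object does not support len(), and shows one way to get its size anyway. The variable names are illustrative.

```python
import io

# Stand-in for the request body after the SDK has wrapped plain bytes in BytesIO.
body = io.BytesIO(b'{"key": "value"}')

try:
    len(body)  # file-like objects do not implement __len__
    message = None
except TypeError as exc:
    message = str(exc)

# The size of the underlying buffer is still available without calling len().
size = body.getbuffer().nbytes

print(message)
print(size)
```

Any code path that assumes the body is a str or bytes object will hit the same TypeError as soon as it receives the wrapped stream.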

The normal request logging path in _base_client.py:295 handles this correctly by passing raw=True when data is not None, but _unknown_error() doesn't.

Reproduction

  from unittest.mock import MagicMock

  import requests
  from databricks.sdk._base_client import _BaseClient


  def test_do_crashes_when_api_returns_unparseable_error():
      client = _BaseClient(retry_timeout_seconds=1)

      def mock_request(method, url, **kwargs):
          prep = requests.PreparedRequest()
          prep.method = method
          prep.prepare_url(url, kwargs.get("params"))
          prep.prepare_headers(kwargs.get("headers"))
          prep.prepare_body(data=kwargs.get("data"), files=kwargs.get("files"), json=kwargs.get("json"))

          response = requests.Response()
          response.status_code = 500
          response._content = b"\x00\x01 binary garbage that no parser can handle"
          response.request = prep
          return response

      client._session.request = MagicMock(side_effect=mock_request)

      # Pass plain bytes — the SDK wraps them in BytesIO (line 164-166), then crashes on it.
      client.do("PUT", "https://example.com/api/2.0/fs/files/test.json", data=b'{"key": "value"}')
      # TypeError: object of type '_io.BytesIO' has no len()


  if __name__ == "__main__":
      test_do_crashes_when_api_returns_unparseable_error()

Expected behavior
When the API returns an unparseable error, the SDK should surface the actual HTTP error (status code, response body) instead of crashing with a TypeError about BytesIO.

The fix is in _unknown_error() (parser.py:39): RoundTrip should be created with raw=True when request.body is not a str, or _redacted_dump() should handle non-string body types gracefully.
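For the second option (handling non-string bodies gracefully), something along the following lines might work. This is only a sketch under assumptions: body_preview and its limit parameter are hypothetical names, not functions in the SDK, and the real _redacted_dump() may need different truncation or redaction behavior.

```python
import io
from typing import Any, Optional


def body_preview(body: Any, limit: int = 1024) -> Optional[str]:
    """Hypothetical helper (not an SDK function): loggable text for a request body."""
    if body is None:
        return None
    if isinstance(body, str):
        return body[:limit]
    if isinstance(body, (bytes, bytearray)):
        return bytes(body[:limit]).decode("utf-8", errors="replace")
    # File-like bodies (e.g. BytesIO): read from the start, then restore the
    # original position so a retry can still seek and resend the stream.
    if all(hasattr(body, name) for name in ("read", "seek", "tell")):
        position = body.tell()
        try:
            body.seek(0)
            chunk = body.read(limit)
        finally:
            body.seek(position)
        if isinstance(chunk, str):
            return chunk
        return chunk.decode("utf-8", errors="replace")
    # Unknown type: fall back to a placeholder instead of raising.
    return f"<{type(body).__name__}>"


stream = io.BytesIO(b'{"key": "value"}')
stream.read()  # simulate the stream having been consumed by the HTTP send
preview = body_preview(stream)
print(preview)
```

The point of the design is that the error-reporting path never raises on an unexpected body type, so the original HTTP status and response body always reach the caller.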

Is it a regression?
Unknown. The BytesIO wrapping in do() and the _unknown_error() fallback path appear to have been present for multiple versions. The bug surfaces only when the API returns a response that none of the standard error parsers can handle, which may be rare in practice.

Debug Logs
The error occurs before the SDK can produce a debug log for the failed request.

Other Information

  • OS: Windows 11
  • Version: 0.67.0

Additional context
Call chain:

  1. _BaseClient.do() wraps data in BytesIO (line 164-166 of _base_client.py)
  2. _perform() sends the request, calls _record_request_log() with raw=True (works fine)
  3. get_api_error() finds response.ok == False, tries all error parsers, none can parse it
  4. Falls back to _unknown_error() which creates RoundTrip(raw=False) (line 39 of parser.py)
  5. RoundTrip.generate() calls _redacted_dump("> ", request.body) (line 47 of round_trip_logger.py)
  6. _redacted_dump() calls len(body) on the BytesIO → TypeError

Also note: the error message in _unknown_error() points users to databricks-sdk-go/issues instead of databricks-sdk-py/issues.
