fix: Ignore dy.Any columns in Schema.cast#315

Open
gab23r wants to merge 1 commit into Quantco:main from gab23r:fix-cast-with-any

Conversation

Contributor

@gab23r gab23r commented Apr 3, 2026

Motivation

Fixes: #314

Changes

Schema.cast ignores the dy.Any columns.

Copilot AI left a comment

Pull request overview

This PR fixes issue #314 by preventing dy.Any columns from being cast during schema validation. The Schema.cast method now skips casting for dy.Any columns since they should accept any data type, rather than attempting to cast to their default pl.Null() type. This resolves the error that occurred when roundtripping collections with parquet files containing dy.Any columns.

Changes:

  • Modified Schema.cast method to check if a column is of type dy.Any and skip casting for such columns
  • Added import for the Any column type as AnyColumn in schema.py
  • Added a test to verify that casting a DataFrame with an Any column preserves the original dtype

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| dataframely/schema.py | Added import for Any column type and modified cast method to skip casting for dy.Any columns |
| tests/column_types/test_any.py | Added test verifying that Schema.cast preserves dtype for dy.Any columns |
Comments suppressed due to low confidence (1)

tests/column_types/test_any.py:5

  • An extra blank line is added after the license header (line 4), creating two blank lines between the license header and imports. This doesn't match the convention in other test files (e.g., test_string.py) which have only one blank line. Consider removing the extra blank line to maintain consistency.

from typing import Any

codecov bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (72fb1a6) to head (378f7c1).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #315   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           56        56           
  Lines         3218      3234   +16     
=========================================
+ Hits          3218      3234   +16     

Collaborator

Thanks for finding this and providing a fix! I'd just like to refactor it a little.

  lf = df.lazy().select(
-     pl.col(name).cast(col.dtype) for name, col in cls.columns().items()
+     # Skip casting for Any columns since they accept any type
+     pl.col(name) if isinstance(col, AnyColumn) else pl.col(name).cast(col.dtype)
Collaborator

Instead of building special treatment for Any here, how about we move ownership of casting into the Column itself? I.e. Column gets a cast method, and the default implementation is:

def cast(self, col: pl.Expr) -> pl.Expr:
    return col.cast(self.dtype)

In Any, we then implement the override:

def cast(self, col: pl.Expr) -> pl.Expr:
    return col

I think this would be neat because you never have to think about special casting logic outside the column implementations themselves.
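
To make the suggestion concrete, here is a minimal, dependency-free sketch of the dispatch it describes. `Expr` is a hypothetical stand-in for `pl.Expr` that only records the cast target, and `Int64`, the string dtype names, and `cast_exprs` are illustrative assumptions rather than dataframely's actual classes or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for pl.Expr that only records a cast target."""

    name: str
    cast_to: Optional[str] = None

    def cast(self, dtype: str) -> "Expr":
        return Expr(self.name, dtype)


class Column:
    dtype = "Null"  # placeholder default, assumed for this sketch

    def cast(self, expr: Expr) -> Expr:
        # Default behaviour: cast to the column's own dtype.
        return expr.cast(self.dtype)


class Int64(Column):
    dtype = "Int64"


class Any(Column):
    def cast(self, expr: Expr) -> Expr:
        # Any accepts every dtype, so the expression passes through unchanged.
        return expr


def cast_exprs(columns: dict) -> list:
    # Schema-level code delegates to each column; no isinstance checks needed.
    return [col.cast(Expr(name)) for name, col in columns.items()]


exprs = cast_exprs({"a": Int64(), "b": Any()})
```

With this shape, the schema-level `select` only ever calls `col.cast(...)`, and any column type with unusual casting rules overrides the method locally.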

Contributor Author

I was thinking about this solution as well, but then I thought about the dy.Integer column. How should we handle that case? Maybe the Column.cast function should also take the dtype of the input expression, which means we would need to wrap it with pipe_with_schema.
I am away from my computer right now; I can take a closer look on Tuesday.
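
One way to read this concern is sketched below, again without dependencies. It assumes that an integer column should leave any integer input untouched and only cast non-integer input to a default of `Int64`; that behaviour, the `INTEGER_DTYPES` set, the `Expr` stand-in for `pl.Expr`, and `cast_exprs` are all hypothetical. In real code the input schema would have to come from the frame itself, for example through the `pipe_with_schema` wrapping mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of dtype names that an integer column would accept as-is.
INTEGER_DTYPES = frozenset(
    {"Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"}
)


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for pl.Expr that only records a cast target."""

    name: str
    cast_to: Optional[str] = None

    def cast(self, dtype: str) -> "Expr":
        return Expr(self.name, dtype)


class Integer:
    dtype = "Int64"  # assumed default dtype for this sketch

    def cast(self, expr: Expr, input_dtype: str) -> Expr:
        # Keep the input as-is when it is already some integer type.
        if input_dtype in INTEGER_DTYPES:
            return expr
        # Otherwise fall back to the column's default dtype.
        return expr.cast(self.dtype)


def cast_exprs(columns: dict, input_schema: dict) -> list:
    # The caller must supply the input dtypes alongside the column definitions.
    return [
        col.cast(Expr(name), input_schema[name]) for name, col in columns.items()
    ]


input_schema = {"a": "Int32", "b": "String"}
exprs = cast_exprs({"a": Integer(), "b": Integer()}, input_schema)
```

The extra `input_dtype` parameter is what forces the schema lookup: a plain expression does not know its own dtype until it is resolved against a frame.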

Development

Successfully merging this pull request may close these issues.

Bug: Roundtrip a collection with parquet fails with dy.Any

3 participants