Feature Request: unique=True column constraint #313
Open
Description
Currently, enforcing uniqueness on a column requires either:
- primary_key=True, but when multiple columns have this, they form a composite key (uniqueness is checked on the combination, not individually)
- A custom @dy.check function, which works but is verbose for such a common constraint
There's no convenient inline way to express "this column must contain unique values" independently.
Example
    import dataframely as dy
    import polars as pl

    class Users(dy.Schema):
        id = dy.Int64(primary_key=True)
        email = dy.String(primary_key=True)  # Want unique emails, but this creates composite key (id, email)
        username = dy.String()

    # This passes validation, but shouldn't:
    df = pl.DataFrame({
        "id": [1, 2],
        "email": ["a@x.com", "a@x.com"],  # Duplicate email!
        "username": ["alice", "alex"],
    })
    Users.validate(df)  # No error - composite key (1, "a@x.com") != (2, "a@x.com")

Proposed Solution
Add a unique=True parameter to column definitions:
    class Users(dy.Schema):
        id = dy.Int64(primary_key=True)
        email = dy.String(unique=True)  # Must be unique independently
        username = dy.String(unique=True)  # Must be unique independently

This would check each column's uniqueness separately from the primary key constraint, without requiring a custom check function.
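To make the difference between the two semantics concrete, here is a minimal sketch in plain Python (no dataframely or polars) using the same data as the example above. The helper duplicated_keys is hypothetical and only for illustration. It is not part of dataframely and says nothing about how unique=True would actually be implemented.

```python
from collections import Counter

# Same data as the example above, as plain column lists.
data = {
    "id": [1, 2],
    "email": ["a@x.com", "a@x.com"],
    "username": ["alice", "alex"],
}


def duplicated_keys(columns, key):
    """Hypothetical helper: return key tuples that occur in more than one row."""
    rows = zip(*(columns[name] for name in key))
    counts = Counter(rows)
    return [k for k, n in counts.items() if n > 1]


# Composite-key semantics: roughly what primary_key=True on both columns checks.
composite_violations = duplicated_keys(data, ["id", "email"])

# Per-column semantics: what the proposed unique=True is meant to check.
email_violations = duplicated_keys(data, ["email"])
username_violations = duplicated_keys(data, ["username"])

print(composite_violations)  # no duplicates on the (id, email) combination
print(email_violations)      # the duplicate email is caught
print(username_violations)   # usernames are already unique
```

The composite check finds nothing because (1, "a@x.com") and (2, "a@x.com") differ, while the per-column check on email flags "a@x.com". That gap is what the proposed parameter is meant to close.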