Feature Request: unique=True column constraint #313

@gab23r

Currently, enforcing uniqueness on a column requires either:

  1. primary_key=True, but when several columns set it they form a composite key: uniqueness is checked on the combination of columns, not on each column individually
  2. A custom @dy.check function, which works but is verbose for such a common constraint

There's no convenient inline way to express "this column must contain unique values" independently of the primary key.

Example

import dataframely as dy
import polars as pl

class Users(dy.Schema):
    id = dy.Int64(primary_key=True)
    email = dy.String(primary_key=True)  # Want unique emails, but this creates composite key (id, email)
    username = dy.String()

# This passes validation, but shouldn't:
df = pl.DataFrame({
    "id": [1, 2],
    "email": ["a@x.com", "a@x.com"],  # Duplicate email!
    "username": ["alice", "alex"], 
})
Users.validate(df)  # No error - composite key (1, "a@x.com") != (2, "a@x.com")
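
To make the difference concrete, here is a minimal plain-Python sketch of the two interpretations applied to the rows above. It does not use dataframely at all; `find_duplicates` is a hypothetical helper written only for illustration.

```python
from collections import Counter

def find_duplicates(rows, columns):
    """Return the value tuples that occur more than once across `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

rows = [
    {"id": 1, "email": "a@x.com", "username": "alice"},
    {"id": 2, "email": "a@x.com", "username": "alex"},
]

# Composite check: the (id, email) pairs are distinct, so nothing is flagged.
composite_dupes = find_duplicates(rows, ["id", "email"])

# Per-column check: email on its own is duplicated.
email_dupes = find_duplicates(rows, ["email"])
```

The composite check finds no duplicates, while the per-column check flags the repeated email, which is the behaviour this request is asking for.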

Proposed Solution

Add a unique=True parameter to column definitions:

class Users(dy.Schema):
    id = dy.Int64(primary_key=True)
    email = dy.String(unique=True)     # Must be unique independently
    username = dy.String(unique=True)  # Must be unique independently

This would check each column's uniqueness separately from the primary key constraint, without requiring a custom check function.
