Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 69 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,75 @@ Pipelines preserve the order of steps. Stateful steps such as PCA, SVD, or
standardization automatically fit during training and reuse the same fitted
state when you call `predict`.

### Supported preprocessing steps

The pipeline is intentionally modular so you can mix and match steps to
mirror popular `AutoML` defaults:

- **Scaling** – standard, min–max, or robust scaling via `ScaleParams`.
- **Imputation** – mean, median, or most-frequent replacement with
`ImputeParams`.
- **Categorical encoders** – ordinal or one-hot encoding with optional dummy
drop.
- **Power transforms** – per-column log or Box-Cox transforms with automatic
shifting for strictly positive domains.
- **Column filters** – select or exclude features using `ColumnSelector`
helpers.

Each state stores the fitted statistics (e.g., medians, category mappings) so
that the same transformation can be applied consistently during inference.

### Example: AutoGluon-style defaults

```rust, no_run
use automl::settings::{
CategoricalEncoderParams, CategoricalEncoding, ColumnFilterParams, ColumnSelector,
ImputeParams, ImputeStrategy, PowerTransform, PowerTransformParams, PreprocessingPipeline,
PreprocessingStep, RobustScaleParams, ScaleParams, ScaleStrategy,
};

let pipeline = PreprocessingPipeline::new()
// Fill numeric columns with the median before scaling.
.add_step(PreprocessingStep::Impute(ImputeParams {
strategy: ImputeStrategy::Median,
selector: ColumnSelector::Include(vec![0, 1, 2]),
}))
// Apply a robust scaler to guard against outliers (similar to AutoGluon).
.add_step(PreprocessingStep::Scale(ScaleParams {
strategy: ScaleStrategy::Robust(RobustScaleParams::default()),
selector: ColumnSelector::Include(vec![0, 1, 2]),
}))
// One-hot encode categorical columns and drop the reference level.
.add_step(PreprocessingStep::EncodeCategorical(CategoricalEncoderParams {
selector: ColumnSelector::Include(vec![3, 4]),
encoding: CategoricalEncoding::one_hot(true),
}))
// Optionally keep only the engineered features.
.add_step(PreprocessingStep::FilterColumns(ColumnFilterParams {
selector: ColumnSelector::Include(vec![0, 1, 2, 5, 6]),
retain_selected: true,
}));
```

### Example: caret-style log + standardization recipe

```rust, no_run
use automl::settings::{
ColumnSelector, PowerTransform, PowerTransformParams, PreprocessingPipeline,
PreprocessingStep, ScaleParams, ScaleStrategy, StandardizeParams,
};

let caret_like = PreprocessingPipeline::new()
.add_step(PreprocessingStep::PowerTransform(PowerTransformParams {
selector: ColumnSelector::Include(vec![0]),
transform: PowerTransform::Log { offset: 0.0 },
}))
.add_step(PreprocessingStep::Scale(ScaleParams {
strategy: ScaleStrategy::Standard(StandardizeParams::default()),
selector: ColumnSelector::All,
}));
```

## Features

This crate has several features that add some additional methods.
Expand Down
2 changes: 1 addition & 1 deletion src/model/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ pub mod classification;
pub mod clustering;
mod comparison;
pub mod error;
mod preprocessing;
pub mod preprocessing;
pub mod regression;
pub mod supervised;

Expand Down
Loading