@@ -94,6 +94,75 @@ Pipelines preserve the order of steps. Stateful steps such as PCA, SVD, or
 standardization automatically fit during training and reuse the same fitted
 state when you call `predict`.
 
+### Supported preprocessing steps
+
+The pipeline is intentionally modular so you can mix and match steps to
+mirror popular `AutoML` defaults:
+
+- **Scaling** – standard, min–max, or robust scaling via `ScaleParams`.
+- **Imputation** – mean, median, or most-frequent replacement with
+  `ImputeParams`.
+- **Categorical encoders** – ordinal or one-hot encoding, optionally dropping
+  the reference level.
+- **Power transforms** – per-column log or Box-Cox transforms, with automatic
+  shifting so inputs meet the transforms' strictly positive domain.
+- **Column filters** – select or exclude features using `ColumnSelector`
+  helpers.
+
+Each step's fitted state stores its statistics (e.g., medians, category
+mappings) so that the same transformation is applied consistently during
+inference.
+
+### Example: AutoGluon-style defaults
+
+```rust,no_run
+use automl::settings::{
+    CategoricalEncoderParams, CategoricalEncoding, ColumnFilterParams, ColumnSelector,
+    ImputeParams, ImputeStrategy, PreprocessingPipeline, PreprocessingStep, RobustScaleParams,
+    ScaleParams, ScaleStrategy,
+};
+
+let pipeline = PreprocessingPipeline::new()
+    // Fill numeric columns with the median before scaling.
+    .add_step(PreprocessingStep::Impute(ImputeParams {
+        strategy: ImputeStrategy::Median,
+        selector: ColumnSelector::Include(vec![0, 1, 2]),
+    }))
+    // Apply a robust scaler to guard against outliers (similar to AutoGluon).
+    .add_step(PreprocessingStep::Scale(ScaleParams {
+        strategy: ScaleStrategy::Robust(RobustScaleParams::default()),
+        selector: ColumnSelector::Include(vec![0, 1, 2]),
+    }))
+    // One-hot encode categorical columns and drop the reference level.
+    .add_step(PreprocessingStep::EncodeCategorical(CategoricalEncoderParams {
+        selector: ColumnSelector::Include(vec![3, 4]),
+        encoding: CategoricalEncoding::one_hot(true),
+    }))
+    // Optionally keep only the engineered features.
+    .add_step(PreprocessingStep::FilterColumns(ColumnFilterParams {
+        selector: ColumnSelector::Include(vec![0, 1, 2, 5, 6]),
+        retain_selected: true,
+    }));
+```
+
+### Example: caret-style log + standardization recipe
+
+```rust,no_run
+use automl::settings::{
+    ColumnSelector, PowerTransform, PowerTransformParams, PreprocessingPipeline,
+    PreprocessingStep, ScaleParams, ScaleStrategy, StandardizeParams,
+};
+
+let caret_like = PreprocessingPipeline::new()
+    // Log-transform the first column.
+    .add_step(PreprocessingStep::PowerTransform(PowerTransformParams {
+        selector: ColumnSelector::Include(vec![0]),
+        transform: PowerTransform::Log { offset: 0.0 },
+    }))
+    // Then standardize every column.
+    .add_step(PreprocessingStep::Scale(ScaleParams {
+        strategy: ScaleStrategy::Standard(StandardizeParams::default()),
+        selector: ColumnSelector::All,
+    }));
+```
+
 ## Features
 
 This crate has several features that add some additional methods.
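
The added "Supported preprocessing steps" text says that fitted statistics such as medians are stored and reused at inference time. A minimal, self-contained sketch of that fit-once, apply-later pattern is shown below. It is an illustration only: `MedianImputer`, its `fit` and `transform` methods, and the `median_of_sorted` helper are hypothetical names that use only the standard library, and they are not part of the `automl` API.

```rust
/// Hypothetical stand-in for a fitted imputation step (not the `automl` API).
#[derive(Debug, Clone, PartialEq)]
pub struct MedianImputer {
    /// Column indices to impute, analogous to an include-style selector.
    columns: Vec<usize>,
    /// Median learned for each selected column during `fit`.
    medians: Vec<f64>,
}

impl MedianImputer {
    /// Learn per-column medians from the training rows, ignoring NaN entries.
    pub fn fit(rows: &[Vec<f64>], columns: &[usize]) -> Self {
        let medians = columns
            .iter()
            .map(|&c| {
                let mut vals: Vec<f64> = rows
                    .iter()
                    .map(|r| r[c])
                    .filter(|v| !v.is_nan())
                    .collect();
                vals.sort_by(|a, b| a.total_cmp(b));
                median_of_sorted(&vals)
            })
            .collect();
        Self {
            columns: columns.to_vec(),
            medians,
        }
    }

    /// Replace NaN entries with the stored training medians.
    /// Nothing is re-estimated from `rows`, so inference data is treated
    /// exactly like training data.
    pub fn transform(&self, rows: &mut [Vec<f64>]) {
        for row in rows.iter_mut() {
            for (&c, &m) in self.columns.iter().zip(&self.medians) {
                if row[c].is_nan() {
                    row[c] = m;
                }
            }
        }
    }
}

/// Median of an already sorted slice; falls back to 0.0 for an empty slice.
fn median_of_sorted(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n == 0 {
        0.0
    } else if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}
```

In this sketch, calling `MedianImputer::fit(&train, &[0, 1])` once and then `transform` on any later batch fills gaps with the training medians rather than with statistics recomputed from the new batch, which is the consistency property the README text describes.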