cmccomb · cmccomb · Nov 15, 2025 · Nov 15, 2025 · Nov 15, 2025 · Nov 15, 2025
diff --git a/README.md b/README.md
@@ -94,6 +94,75 @@ Pipelines preserve the order of steps. Stateful steps such as PCA, SVD, or
 standardization automatically fit during training and reuse the same fitted
 state when you call `predict`.
 
+### Supported preprocessing steps
+
+The pipeline is intentionally modular so you can mix and match steps to
+mirror popular `AutoML` defaults:
+
+- **Scaling** – standard, min–max, or robust scaling via `ScaleParams`.
+- **Imputation** – mean, median, or most-frequent replacement with
+  `ImputeParams`.
+- **Categorical encoders** – ordinal or one-hot encoding with optional dummy
+  drop.
+- **Power transforms** – per-column log or Box-Cox transforms with automatic
+  shifting for strictly positive domains.
+- **Column filters** – select or exclude features using `ColumnSelector`
+  helpers.
+
+Each state stores the fitted statistics (e.g., medians, category mappings) so
+that the same transformation can be applied consistently during inference.
+
+### Example: AutoGluon-style defaults
+
+```rust, no_run
+use automl::settings::{
+    CategoricalEncoderParams, CategoricalEncoding, ColumnFilterParams, ColumnSelector,
+    ImputeParams, ImputeStrategy, PowerTransform, PowerTransformParams, PreprocessingPipeline,
+    PreprocessingStep, RobustScaleParams, ScaleParams, ScaleStrategy,
+};
+
+let pipeline = PreprocessingPipeline::new()
+    // Fill numeric columns with the median before scaling.
+    .add_step(PreprocessingStep::Impute(ImputeParams {
+        strategy: ImputeStrategy::Median,
+        selector: ColumnSelector::Include(vec![0, 1, 2]),
+    }))
+    // Apply a robust scaler to guard against outliers (similar to AutoGluon).
+    .add_step(PreprocessingStep::Scale(ScaleParams {
+        strategy: ScaleStrategy::Robust(RobustScaleParams::default()),
+        selector: ColumnSelector::Include(vec![0, 1, 2]),
+    }))
+    // One-hot encode categorical columns and drop the reference level.
+    .add_step(PreprocessingStep::EncodeCategorical(CategoricalEncoderParams {
+        selector: ColumnSelector::Include(vec![3, 4]),
+        encoding: CategoricalEncoding::one_hot(true),
+    }))
+    // Optionally keep only the engineered features.
+    .add_step(PreprocessingStep::FilterColumns(ColumnFilterParams {
+        selector: ColumnSelector::Include(vec![0, 1, 2, 5, 6]),
+        retain_selected: true,
+    }));
+```
+
+### Example: caret-style log + standardization recipe
+
+```rust, no_run
+use automl::settings::{
+    ColumnSelector, PowerTransform, PowerTransformParams, PreprocessingPipeline,
+    PreprocessingStep, ScaleParams, ScaleStrategy, StandardizeParams,
+};
+
+let caret_like = PreprocessingPipeline::new()
+    .add_step(PreprocessingStep::PowerTransform(PowerTransformParams {
+        selector: ColumnSelector::Include(vec![0]),
+        transform: PowerTransform::Log { offset: 0.0 },
+    }))
+    .add_step(PreprocessingStep::Scale(ScaleParams {
+        strategy: ScaleStrategy::Standard(StandardizeParams::default()),
+        selector: ColumnSelector::All,
+    }));
+```
+
 ## Features
 
 This crate has several features that add some additional methods.

diff --git a/src/model/mod.rs b/src/model/mod.rs
@@ -4,7 +4,7 @@ pub mod classification;
 pub mod clustering;
 mod comparison;
 pub mod error;
-mod preprocessing;
+pub mod preprocessing;
 pub mod regression;
 pub mod supervised;