Feature processor v3 #5565

Open
BassemHalim wants to merge 23 commits into aws:master from BassemHalim:feature-processor-v3
@BassemHalim BassemHalim commented Feb 20, 2026

Port feature_processor module to SageMaker Python SDK v3

Migrates the feature_store.feature_processor module from sagemaker-python-sdk v2 to the v3 modular package structure (sagemaker-mlops), along with supporting changes in sagemaker-core.

Changes

Import path migration: all internal imports updated from sagemaker.feature_store.feature_processor → sagemaker.mlops.feature_store.feature_processor, and external SDK dependencies remapped to their v3 locations:

  • sagemaker.Session → sagemaker.core.helper.session_helper.Session
  • sagemaker.lineage → sagemaker.core.lineage
  • sagemaker.remote_function → sagemaker.core.remote_function
  • sagemaker.workflow → sagemaker.mlops.workflow
  • sagemaker.s3 / sagemaker.utils / sagemaker.vpc_utils → sagemaker.core.*
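
The remapping listed above can be sketched as a small prefix-rewrite helper. This is only an illustration, not code from the PR: `V2_TO_V3` and `migrate_path` are hypothetical names, the sketch assumes submodules keep their names under the new prefix, and the `sagemaker.core.*` entries are left out because their exact target modules are not spelled out here.

```python
# Old-to-new dotted prefixes, copied from the list in the PR description.
V2_TO_V3 = {
    "sagemaker.feature_store.feature_processor": "sagemaker.mlops.feature_store.feature_processor",
    "sagemaker.Session": "sagemaker.core.helper.session_helper.Session",
    "sagemaker.lineage": "sagemaker.core.lineage",
    "sagemaker.remote_function": "sagemaker.core.remote_function",
    "sagemaker.workflow": "sagemaker.mlops.workflow",
}


def migrate_path(dotted: str) -> str:
    """Return the v3 path for a v2 dotted path, or the input unchanged."""
    # Try longer prefixes first so the most specific rule wins.
    for old in sorted(V2_TO_V3, key=len, reverse=True):
        # Match whole path components only, never partial names.
        if dotted == old or dotted.startswith(old + "."):
            return V2_TO_V3[old] + dotted[len(old):]
    return dotted
```

Matching on whole components means an unrelated name such as `sagemaker.workflowx` is left untouched rather than being rewritten by accident.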

Estimator → ModelTrainer migration (feature_scheduler.py) — Replaced the v2 Estimator dict-based construction with the v3 ModelTrainer API:

  • Uses Compute, Networking, StoppingCondition, SourceCode, OutputDataConfig, and Tag config objects
  • Creates a PipelineSession for pipeline-aware execution
  • TrainingStep now uses step_args from ModelTrainer.train() instead of estimator+inputs

Input channel format change (_config_uploader.py) — prepare_step_input now returns List[Channel] (using Channel/DataSource/S3DataSource shapes) instead of Dict[str, TrainingInput].
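
To make the shape change concrete, here is a rough sketch using plain dicts rather than the SDK's Channel/DataSource/S3DataSource classes. It assumes those classes mirror the `InputDataConfig` entries of the CreateTrainingJob API; `to_channels` is a hypothetical helper, and the `S3Prefix` and `FullyReplicated` values are illustrative defaults, not taken from the PR.

```python
from typing import Dict, List


def to_channels(inputs: Dict[str, str]) -> List[dict]:
    """Convert {channel_name: s3_uri} into a list of channel-shaped dicts."""
    return [
        {
            "ChannelName": name,
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",  # assumed default
                    "S3Uri": uri,
                    "S3DataDistributionType": "FullyReplicated",  # assumed default
                }
            },
        }
        for name, uri in inputs.items()
    ]
```

The practical difference for callers is that the channel name moves from being a dict key to being a field on each list element.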

Session helper additions (sagemaker-core) — Added Feature Store methods to Session: delete_feature_group, describe_feature_group, create_feature_group, update_feature_group, and related config schema imports.

Diff with V2

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@BassemHalim BassemHalim changed the base branch from master to feature-store-v3 February 25, 2026 21:40
@BassemHalim BassemHalim changed the base branch from feature-store-v3 to master February 27, 2026 05:19
@BassemHalim BassemHalim reopened this Feb 27, 2026
Aditi2424 and others added 7 commits February 27, 2026 11:19
* feat: Add Feature Store Support to V3

* Add feature store tests

---------

Co-authored-by: adishaa <adishaa@amazon.com>
Aditi2424 and others added 2 commits February 27, 2026 15:58
* feat: Add Feature Store Support to V3

* Add feature store tests

---------

Co-authored-by: adishaa <adishaa@amazon.com>
…ondition

Isolate Ivy cache per Spark session via spark.jars.ivy to prevent concurrent pytest-xdist workers from corrupting shared /root/.ivy2/cache
during Maven dependency resolution in CI.
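
A minimal sketch of the isolation idea described in this commit, assuming a fresh temporary directory per session is enough. `isolated_ivy_conf` is a hypothetical helper, not the fixture used in the PR; `spark.jars.ivy` is the Spark setting named in the commit message.

```python
import tempfile


def isolated_ivy_conf() -> dict:
    """Return Spark conf entries pointing Ivy at a fresh private directory."""
    # mkdtemp creates a new, uniquely named directory on every call,
    # so two workers never share the same Ivy cache path.
    ivy_dir = tempfile.mkdtemp(prefix="ivy-")
    return {"spark.jars.ivy": ivy_dir}


# Applying it needs pyspark installed, so it is left as a comment here:
#   builder = SparkSession.builder
#   for key, value in isolated_ivy_conf().items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```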
@BassemHalim BassemHalim marked this pull request as ready for review March 3, 2026 17:37
mollyheamazon previously approved these changes Mar 3, 2026
Contributor

Why do we have these file names starting with _?

Author

They are meant to be 'private', imported only from within the feature_processor module. I ported them from v2 as-is. Do you prefer I rename them?

[testenv]
setenv =
    PYTHONHASHSEED=42
    JAVA_HOME={env:JAVA_HOME:/usr/lib/jvm/default-java}

Contributor

Why do we need this?
Is Java a requirement for the mlops package to work?

Author

Java is required only for feature-processor, because it depends on PySpark, which needs Java (ref).

train = ["sagemaker-train"]
serve = ["sagemaker-serve"]
mlops = ["sagemaker-mlops"]
feature-processor = ["sagemaker-mlops", "pyspark==3.3.2", "sagemaker-feature-store-pyspark-3.3", "setuptools<82"]

Contributor

Why is this needed?

Author

feature-processor used to require an extra pip install: pip install sagemaker[feature-processor]. This is because it needs extra dependencies that it doesn't make sense to require all sagemaker users to install (docs).

V2 requirements: https://github.com/aws/sagemaker-python-sdk/blob/master-v2/requirements/extras/feature-processor_requirements.txt
The extra setuptools<82 pin is needed because setuptools removed the pkg_resources module in v82.0.0: https://setuptools.pypa.io/en/latest/history.html#v82-0-0

bhhalim added 2 commits March 3, 2026 12:03
- Replace sagemaker_session.describe_feature_group() calls with FeatureGroup.get()
- Update _input_loader.py to use FeatureGroup resource attributes instead of dictionary access
- Update feature_scheduler.py to use FeatureGroup.get() and access creation_time as attribute
- Update _feature_group_lineage_entity_handler.py to return FeatureGroup resource instead of Dict
- Remove unused imports (Dict, Any, FEATURE_GROUP, CREATION_TIME constants)
- Replace dictionary key access with typed resource properties (offline_store_config, data_catalog_config, event_time_feature_name, etc.)
- Update unit tests to reflect new FeatureGroup resource API usage
- Improves type safety and reduces reliance on dictionary-based API responses
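
The difference between the two access styles in this commit can be illustrated with a self-contained sketch. `FeatureGroupStandIn` is a hypothetical stand-in written for this example, not the SDK's FeatureGroup resource. The dictionary keys follow the DescribeFeatureGroup response shape and the attribute names follow the ones listed in the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureGroupStandIn:
    """Hypothetical stand-in for a typed resource (not the SDK class)."""

    feature_group_name: str
    event_time_feature_name: str
    creation_time: Optional[str] = None


# Before: string-keyed access on a describe-style response dict.
response = {"FeatureGroupName": "fg", "EventTimeFeatureName": "ts"}
event_time_old = response["EventTimeFeatureName"]

# After: attribute access on a typed object, which editors and
# type checkers can validate before the code runs.
fg = FeatureGroupStandIn(feature_group_name="fg", event_time_feature_name="ts")
event_time_new = fg.event_time_feature_name
```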
@BassemHalim BassemHalim deployed to auto-approve March 3, 2026 22:31 — with GitHub Actions Active
5 participants