Conversation
* feat: Add Feature Store Support to V3
* Add feature store tests
---------
Co-authored-by: adishaa <adishaa@amazon.com>
…ondition

Isolate the Ivy cache per Spark session via `spark.jars.ivy` to prevent concurrent pytest-xdist workers from corrupting the shared `/root/.ivy2/cache` during Maven dependency resolution in CI.

…oncurrent writes in CI
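The isolation idea above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `ivy_cache_conf` is hypothetical, and the worker id is taken from pytest-xdist's `PYTEST_XDIST_WORKER` environment variable (e.g. `gw0`, `gw1`).

```python
import os
import tempfile


def ivy_cache_conf():
    """Build a per-worker spark.jars.ivy setting so concurrent
    pytest-xdist workers don't all write to the same Ivy cache.

    Hypothetical helper; the returned dict would be passed to
    SparkSession.builder.config(...) when creating the session.
    """
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    ivy_dir = os.path.join(tempfile.gettempdir(), f"ivy-{worker}")
    os.makedirs(ivy_dir, exist_ok=True)
    return {"spark.jars.ivy": ivy_dir}
```

Each worker then resolves Maven dependencies into its own directory, so no two workers touch the same cache files.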
Why do we have these file names starting with `_`?
They are meant to be "private", i.e. only imported from within the feature_processor module. I ported them from v2 as-is. Do you prefer that I rename them?
```ini
[testenv]
setenv =
    PYTHONHASHSEED=42
    JAVA_HOME={env:JAVA_HOME:/usr/lib/jvm/default-java}
```
Why do we need this? Is Java a requirement for the mlops package to work?
Java is a requirement only for feature-processor, because it requires PySpark, which needs Java (ref).
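One way to surface this requirement early is a fail-fast check before PySpark is imported. This is a sketch, not code from the PR; the helper name `require_java` is hypothetical.

```python
import os
import shutil


def require_java():
    """Return a path to the `java` binary, or raise with a clear message.

    Sketch only: feature-processor pulls in PySpark, and PySpark
    launches a JVM, so either JAVA_HOME must point at a JRE/JDK or
    `java` must be on PATH.
    """
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.exists(candidate):
            return candidate
    java = shutil.which("java")
    if java:
        return java
    raise RuntimeError(
        "Java is required by sagemaker[feature-processor] (via PySpark); "
        "install a JRE/JDK or set JAVA_HOME."
    )
```

A check like this turns an opaque JVM-launch failure deep inside PySpark into an actionable error at import time.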
```toml
train = ["sagemaker-train"]
serve = ["sagemaker-serve"]
mlops = ["sagemaker-mlops"]
feature-processor = ["sagemaker-mlops", "pyspark==3.3.2", "sagemaker-feature-store-pyspark-3.3", "setuptools<82"]
```
feature-processor used to require an extra pip install (`pip install sagemaker[feature-processor]`) because it pulls in extra dependencies that don't make sense to require all sagemaker users to install (docs).
V2 requirements: https://github.com/aws/sagemaker-python-sdk/blob/master-v2/requirements/extras/feature-processor_requirements.txt
The extra `setuptools<82` pin is because setuptools v82.0.0 removed the `pkg_resources` module: https://setuptools.pypa.io/en/latest/history.html#v82-0-0
- Replace `sagemaker_session.describe_feature_group()` calls with `FeatureGroup.get()`
- Update `_input_loader.py` to use `FeatureGroup` resource attributes instead of dictionary access
- Update `feature_scheduler.py` to use `FeatureGroup.get()` and access `creation_time` as an attribute
- Update `_feature_group_lineage_entity_handler.py` to return a `FeatureGroup` resource instead of a `Dict`
- Remove unused imports (`Dict`, `Any`, `FEATURE_GROUP`, `CREATION_TIME` constants)
- Replace dictionary key access with typed resource properties (`offline_store_config`, `data_catalog_config`, `event_time_feature_name`, etc.)
- Update unit tests to reflect the new `FeatureGroup` resource API usage
- Improves type safety and reduces reliance on dictionary-based API responses
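The dict-to-resource refactor above can be illustrated with simplified stand-ins. These dataclasses are hypothetical; the real `FeatureGroup.get()` in sagemaker-core calls the DescribeFeatureGroup API rather than constructing a local object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OfflineStoreConfig:
    """Hypothetical, simplified shape for illustration only."""
    s3_uri: str


@dataclass
class FeatureGroup:
    """Hypothetical stand-in for the sagemaker-core resource class."""
    feature_group_name: str
    event_time_feature_name: str
    offline_store_config: Optional[OfflineStoreConfig] = None

    @classmethod
    def get(cls, name):
        # The real method would call the service here.
        return cls(name, "event_time", OfflineStoreConfig("s3://bucket/offline"))


# Before (v2): untyped dict access
#   desc = sagemaker_session.describe_feature_group("fg")
#   uri = desc["OfflineStoreConfig"]["S3StorageConfig"]["S3Uri"]

# After (v3): typed resource attributes
fg = FeatureGroup.get("fg")
uri = fg.offline_store_config.s3_uri
```

Typed attributes let static checkers catch a misspelled field, whereas a misspelled dict key only fails at runtime.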
Port `feature_processor` module to SageMaker Python SDK v3

Migrates the `feature_store.feature_processor` module from `sagemaker-python-sdk` v2 to the v3 modular package structure (`sagemaker-mlops`), along with supporting changes in `sagemaker-core`.

Changes

* Import path migration — all internal imports updated from `sagemaker.feature_store.feature_processor` to `sagemaker.mlops.feature_store.feature_processor`, and external SDK dependencies remapped to their v3 locations:
  * `sagemaker.Session` → `sagemaker.core.helper.session_helper.Session`
  * `sagemaker.lineage` → `sagemaker.core.lineage`
  * `sagemaker.remote_function` → `sagemaker.core.remote_function`
  * `sagemaker.workflow` → `sagemaker.mlops.workflow`
  * `sagemaker.s3` / `sagemaker.utils` / `sagemaker.vpc_utils` → `sagemaker.core.*`
* Estimator → ModelTrainer migration (`feature_scheduler.py`) — replaced the v2 `Estimator` dict-based construction with the v3 `ModelTrainer` API:
  * `Compute`, `Networking`, `StoppingCondition`, `SourceCode`, `OutputDataConfig`, and `Tag` config objects
  * `PipelineSession` for pipeline-aware execution
  * `TrainingStep` now uses `step_args` from `ModelTrainer.train()` instead of `estimator` + `inputs`
* Input channel format change (`_config_uploader.py`) — `prepare_step_input` now returns `List[Channel]` (using `Channel`/`DataSource`/`S3DataSource` shapes) instead of `Dict[str, TrainingInput]`.
* Session helper additions (`sagemaker-core`) — added Feature Store methods to `Session`: `delete_feature_group`, `describe_feature_group`, `create_feature_group`, `update_feature_group`, and related config schema imports.

Diff with V2
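The input-channel format change described in the PR summary can be sketched with simplified stand-ins. The dataclasses below are hypothetical illustrations mirroring the `Channel`/`DataSource`/`S3DataSource` nesting, not the actual sagemaker-core shapes, and the channel name is a made-up example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class S3DataSource:
    """Hypothetical, simplified shape for illustration only."""
    s3_data_type: str
    s3_uri: str


@dataclass
class DataSource:
    s3_data_source: S3DataSource


@dataclass
class Channel:
    channel_name: str
    data_source: DataSource


def prepare_step_input(s3_uri: str) -> List[Channel]:
    """Sketch of the v3 return shape: a List[Channel] replaces the
    v2 Dict[str, TrainingInput] keyed by channel name."""
    return [
        Channel(
            channel_name="training",
            data_source=DataSource(
                s3_data_source=S3DataSource("S3Prefix", s3_uri)
            ),
        )
    ]
```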
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.