
add support for azure blob for inference results #712

Open
shrek wants to merge 26 commits into NVIDIA:main from shrek:add-azure-blob-support

Conversation


@shrek shrek commented Feb 24, 2026

Earth2Studio Pull Request

Description

This PR adds the following functionality:

  • Adds support for saving inference outputs to Azure Blob Storage. This required adding additional Azure configuration options, which are passed on to the multi-storage-client library, which already supports Azure.

  • Adds two new workflows based on FCN3. These workflows are tested to work on Azure; both Zarr and NetCDF outputs are verified, as is saving the inference output to a blob container on Azure.

  • Updates the config to filter which workflows are exposed via the API, allowing a deployment to expose only a subset of the configured workflows.

The following bugs are also fixed:

  • Configs were not being initialized with overrides.
  • The startup script was cd'ing into a non-existent directory.

Tests

The inference service container built from this branch was tested on Azure as an online endpoint, including saving inference results to an Azure blob container.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies


greptile-apps bot commented Feb 24, 2026

Greptile Summary

This PR adds comprehensive Azure Blob Storage support for inference results alongside existing S3/CloudFront functionality, allowing deployments to choose their cloud storage provider.

Key Changes:

  • Added Azure Blob Storage integration with SAS URL generation in object_storage.py
  • Fixed config initialization bug where _config wasn't properly initialized with overrides (changed from AppConfig() to None)
  • Implemented workflow exposure filtering to control which workflows are available via API endpoints
  • Fixed startup script bug changing service/inferenceserver to serve/inferenceserver
  • Added two new FCN3-based workflows: foundry_fcn3 and foundry_fcn3_stormscope_goes
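For context on the SAS URL generation mentioned above: an Azure SAS token is, at its core, an HMAC-SHA256 signature over a canonical string-to-sign, computed with the storage account key. The stdlib-only sketch below shows just that signing step; the string-to-sign used here is a placeholder, not the real Azure canonical format, and a real implementation would use `azure.storage.blob.generate_blob_sas`, which assembles the canonical string and query parameters per the Azure specification.

```python
# Conceptual sketch of the signing step behind Azure SAS URLs.
# The string-to-sign format here is NOT the real Azure format; in
# practice azure.storage.blob.generate_blob_sas handles all of this.
import base64
import hashlib
import hmac


def sign_sas(account_key_b64: str, string_to_sign: str) -> str:
    """HMAC-SHA256 the string-to-sign with the base64-decoded account key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("utf-8")
```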

Architecture:
The Azure implementation follows the same pattern as S3, using the multi-storage-client library with a unified interface. Storage-specific logic is cleanly separated with storage_type parameter. The workflow exposure feature adds production-ready controls for limiting API access to specific workflows.

Confidence Score: 4/5

  • Safe to merge with minor review of Azure configuration in production deployments
  • Score reflects well-structured implementation with comprehensive Azure support, important bug fixes, and good separation of concerns. The PR adds significant functionality that has been tested according to the description. One point deducted due to the complexity of the changes and the fact that this adds a new cloud provider integration that will need careful configuration in production.
  • All core files look good. Pay attention to object_storage.py and config.py during deployment to ensure Azure credentials are properly configured.

Important Files Changed

Filename Overview
serve/inferenceserver/api_server/object_storage.py added comprehensive Azure Blob Storage support with SAS URL generation, alongside existing S3/CloudFront functionality
serve/inferenceserver/api_server/config.py fixed config initialization bug by changing _config to None and added WorkflowExposureConfig and Azure configuration parameters
serve/inferenceserver/api_server/cpu_worker.py integrated Azure storage configuration and added conditional logic to handle both S3 and Azure storage types in upload process
serve/inferenceserver/api_server/main.py added workflow exposure checks across all workflow endpoints to enforce WorkflowExposureConfig filtering
serve/inferenceserver/api_server/workflow/workflow.py implemented is_workflow_exposed() and updated list_workflows() to support workflow exposure filtering via configuration
serve/inferenceserver/scripts/startup.sh fixed directory path from service/inferenceserver to serve/inferenceserver and added AZUREML_MODEL_DIR support

Last reviewed commit: d30153c

@greptile-apps greptile-apps bot left a comment


18 files reviewed, 3 comments



greptile-apps bot commented Feb 24, 2026

Additional Comments (1)

serve/inferenceserver/api_server/object_storage.py
Duplicated code block: this S3 Transfer Acceleration and CloudFront configuration logic appears twice (once inside the if storage_type == "s3" block at lines 246-265, and again here at lines 302-322); the second copy should be removed.

        # Import multi-storage-client

@shrek shrek changed the title [FEA] serve - add support for azure blob for inference results add support for azure blob for inference results in serve Feb 24, 2026
@shrek shrek marked this pull request as draft February 24, 2026 02:38
@shrek shrek changed the title add support for azure blob for inference results in serve add support for azure blob for inference results Feb 24, 2026
@shrek shrek requested a review from jleinonen February 25, 2026 23:39
@shrek shrek marked this pull request as ready for review February 26, 2026 01:01

greptile-apps bot commented Feb 26, 2026

Additional Comments (1)

serve/inferenceserver/api_server/conf/config.yaml, line 91
missing Azure-specific fields in YAML config

The object_storage section is missing several fields that are documented in README_object_storage.md and defined in ObjectStorageConfig:

  • storage_type (should default to "s3")
  • azure_connection_string
  • azure_account_name
  • azure_account_key
  • azure_container_name

While environment variables can override these, users following the documentation will expect to find them in the YAML. Add them with null defaults:

object_storage:
  enabled: false
  storage_type: s3  # Storage provider: "s3" or "azure"
  bucket: null
  region: us-east-1
  prefix: outputs
  access_key_id: null
  secret_access_key: null
  session_token: null
  endpoint_url: null
  use_transfer_acceleration: true
  max_concurrency: 16
  multipart_chunksize: 8388608
  use_rust_client: true
  # CloudFront signed URL configuration
  cloudfront_domain: null
  cloudfront_key_pair_id: null
  cloudfront_private_key: null  # PEM private key content
  # Signed URL settings
  signed_url_expires_in: 86400  # 24 hours
  # Azure Blob Storage configuration
  azure_connection_string: null
  azure_account_name: null
  azure_account_key: null
  azure_container_name: null


greptile-apps bot commented Feb 26, 2026

Additional Comments (3)

serve/inferenceserver/api_server/example_workflows/foundry_fcn3_stormscope_goes.py, line 123
Misleading error message for StormScope validation

The modulo check % (1 * 60 * 60) validates that time_stormscope falls on a 1-hour interval, but the error message says "must be 6-hour interval". This will confuse users when they get a validation error.

        if (time_stormscope - ref).total_seconds() % (1 * 60 * 60) != 0:
            raise ValueError(
                f"Start time for StormScope must be 1-hour interval: {time_stormscope}"
            )

serve/inferenceserver/api_server/cpu_worker.py, line 576
bucket may be None for Azure, violating type contract

When storage_type is "azure", users may only configure azure_container_name and leave bucket as None (the default in ObjectStorageConfig). However, MSCObjectStorage.__init__ declares bucket: str (non-optional), so passing None would violate the type contract and could cause downstream issues.

Consider defaulting to azure_container_name when bucket is not set:

            storage_kwargs = {
                "bucket": config.object_storage.bucket or config.object_storage.azure_container_name or "",
                "storage_type": config.object_storage.storage_type,

serve/inferenceserver/api_server/object_storage.py, line 294
AttributeError before ObjectStorageError

When azure_account_name is not provided and the connection string doesn't contain an AccountName key, the for loop on line 286 will complete without ever assigning self.azure_account_name. The guard on line 290 (if not self.azure_account_name) then raises an AttributeError instead of the intended ObjectStorageError.

Fix by initializing the attribute before the loop:

            if not azure_account_name:
                self.azure_account_name = None
                for part in azure_connection_string.split(";"):
                    if part.startswith("AccountName="):
                        self.azure_account_name = part.split("=", 1)[1]
                        break
                if not self.azure_account_name:
                    raise ObjectStorageError(
                        "Could not extract account name from connection string. "
                        "Please provide azure_account_name directly or ensure connection string contains AccountName."
                    )


shrek commented Feb 26, 2026

@greptile-ai
