From b025e4d00157791f6f170de8921ca4f0d5f9243a Mon Sep 17 00:00:00 2001
From: Marie Coolsaet
Date: Wed, 19 Nov 2025 10:28:40 -0500
Subject: [PATCH] Add HPO tuning with experiment tracking assets

---
 samples/ml/experiment_tracking_hpo/README.md | 127 ++++
 .../hpo_experiment_tracking.ipynb            | 645 ++++++++++++++++++
 2 files changed, 772 insertions(+)
 create mode 100644 samples/ml/experiment_tracking_hpo/README.md
 create mode 100644 samples/ml/experiment_tracking_hpo/hpo_experiment_tracking.ipynb

diff --git a/samples/ml/experiment_tracking_hpo/README.md b/samples/ml/experiment_tracking_hpo/README.md
new file mode 100644
index 00000000..273d76e3
--- /dev/null
+++ b/samples/ml/experiment_tracking_hpo/README.md
@@ -0,0 +1,127 @@
+# Distributed Hyperparameter Optimization and Experiment Tracking in Snowflake
+
+This example demonstrates how to combine two integrated Snowflake ML capabilities:
+
+- **Distributed Hyperparameter Optimization (HPO)** – Run model tuning in parallel on Snowpark Container Runtime
+- **Experiment Tracking** – Automatically log parameters, metrics, and model artifacts for every run
+
+Together, these tools let you move from one-off experiments to scalable, reproducible ML workflows — all within Snowflake.
+
+---
+
+## Overview
+
+**Challenges addressed**
+- Sequential hyperparameter tuning is slow
+- Manual experiment tracking is error-prone
+- Distributed infrastructure setup is complex
+- Reproducing past experiments requires detailed documentation
+
+**What you’ll learn**
+- Setting up experiment tracking for ML runs
+- Running distributed HPO across multiple nodes
+- Logging and comparing experiment results
+- Viewing experiment history in Snowsight
+
+---
+
+## Example Flow
+
+### 1. Define the Training Function
+
+Each HPO trial trains a model and logs its own run. Because each trial executes in its own container, the function reads its data through the tuner context rather than from notebook globals:
+
+```python
+def train_function():
+    tuner_context = tune.get_tuner_context()
+    params = tuner_context.get_hyper_params()
+    dataset_map = tuner_context.get_dataset_map()
+
+    train_data = dataset_map["train"].to_pandas()
+    val_data = dataset_map["val"].to_pandas()
+    X_train, y_train = train_data.drop("QUALITY", axis=1), train_data["QUALITY"]
+    X_val, y_val = val_data.drop("QUALITY", axis=1), val_data["QUALITY"]
+
+    exp = ExperimentTracking(session=get_active_session())
+    exp.set_experiment("Wine_Quality_Classification")
+
+    with exp.start_run():
+        exp.log_params(params)
+        model = XGBClassifier(**params)
+        model.fit(X_train, y_train)
+
+        y_val_pred = model.predict(X_val)
+        y_val_proba = model.predict_proba(X_val)[:, 1]
+
+        val_metrics = {
+            "val_accuracy": metrics.accuracy_score(y_val, y_val_pred),
+            "val_precision": metrics.precision_score(y_val, y_val_pred, zero_division=0),
+            "val_recall": metrics.recall_score(y_val, y_val_pred, zero_division=0),
+            "val_f1": metrics.f1_score(y_val, y_val_pred, zero_division=0),
+            "val_roc_auc": metrics.roc_auc_score(y_val, y_val_proba)
+        }
+
+        exp.log_metrics(val_metrics)
+        tuner_context.report(metrics=val_metrics, model=model)
+```
+---
+
+### 2. Configure the Search Space
+
+```python
+search_space = {
+    "n_estimators": tune.randint(50, 300),
+    "max_depth": tune.randint(3, 15),
+    "learning_rate": tune.loguniform(0.01, 0.3),
+    "subsample": tune.uniform(0.5, 1.0),
+    "colsample_bytree": tune.uniform(0.5, 1.0)
+}
+
+tuner_config = tune.TunerConfig(
+    metric="val_f1",  # must match a metric name reported by the training function
+    mode="max",
+    search_alg=RandomSearch(),
+    num_trials=50
+)
+```
+---
+
+### 3. Run Distributed Tuning
+
+```python
+tuner = tune.Tuner(
+    train_func=train_function,
+    search_space=search_space,
+    tuner_config=tuner_config
+)
+
+scale_cluster(10)  # Scale out to multiple nodes
+results = tuner.run(dataset_map=dataset_map)
+```
+Each container runs one trial in parallel and logs metrics to Experiment Tracking.
+
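+Once `tuner.run()` returns, you can pull out the winning trial before opening Snowsight. A minimal sketch; the `best_model` and `best_result` attributes mirror how the accompanying notebook reads the results object:
+
+```python
+best_model = results.best_model    # trained model from the best trial
+print(results.best_result)         # metrics reported by the best trial
+```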
+---
+
+### 4. View and Compare Results
+
+- In **Snowsight → AI & ML → Experiments**, select the experiment to:
+  - Compare runs
+  - View metrics and charts
+  - Inspect logged models and artifacts
+
+---
+
+## Prerequisites
+
+- Snowflake account with a database and schema
+- CREATE EXPERIMENT privilege on your schema
+- snowflake-ml-python >= 1.9.1
+- Notebook configured for Container Runtime on SPCS (Compute Pool with instance type `CPU_X64_S`)
+
+---
+
+## Resources
+
+- https://docs.snowflake.com/en/developer-guide/snowpark-ml/overview
+- https://docs.snowflake.com/en/developer-guide/snowpark-ml/experiment-tracking
+- https://docs.snowflake.com/en/developer-guide/snowpark-ml/container-hpo
+
+---
+
+## License
+
+Provided as-is for educational use.
diff --git a/samples/ml/experiment_tracking_hpo/hpo_experiment_tracking.ipynb b/samples/ml/experiment_tracking_hpo/hpo_experiment_tracking.ipynb
new file mode 100644
index 00000000..428b1359
--- /dev/null
+++ b/samples/ml/experiment_tracking_hpo/hpo_experiment_tracking.ipynb
@@ -0,0 +1,645 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000000",
+   "metadata": {
+    "name": "cell1"
+   },
+   "source": [
+    "# Distributed Hyperparameter Tuning with Experiment Tracking in Snowflake\n",
+    "\n",
+    "This notebook demonstrates how to use Snowflake's ML capabilities for:\n",
+    "1. **Experiment Tracking** - Log parameters, metrics, and models\n",
+    "2. **Distributed HPO** - Parallel hyperparameter optimization at scale\n",
+    "3. **Container Runtime** - Leverage Snowpark Container Services for ML workloads\n",
+    "\n",
+    "We'll build a classification model on a synthetic wine-quality dataset and optimize it with distributed hyperparameter tuning while tracking all experiments in Snowflake.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000001",
+   "metadata": {
+    "collapsed": false,
+    "name": "cell2"
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "- Snowflake account with a database and schema\n",
+    "- CREATE EXPERIMENT privilege on your schema\n",
+    "- snowflake-ml-python >= 1.9.1\n",
+    "- Notebook configured for Container Runtime on SPCS (Compute Pool with instance type `CPU_X64_S`)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000002",
+   "metadata": {
+    "name": "cell3"
+   },
+   "source": [
+    "## Step 1: Setup and Data Loading\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce110000-1111-2222-3333-ffffff000003",
+   "metadata": {
+    "language": "python",
+    "name": "cell4"
+   },
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "from datetime import datetime\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "from sklearn import metrics\n",
+    "from xgboost import XGBClassifier\n",
+    "\n",
+    "from snowflake.snowpark.context import get_active_session\n",
+    "from snowflake.snowpark import Session\n",
+    "from snowflake.ml.experiment.experiment_tracking import ExperimentTracking\n",
+    "from snowflake.ml.modeling import tune\n",
+    "from snowflake.ml.modeling.tune.search import RandomSearch, BayesOpt  # BayesOpt is an alternative to RandomSearch\n",
+    "from snowflake.ml.data.data_connector import DataConnector\n",
+    "from snowflake.ml.runtime_cluster import scale_cluster\n",
+    "\n",
+    "# Get active Snowflake session\n",
+    "session = get_active_session()\n",
+    "print(f\"Connected to Snowflake: {session.get_current_database()}.{session.get_current_schema()}\")\n",
+    "\n",
+    "# Create dated 
experiment name for tracking runs over time\n", + "experiment_date = datetime.now().strftime(\"%Y%m%d\")\n", + "experiment_name = f\"Wine_Quality_Classification_{experiment_date}\"\n", + "print(f\"\\nExperiment Name: {experiment_name}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce110000-1111-2222-3333-ffffff000004", + "metadata": { + "name": "cell5" + }, + "source": [ + "### Generate Wine Quality Classification Dataset\n", + "\n", + "We'll create a synthetic dataset inspired by wine quality prediction. The goal is to classify wines as high quality (1) or standard quality (0) based on chemical properties.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce110000-1111-2222-3333-ffffff000005", + "metadata": { + "language": "python", + "name": "cell6" + }, + "outputs": [], + "source": [ + "# Generate synthetic wine quality dataset\n", + "np.random.seed(42)\n", + "n_samples = 20000\n", + "\n", + "# Feature generation with realistic correlations\n", + "data = {\n", + " \"FIXED_ACIDITY\": np.random.normal(7.0, 1.5, n_samples),\n", + " \"VOLATILE_ACIDITY\": np.random.gamma(2, 0.2, n_samples),\n", + " \"CITRIC_ACID\": np.random.beta(2, 5, n_samples),\n", + " \"RESIDUAL_SUGAR\": np.random.lognormal(1, 0.8, n_samples),\n", + " \"CHLORIDES\": np.random.gamma(3, 0.02, n_samples),\n", + " \"FREE_SULFUR_DIOXIDE\": np.random.normal(30, 15, n_samples),\n", + " \"TOTAL_SULFUR_DIOXIDE\": np.random.normal(120, 40, n_samples),\n", + " \"DENSITY\": np.random.normal(0.997, 0.003, n_samples),\n", + " \"PH\": np.random.normal(3.2, 0.3, n_samples),\n", + " \"SULPHATES\": np.random.gamma(4, 0.15, n_samples),\n", + " \"ALCOHOL\": np.random.normal(10.5, 1.5, n_samples)\n", + "}\n", + "\n", + "df = pd.DataFrame(data)\n", + "\n", + "# Create quality target based on feature combinations\n", + "quality_score = (\n", + " 0.3 * (df[\"ALCOHOL\"] - df[\"ALCOHOL\"].mean()) / df[\"ALCOHOL\"].std() +\n", + " 0.2 * (df[\"CITRIC_ACID\"] - df[\"CITRIC_ACID\"].mean()) / df[\"CITRIC_ACID\"].std() -\n", + " 0.25 * (df[\"VOLATILE_ACIDITY\"] - df[\"VOLATILE_ACIDITY\"].mean()) / df[\"VOLATILE_ACIDITY\"].std() +\n", + " 0.15 * (df[\"SULPHATES\"] - df[\"SULPHATES\"].mean()) / df[\"SULPHATES\"].std() +\n", + " np.random.normal(0, 0.3, n_samples) # Add noise\n", + ")\n", + "\n", + "# Binary classification: 1 = high quality, 0 = standard quality\n", + "df[\"QUALITY\"] = (quality_score > quality_score.quantile(0.6)).astype(int)\n", + "\n", + "print(f\"Dataset shape: {df.shape}\")\n", + "print(f\"\\nClass distribution:\\n{df['QUALITY'].value_counts()}\")\n", + "print(f\"\\nFeature statistics:\\n{df.describe()}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce110000-1111-2222-3333-ffffff000006", + "metadata": { + "name": "cell7" + }, + "source": [ + "### Prepare Train/Validation/Test Splits\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce110000-1111-2222-3333-ffffff000007", + "metadata": { + "language": "python", + "name": "cell8" + }, + "outputs": [], + "source": [ + "# Separate features and target\n", + "X = df.drop('QUALITY', axis=1)\n", + "y = df['QUALITY']\n", + "\n", + "# Create train/val/test splits\n", + "X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.15, random_state=42, stratify=y)\n", + "X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.18, random_state=42, stratify=y_temp)\n", + "\n", + "# Scale features\n", + "scaler = StandardScaler()\n", + "X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), 
columns=X_train.columns)\n",
+    "X_val_scaled = pd.DataFrame(scaler.transform(X_val), columns=X_val.columns)\n",
+    "X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)\n",
+    "\n",
+    "print(f\"Training set: {X_train_scaled.shape[0]} samples\")\n",
+    "print(f\"Validation set: {X_val_scaled.shape[0]} samples\")\n",
+    "print(f\"Test set: {X_test_scaled.shape[0]} samples\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000008",
+   "metadata": {
+    "name": "cell9"
+   },
+   "source": [
+    "## Step 2: Baseline Model with Experiment Tracking\n",
+    "\n",
+    "Before running distributed HPO, let's train a baseline model and log its parameters and metrics to Snowflake Experiment Tracking.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce110000-1111-2222-3333-ffffff000009",
+   "metadata": {
+    "language": "python",
+    "name": "cell10"
+   },
+   "outputs": [],
+   "source": [
+    "# Initialize Experiment Tracking\n",
+    "exp = ExperimentTracking(session=session)\n",
+    "exp.set_experiment(experiment_name)\n",
+    "\n",
+    "# Note: Snowflake supports autologging for certain ML frameworks, but this example uses\n",
+    "# explicit logging (exp.log_params, exp.log_metrics) to demonstrate a framework-agnostic\n",
+    "# approach. Explicit logging works with any ML library (scikit-learn, XGBoost, PyTorch,\n",
+    "# TensorFlow, custom frameworks) and gives you precise control over what gets logged,\n",
+    "# without requiring integration with Snowflake's modeling APIs.\n",
+    "\n",
+    "# Train baseline model\n",
+    "with exp.start_run(run_name=\"baseline_xgboost\"):\n",
+    "    # Define baseline parameters\n",
+    "    baseline_params = {\n",
+    "        'n_estimators': 100,\n",
+    "        'max_depth': 6,\n",
+    "        'learning_rate': 0.1,\n",
+    "        'subsample': 0.8,\n",
+    "        'colsample_bytree': 0.8,\n",
+    "        'gamma': 0.1,\n",
+    "        'min_child_weight': 8,\n",
+    "        'random_state': 42,\n",
+    "    }\n",
+    "\n",
+    "    # Log parameters\n",
+    "    exp.log_params(baseline_params)\n",
+    "\n",
+    "    # Train model\n",
+    "    baseline_model = XGBClassifier(**baseline_params)\n",
+    "    baseline_model.fit(X_train_scaled, y_train)\n",
+    "\n",
+    "    # Evaluate on validation set\n",
+    "    y_val_pred = baseline_model.predict(X_val_scaled)\n",
+    "    y_val_proba = baseline_model.predict_proba(X_val_scaled)[:, 1]\n",
+    "\n",
+    "    # Calculate metrics\n",
+    "    val_metrics = {\n",
+    "        'val_accuracy': metrics.accuracy_score(y_val, y_val_pred),\n",
+    "        'val_precision': metrics.precision_score(y_val, y_val_pred),\n",
+    "        'val_recall': metrics.recall_score(y_val, y_val_pred),\n",
+    "        'val_f1': metrics.f1_score(y_val, y_val_pred),\n",
+    "        'val_roc_auc': metrics.roc_auc_score(y_val, y_val_proba)\n",
+    "    }\n",
+    "\n",
+    "    # Log metrics\n",
+    "    exp.log_metrics(val_metrics)\n",
+    "\n",
+    "    print(\"Baseline Model Performance:\")\n",
+    "    for metric, value in val_metrics.items():\n",
+    "        print(f\"  {metric}: {value:.4f}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000010",
+   "metadata": {
+    "collapsed": false,
+    "name": "cell11"
+   },
+   "source": [
+    "## Step 3: Distributed Hyperparameter Optimization\n",
+    "\n",
+    "Now we'll use Snowflake's distributed HPO capabilities to find optimal hyperparameters. 
The HPO workload will:\n", + "- Scale across multiple nodes in the SPCS compute pool\n", + "- Run trials in parallel for faster optimization\n", + "- Automatically log all trials to Experiment Tracking\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce110000-1111-2222-3333-ffffff000011", + "metadata": { + "name": "cell12" + }, + "source": [ + "### Prepare Data Connectors\n", + "\n", + "Convert our pandas DataFrames to Snowflake DataConnectors for distributed processing.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce110000-1111-2222-3333-ffffff000012", + "metadata": { + "language": "python", + "name": "cell13" + }, + "outputs": [], + "source": [ + "# Combine features and target for each split\n", + "train_df = pd.concat([X_train_scaled, y_train.reset_index(drop=True)], axis=1)\n", + "val_df = pd.concat([X_val_scaled, y_val.reset_index(drop=True)], axis=1)\n", + "\n", + "# Create DataConnectors\n", + "dataset_map = {\n", + " \"train\": DataConnector.from_dataframe(session.create_dataframe(train_df)),\n", + " \"val\": DataConnector.from_dataframe(session.create_dataframe(val_df)),\n", + "}\n", + "\n", + "print(\"Data connectors created successfully\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce110000-1111-2222-3333-ffffff000013", + "metadata": { + "name": "cell14" + }, + "source": [ + "### Define Training Function with Experiment Tracking\n", + "\n", + "The training function will be executed for each trial. It integrates both HPO and Experiment Tracking.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce110000-1111-2222-3333-ffffff000014", + "metadata": { + "language": "python", + "name": "cell15" + }, + "outputs": [], + "source": [ + "def train_function():\n", + " \"\"\"\n", + " Training function executed for each HPO trial.\n", + " Integrates with both TunerContext and ExperimentTracking.\n", + " \"\"\" \n", + " trial_session = Session.builder.getOrCreate()\n", + " \n", + " # Get tuner context\n", + " tuner_context = tune.get_tuner_context()\n", + " params = tuner_context.get_hyper_params()\n", + " dm = tuner_context.get_dataset_map()\n", + " \n", + " # Initialize experiment tracking for this trial\n", + " exp = ExperimentTracking(session=trial_session)\n", + " exp.set_experiment(experiment_name)\n", + " with exp.start_run():\n", + " # Log hyperparameters\n", + " exp.log_params(params)\n", + " \n", + " # Load data\n", + " train_data = dm[\"train\"].to_pandas()\n", + " val_data = dm[\"val\"].to_pandas()\n", + " \n", + " # Separate features and target\n", + " X_train = train_data.drop('QUALITY', axis=1)\n", + " y_train = train_data['QUALITY']\n", + " X_val = val_data.drop('QUALITY', axis=1)\n", + " y_val = val_data['QUALITY']\n", + " \n", + " # Train model with hyperparameters from HPO\n", + " model = XGBClassifier(**params)\n", + " model.fit(X_train, y_train)\n", + " \n", + " # Evaluate on validation set\n", + " y_val_pred = model.predict(X_val)\n", + " y_val_proba = model.predict_proba(X_val)[:, 1]\n", + " \n", + " # Calculate validation metrics\n", + " val_metrics = {\n", + " 'val_accuracy': metrics.accuracy_score(y_val, y_val_pred),\n", + " 'val_precision': metrics.precision_score(y_val, y_val_pred),\n", + " 'val_recall': metrics.recall_score(y_val, y_val_pred),\n", + " 'val_f1': metrics.f1_score(y_val, y_val_pred),\n", + " 'val_roc_auc': metrics.roc_auc_score(y_val, y_val_proba)\n", + " }\n", + " \n", + " # Log metrics to experiment tracking\n", + " exp.log_metrics(val_metrics)\n", + " \n", + " # Report to HPO 
framework (optimize on validation F1)\n",
+    "    tuner_context.report(metrics=val_metrics, model=model)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000015",
+   "metadata": {
+    "name": "cell16"
+   },
+   "source": [
+    "### Define Search Space\n",
+    "\n",
+    "We'll define the hyperparameter search space using Snowflake's sampling functions.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce110000-1111-2222-3333-ffffff000016",
+   "metadata": {
+    "language": "python",
+    "name": "cell17"
+   },
+   "outputs": [],
+   "source": [
+    "# Define search space for XGBoost\n",
+    "search_space = {\n",
+    "    'n_estimators': tune.randint(50, 300),\n",
+    "    'max_depth': tune.randint(3, 15),\n",
+    "    'learning_rate': tune.loguniform(0.01, 0.3),\n",
+    "    'subsample': tune.uniform(0.5, 1.0),\n",
+    "    'colsample_bytree': tune.uniform(0.5, 1.0),\n",
+    "    'gamma': tune.uniform(0.0, 0.5),\n",
+    "    'min_child_weight': tune.randint(1, 10),\n",
+    "    'random_state': 42,\n",
+    "}\n",
+    "\n",
+    "print(\"Search space defined:\")\n",
+    "for param, space in search_space.items():\n",
+    "    print(f\"  {param}: {space}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000017",
+   "metadata": {
+    "collapsed": false,
+    "name": "cell18"
+   },
+   "source": [
+    "### Configure and Run HPO\n",
+    "\n",
+    "Configure the tuner to:\n",
+    "- Maximize F1 score\n",
+    "- Run 50 trials with random search\n",
+    "- Execute trials in parallel across available nodes\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd695b29-e15f-46ba-8388-b4f5932a84ad",
+   "metadata": {
+    "collapsed": false,
+    "name": "cell23"
+   },
+   "source": [
+    "#### Monitor Node Activity with the Ray Dashboard\n",
+    "Open the printed URL to access the dashboard."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4736f03a-e044-4133-8a6e-7d90066fb9ed",
+   "metadata": {
+    "language": "python",
+    "name": "cell22"
+   },
+   "outputs": [],
+   "source": [
+    "from snowflake.ml.runtime_cluster import get_ray_dashboard_url\n",
+    "get_ray_dashboard_url()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "75ffd2fe-7fbe-4e9f-b595-7e5794c7d828",
+   "metadata": {
+    "collapsed": false,
+    "name": "cell24"
+   },
+   "source": [
+    "#### Run HPO"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce110000-1111-2222-3333-ffffff000018",
+   "metadata": {
+    "language": "python",
+    "name": "cell19"
+   },
+   "outputs": [],
+   "source": [
+    "# Scale cluster for distributed processing\n",
+    "print(\"Scaling cluster for distributed HPO...\")\n",
+    "scale_cluster(10)  # Scale out to 10 nodes\n",
+    "\n",
+    "# Configure tuner\n",
+    "tuner_config = tune.TunerConfig(\n",
+    "    metric='val_f1',\n",
+    "    mode='max',\n",
+    "    search_alg=RandomSearch(),\n",
+    "    num_trials=50\n",
+    ")\n",
+    "\n",
+    "# Create tuner\n",
+    "tuner = tune.Tuner(\n",
+    "    train_func=train_function,\n",
+    "    search_space=search_space,\n",
+    "    tuner_config=tuner_config\n",
+    ")\n",
+    "\n",
+    "print(\"Starting distributed hyperparameter optimization...\")\n",
+    "\n",
+    "# Run HPO\n",
+    "try:\n",
+    "    results = tuner.run(dataset_map=dataset_map)\n",
+    "    print(\"\\nHPO completed successfully\")\n",
+    "except Exception as e:\n",
+    "    print(f\"\\nError during HPO: {e}\")\n",
+    "    raise\n",
+    "finally:\n",
+    "    # Scale cluster back down\n",
+    "    scale_cluster(1)\n",
+    "    print(\"Cluster scaled back to 1 node\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce110000-1111-2222-3333-ffffff000019",
+   "metadata": {
+    "collapsed": false,
+    "name": "cell20"
+   },
"source": [ + "## Step 4: Analyze Results\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce110000-1111-2222-3333-ffffff000020", + "metadata": { + "language": "python", + "name": "cell21" + }, + "outputs": [], + "source": [ + "# Display all results\n", + "print(\"BEST MODEL FOUND\")\n", + "print(\"=\"*60)\n", + "\n", + "# Extract best hyperparameters\n", + "print(f\"\\nBest Parameters:\")\n", + "best_model = results.best_model\n", + "params = best_model.get_xgb_params()\n", + "print(params)\n", + "\n", + "# Compare with baseline\n", + "best_f1 = results.best_result['val_f1'][0]\n", + "baseline_f1 = val_metrics['val_f1'] # From baseline model\n", + "improvement = ((best_f1 - baseline_f1) / baseline_f1) * 100\n", + "\n", + "print(f\"\\nPerformance Comparison:\")\n", + "print(f\" Baseline F1: {baseline_f1:.4f}\")\n", + "print(f\" Best HPO F1: {best_f1:.4f}\")\n", + "print(f\" Improvement: {improvement:+.2f}%\")\n", + "\n", + "# Get test set f1 score\n", + "y_test_pred = best_model.predict(X_test_scaled)\n", + "test_f1 = metrics.f1_score(y_test, y_test_pred)\n", + "print(f\"\\n\\n Best HPO Test Set F1: {test_f1:.4f}\")\n", + "\n", + "results.best_result" + ] + }, + { + "cell_type": "markdown", + "id": "ce110000-1111-2222-3333-ffffff000029", + "metadata": { + "collapsed": false, + "name": "cell30" + }, + "source": [ + "## Step 5: View Results in Snowflake UI\n", + "\n", + "All experiment runs are now available in the Snowflake UI:\n", + "\n", + "1. Navigate to **AI & ML > Experiments** in the left sidebar\n", + "2. Find the `Wine_Quality_Classification_YYYYMMDD` experiment (with today's date)\n", + "3. Compare runs, view metrics, and analyze results\n", + "\n", + "**Note**: Each time you run this notebook on a different day, it creates a new dated experiment, allowing you to track model performance over time and across different data versions.\n", + "\n", + "The UI provides:\n", + "- Side-by-side run comparisons\n", + "- Metric visualizations\n", + "- Parameter distributions\n", + "- Model artifacts and metadata\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce110000-1111-2222-3333-ffffff000030", + "metadata": { + "collapsed": false, + "name": "cell31" + }, + "source": [ + "## Summary\n", + "\n", + "In this notebook, we demonstrated:\n", + "\n", + "1. **Experiment Tracking**: Logged parameters and metrics to Snowflake\n", + "2. **Distributed HPO**: Ran 50 trials in parallel across multiple nodes\n", + "3. **Integration**: Combined both capabilities for comprehensive ML experimentation\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce110000-1111-2222-3333-ffffff000031", + "metadata": { + "collapsed": false, + "name": "cell32" + }, + "source": [ + "## Next Steps\n", + "\n", + "### Extend this Example\n", + "\n", + "1. **Adjust the search space** - Modify hyperparameter ranges based on your problem domain and data size\n", + "2. **Increase trial count** - Scale to 100-200 trials for more thorough optimization\n", + "3. **Scale compute clusters** - Adjust `scale_cluster()` to increase or decrease parallelism\n", + "4. 
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "lastEditStatus": {
+   "authorEmail": "marie.coolsaet@snowflake.com",
+   "authorId": "317811122459",
+   "authorName": "ADMIN",
+   "lastEditTime": 1762192477049,
+   "notebookId": "6wwgc5yvkslbtwzqxiyl",
+   "sessionId": "ccfa6938-7d2b-4e2a-aee9-a515762cbb80"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}