Create notebook-cost estimator tool #12

Open

emmarogge wants to merge 3 commits into main from emmarogge/notebook_cost_estimator

Contributor

emmarogge commented Nov 13, 2025

Create a tool that enables users to estimate the cost of running their own Jupyter notebooks in their Workbench workspaces.

emmarogge added 3 commits

November 13, 2025 17:20


          Create tool to enable users to estimate costs of running their notebooks in Workbench

1d8c1d1


          Update notebook to use wb CLI

b7dad76


          Fix formatting

ca1ddbf

emmarogge requested a review from Copilot

February 12, 2026 23:07

Copilot started reviewing on behalf of emmarogge

February 12, 2026 23:07

Copilot AI reviewed


Copilot AI left a comment

Pull request overview

This PR adds a Jupyter notebook tool intended to help users estimate the approximate GCP cost of running Jupyter notebooks inside Verily Workbench (compute + data/storage + optional workspace-resource discovery via wb CLI).

Changes:

Introduces a new cost_estimator.ipynb notebook with ipywidgets-driven inputs and cost calculations.
Adds logic to estimate storage/query costs for GCS and BigQuery resources.
Adds workspace resource discovery and sizing helpers via wb CLI commands.


notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +354 to +365

+                  "import json\n",
+                  "import pandas as pd\n",
+                  "from IPython.display import display, HTML\n",
+                  "\n",
+                  "def run_wb_command(command):\n",
+                  "    \"\"\"Run a wb CLI command and return the result\"\"\"\n",
+                  "    result = subprocess.run(command, shell=True, capture_output=True, text=True)\n",
+                  "    if result.returncode == 0:\n",
+                  "        return result.stdout.strip()\n",
+                  "    else:\n",
+                  "        raise Exception(f\"Command failed: {command}\\nError: {result.stderr}\")\n",
+                  "\n",

Copilot AI Feb 12, 2026

run_wb_command() uses subprocess.run(..., shell=True) and later interpolates values into commands (e.g., resource names/datasets). This makes command injection possible if any interpolated value contains shell metacharacters. Prefer shell=False with an argv list and pass arguments as separate list items.

Suggested change

-                  "import json\n",
-                  "import pandas as pd\n",
-                  "from IPython.display import display, HTML\n",
-                  "\n",
-                  "def run_wb_command(command):\n",
-                  "    \"\"\"Run a wb CLI command and return the result\"\"\"\n",
-                  "    result = subprocess.run(command, shell=True, capture_output=True, text=True)\n",
-                  "    if result.returncode == 0:\n",
-                  "        return result.stdout.strip()\n",
-                  "    else:\n",
-                  "        raise Exception(f\"Command failed: {command}\\nError: {result.stderr}\")\n",
-                  "\n",
+                  "import shlex\n",
+                  "import json\n",
+                  "import pandas as pd\n",
+                  "from IPython.display import display, HTML\n",
+                  "\n",
+                  "def run_wb_command(command):\n",
+                  "    \"\"\"Run a wb CLI command and return the result.\n",
+                  "\n",
+                  "    Accepts either a string command or an iterable of arguments.\n",
+                  "    \"\"\"\n",
+                  "    if isinstance(command, str):\n",
+                  "        cmd = shlex.split(command)\n",
+                  "    else:\n",
+                  "        cmd = list(command)\n",
+                  "\n",
+                  "    result = subprocess.run(cmd, capture_output=True, text=True)\n",
+                  "    if result.returncode == 0:\n",
+                  "        return result.stdout.strip()\n",
+                  "    else:\n",
+                  "        raise Exception(f\"Command failed: {command}\\nError: {result.stderr}\")\n",
+                  "\n",

Copilot uses AI. Check for mistakes.
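
A minimal sketch of the argv-list approach recommended above, assuming every wb call can be expressed as a fixed list of subcommand words plus user-supplied values. `run_wb_command_argv` is a hypothetical name, not a helper that exists in the notebook, and the demo uses the Python interpreter as a stand-in for wb so it runs anywhere. Unlike the shlex.split variant in the suggestion, callers never build a command string, so a value such as `my-bucket; echo pwned` reaches the child process as one literal argument.

```python
import subprocess
import sys
from typing import Sequence


def run_wb_command_argv(args: Sequence[str]) -> str:
    """Run a command given as an argv list (no shell) and return stripped stdout.

    Hypothetical replacement for run_wb_command(): each element of `args` is
    handed to the child process verbatim, so shell metacharacters inside
    user-supplied values (resource names, dataset IDs) are never interpreted.
    """
    argv = [str(a) for a in args]
    # shell=False is the default; it is spelled out here for emphasis.
    result = subprocess.run(argv, shell=False, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: {argv}\n"
            f"Error: {result.stderr}"
        )
    return result.stdout.strip()


if __name__ == "__main__":
    # The Python interpreter stands in for the wb CLI here: the child just
    # prints its first argument, showing that ';' arrives as literal text.
    hostile_name = "my-bucket; echo pwned"
    echoed = run_wb_command_argv(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_name]
    )
    print(echoed)
```

In the notebook, each call would then pass "wb" as the first list element and each subcommand word, flag, and interpolated value as its own element, rather than formatting them into a single string.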

notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +437 to +439

+                  "        resource_type = resources_df.iloc[i-1]['Type']\n",
+                  "        stewardship = resources_df.iloc[i-1]['Stewardship']\n",
+                  "        print(f\"   {i}. {name} ({resource_type}, {stewardship})\")\n",

Copilot AI Feb 12, 2026

This loop reuses the name resource_type, which earlier refers to the ipywidgets dropdown. Shadowing it with a string from the dataframe can break subsequent cells that expect resource_type.value. Rename the loop variable (e.g., resource_type_str) to avoid clobbering the widget.

Suggested change

-                  "        resource_type = resources_df.iloc[i-1]['Type']\n",
-                  "        stewardship = resources_df.iloc[i-1]['Stewardship']\n",
-                  "        print(f\"   {i}. {name} ({resource_type}, {stewardship})\")\n",
+                  "        resource_type_str = resources_df.iloc[i-1]['Type']\n",
+                  "        stewardship = resources_df.iloc[i-1]['Stewardship']\n",
+                  "        print(f\"   {i}. {name} ({resource_type_str}, {stewardship})\")\n",

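
One way to make this kind of clobbering impossible, sketched under the assumption that the listing loop can live in a function: names bound inside a function are local, so even a reused name would not rebind the notebook-level widget. The sketch assumes the dataframe has Name, Type, and Stewardship columns (only the last two appear in the excerpt) and takes rows as dictionaries, which `resources_df.to_dict("records")` produces. The sample rows and the string stand-in for the dropdown are illustrative only.

```python
from typing import Iterable, List, Mapping


def format_resource_listing(resources: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the numbered resource listing as lines of text.

    All loop variables are local to this function, so running it cannot
    rebind a notebook-level name such as the `resource_type` widget.
    """
    lines = []
    for i, row in enumerate(resources, start=1):
        resource_type_str = row["Type"]
        stewardship = row["Stewardship"]
        lines.append(f"   {i}. {row['Name']} ({resource_type_str}, {stewardship})")
    return lines


# Stand-in for the ipywidgets dropdown created in an earlier cell.
resource_type = "<dropdown widget>"

sample_rows = [  # illustrative rows, not real workspace output
    {"Name": "raw-data", "Type": "GCS_BUCKET", "Stewardship": "CONTROLLED"},
    {"Name": "cohort", "Type": "BQ_DATASET", "Stewardship": "REFERENCED"},
]
for line in format_resource_listing(sample_rows):
    print(line)

# The notebook-level name still refers to the widget stand-in.
assert resource_type == "<dropdown widget>"
```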

notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +196 to +200

+                  "    elif resource_type == 'mixed':\n",
+                  "        # Combination of storage and BigQuery\n",
+                  "        gcs_storage = storage_price_per_gb_hour * data_gb * runtime_hrs\n",
+                  "        bq_queries = (processed_gb / 1000) * bq_query_price_per_tb * queries\n",
+                  "        storage_cost = gcs_storage + bq_queries\n",

Copilot AI Feb 12, 2026

In the mixed branch, query costs are added into storage_cost and query_cost remains 0, but downstream code expects query costs in query_cost (it prints them only if query_cost > 0). Return the mixed query component via query_cost and keep storage in storage_cost so totals/breakdowns are correct.
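
A sketch of the fix, assuming the function's inputs match the variable names visible in the excerpts (data_gb, output_gb, processed_gb, queries, runtime_hrs, and the two price variables). The mixed branch keeps the excerpt's formulas but returns the query component separately, and it is tested before the Cloud Storage branch so that it is reachable. The BigQuery-only branch is an assumption, since its original body is not shown, and the prices are parameters rather than real rates.

```python
from typing import Tuple


def estimate_data_costs(
    resource_type: str,
    data_gb: float,
    output_gb: float,
    processed_gb: float,
    queries: int,
    runtime_hrs: float,
    storage_price_per_gb_hour: float,
    bq_query_price_per_tb: float,
) -> Tuple[float, float]:
    """Return (storage_cost, query_cost) for one resource type.

    'mixed' is checked first and keeps its two components apart, so code
    that prints query_cost only when it is > 0 still sees the query part.
    """
    storage_cost = 0.0
    query_cost = 0.0
    if resource_type == "mixed":
        # Same formulas as the notebook's mixed branch, split into two outputs.
        storage_cost = storage_price_per_gb_hour * data_gb * runtime_hrs
        query_cost = (processed_gb / 1000) * bq_query_price_per_tb * queries
    elif resource_type in ("gcs_bucket", "gcs_object"):
        storage_cost = storage_price_per_gb_hour * (data_gb + output_gb) * runtime_hrs
    elif resource_type in ("bq_dataset", "bq_table"):
        # Assumption: BigQuery-only flows are charged for queries only here.
        query_cost = (processed_gb / 1000) * bq_query_price_per_tb * queries
    else:
        raise ValueError(f"Unknown resource_type: {resource_type!r}")
    return storage_cost, query_cost
```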


notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +50 to +54

+                  "notebook_file = widgets.Text(\n",
+                  "    value='',\n",
+                  "    placeholder='Enter notebook filename (e.g., analysis.ipynb)',\n",
+                  "    description='Notebook:',\n",
+                  "    disabled=False\n",

Copilot AI Feb 12, 2026

The notebook_file widget is collected from the user but never read anywhere in the notebook (it’s only added to the UI). Either use it (e.g., to validate the notebook exists / extract metadata) or remove it to avoid confusing users.
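
If the field is kept, one possible use is to validate the path and report basic facts about the notebook before estimating. The helper below is hypothetical; it relies only on the JSON layout of .ipynb files (a top-level "cells" list whose entries carry a "cell_type"), and the returned fields are illustrative rather than inputs the current cost model consumes.

```python
import json
from pathlib import Path


def inspect_notebook(path: str) -> dict:
    """Validate that `path` is an existing .ipynb file and summarize its cells."""
    nb_path = Path(path).expanduser()
    if nb_path.suffix != ".ipynb":
        raise ValueError(f"Expected a .ipynb file, got: {nb_path.name!r}")
    if not nb_path.is_file():
        raise FileNotFoundError(f"Notebook not found: {nb_path}")

    with nb_path.open(encoding="utf-8") as f:
        nb = json.load(f)

    cells = nb.get("cells", [])
    code_cells = [c for c in cells if c.get("cell_type") == "code"]
    return {
        "path": str(nb_path),
        "total_cells": len(cells),
        "code_cells": len(code_cells),
        "size_bytes": nb_path.stat().st_size,
    }
```

The estimate cell could call `inspect_notebook(notebook_file.value)` and show the error message to the user when the file is missing or is not a notebook.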


notebook_cost_estimator/cost_estimator.ipynb

+                  "            \n",
+                  "    except Exception as e:\n",
+                  "        print(f\"   ⚠️ Could not determine size: {str(e)}\")\n",
+                  "    \n",

Copilot AI Feb 12, 2026

total_estimated_cost is incremented here but is never initialized anywhere in the notebook, so this will raise NameError. Initialize it before the loop (and decide whether it’s meant to start at 0 or include previously computed totals).

Suggested change

-                  "    \n",
+                  "    \n",
+                  "    # Initialize total_estimated_cost before accumulating compute costs\n",
+                  "    total_estimated_cost = 0.0\n",


notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +557 to +560

+                  "    print(f\"   💻 Compute cost ({runtime_hours.value}h): ${compute_cost:.2f}\")\n",
+                  "    print(f\"   💾 Storage cost ({total_storage_gb:.1f}GB): ${total_estimated_cost - compute_cost - total_query_costs:.4f}\")\n",
+                  "    if total_query_costs > 0:\n",
+                  "        print(f\"   🔍 Query costs: ${total_query_costs:.4f}\")\n",

Copilot AI Feb 12, 2026

This print statement references total_storage_gb and total_query_costs, but neither variable is defined in the notebook scope at this point. Aggregate these totals explicitly (e.g., sum per-resource sizes/costs) before printing the final breakdown.
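
A sketch of explicit aggregation, which also avoids the uninitialized total_estimated_cost noted earlier: every accumulator is set to zero inside one function before the loop, and storage is printed from its own running total instead of being back-computed by subtraction. `ResourceEstimate` and `summarize_costs` are hypothetical names, and the emoji prefixes from the original output are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ResourceEstimate:
    """Hypothetical per-resource result; field names are illustrative."""
    name: str
    size_gb: float
    storage_cost: float
    query_cost: float = 0.0


def summarize_costs(compute_cost: float, runtime_hours: float,
                    estimates: Iterable[ResourceEstimate]) -> Dict[str, float]:
    """Aggregate per-resource estimates and print the final breakdown."""
    # Every accumulator is initialized before the loop, so nothing depends
    # on a variable left over from another cell.
    total_storage_gb = 0.0
    total_storage_cost = 0.0
    total_query_costs = 0.0
    for est in estimates:
        total_storage_gb += est.size_gb
        total_storage_cost += est.storage_cost
        total_query_costs += est.query_cost

    total_estimated_cost = compute_cost + total_storage_cost + total_query_costs

    print(f"   Compute cost ({runtime_hours}h): ${compute_cost:.2f}")
    print(f"   Storage cost ({total_storage_gb:.1f}GB): ${total_storage_cost:.4f}")
    if total_query_costs > 0:
        print(f"   Query costs: ${total_query_costs:.4f}")
    print(f"   Total: ${total_estimated_cost:.4f}")

    return {
        "total_storage_gb": total_storage_gb,
        "total_storage_cost": total_storage_cost,
        "total_query_costs": total_query_costs,
        "total_estimated_cost": total_estimated_cost,
    }
```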


notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +99 to +103

+                  "special_resources = widgets.Text(\n",
+                  "    value='',\n",
+                  "    placeholder='e.g., GPU, highmem',\n",
+                  "    description='Special Resources:',\n",
+                  "    disabled=False\n",

Copilot AI Feb 12, 2026

The special_resources widget is never used in any estimation logic. If the estimator is meant to account for GPUs/highmem, wire this input into the compute pricing/model selection; otherwise remove it to avoid misleading output.
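
If the input is kept, a first step is to parse it into validated flags and feed them into the hourly compute price. This is a sketch under a simplifying assumption: each flag adds a flat, caller-supplied hourly surcharge. The surcharge values are placeholders, not GCP prices, and the accepted flag names (gpu, highmem) are taken from the widget's placeholder text.

```python
from typing import AbstractSet, Dict, FrozenSet, Set

KNOWN_SPECIAL_RESOURCES: FrozenSet[str] = frozenset({"gpu", "highmem"})


def parse_special_resources(
    text: str, known: AbstractSet[str] = KNOWN_SPECIAL_RESOURCES
) -> Set[str]:
    """Turn free text like 'GPU, highmem' into a validated set of lowercase flags."""
    flags = {token.strip().lower() for token in text.split(",") if token.strip()}
    unknown = flags - set(known)
    if unknown:
        raise ValueError(f"Unrecognized special resources: {sorted(unknown)}")
    return flags


def adjusted_compute_price(base_price_per_hour: float, flags: AbstractSet[str],
                           surcharge_per_hour: Dict[str, float]) -> float:
    """Add a caller-supplied hourly surcharge for each requested flag.

    Simplifying assumption: each special resource adds a flat hourly amount
    on top of the base machine price.
    """
    missing = set(flags) - set(surcharge_per_hour)
    if missing:
        raise ValueError(f"No surcharge configured for: {sorted(missing)}")
    return base_price_per_hour + sum(surcharge_per_hour[f] for f in flags)
```

On GCP, GPUs are attached to a VM and billed on top of the machine price, whereas extra memory usually comes from choosing a different machine type (for example an n1-highmem type instead of n1-standard-4), so a fuller version would swap the base price for highmem rather than add to it.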


notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +139 to +142

+                  "default_machine_type = 'n1-standard-4'\n",
+                  "default_vcpu = 4\n",
+                  "default_ram_gb = 15\n",
+                  "compute_price_per_hour = 0.158  # USD/hr (update if pricing changes)\n",

Copilot AI Feb 12, 2026

This code contains non-ASCII whitespace characters (e.g., NBSP; one is visible here after the numeric literal), which can cause errors such as Python SyntaxError: invalid non-printable character, or indentation errors, when executed. Please replace these with normal spaces (U+0020) throughout the notebook and re-save.
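
A small sketch for finding and replacing such characters in code cells, assuming the notebook uses the standard .ipynb JSON layout where each cell's "source" is a string or a list of strings. The character table is an assumption covering common copy-paste culprits; extend it as needed. Markdown cells are left alone because non-ASCII spaces there do not affect execution. Re-serializing with json.dumps may change indentation relative to the file Jupyter wrote, so review the diff before committing.

```python
import json
from pathlib import Path

# Assumed set of characters that commonly sneak in via copy-paste and that
# Python's tokenizer rejects in source code.
NON_ASCII_SPACES = {
    "\u00a0": " ",  # NO-BREAK SPACE
    "\u2007": " ",  # FIGURE SPACE
    "\u202f": " ",  # NARROW NO-BREAK SPACE
    "\u200b": "",   # ZERO WIDTH SPACE
    "\ufeff": "",   # ZERO WIDTH NO-BREAK SPACE (BOM)
}


def scrub_notebook(path: str) -> int:
    """Replace non-ASCII whitespace in code-cell sources; return the replacement count."""
    nb_path = Path(path)
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    replaced = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        lines = [source] if isinstance(source, str) else source
        new_lines = []
        for line in lines:
            for bad, good in NON_ASCII_SPACES.items():
                replaced += line.count(bad)
                line = line.replace(bad, good)
            new_lines.append(line)
        # Preserve the original shape of "source" (string vs. list of strings).
        cell["source"] = new_lines[0] if isinstance(source, str) else new_lines
    if replaced:
        nb_path.write_text(
            json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8"
        )
    return replaced
```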


notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +186 to +190

+                  "    if resource_type in ['gcs_bucket', 'gcs_object', 'mixed']:\n",
+                  "        # Cloud Storage costs for data + output\n",
+                  "        total_storage_gb = data_gb + output_gb\n",
+                  "        storage_cost = storage_price_per_gb_hour * total_storage_gb * runtime_hrs\n",
+                  "        \n",

Copilot AI Feb 12, 2026

mixed is included in this Cloud Storage branch, so a later elif resource_type == 'mixed' branch (below) will never run. Handle mixed in its own branch (or remove the dead branch) so mixed-resource estimates are computed as intended.
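
An alternative structure that rules out this class of bug, sketched with hypothetical helper names: a dispatch table maps each resource type to exactly one estimator, so no type can be swallowed by an earlier, broader membership test. The formulas mirror the excerpts where shown (the BigQuery-only one is assumed), and the parameter dictionary keys reuse the notebook's variable names.

```python
from typing import Callable, Dict, Tuple

Costs = Tuple[float, float]  # (storage_cost, query_cost)


def _gcs(p: dict) -> Costs:
    storage = p["storage_price_per_gb_hour"] * (p["data_gb"] + p["output_gb"]) * p["runtime_hrs"]
    return storage, 0.0


def _bq(p: dict) -> Costs:
    # Assumed BigQuery-only formula (query charges only).
    return 0.0, (p["processed_gb"] / 1000) * p["bq_query_price_per_tb"] * p["queries"]


def _mixed(p: dict) -> Costs:
    storage = p["storage_price_per_gb_hour"] * p["data_gb"] * p["runtime_hrs"]
    query = (p["processed_gb"] / 1000) * p["bq_query_price_per_tb"] * p["queries"]
    return storage, query


# Each resource type appears exactly once as a key, so every type reaches
# exactly one estimator regardless of ordering.
ESTIMATORS: Dict[str, Callable[[dict], Costs]] = {
    "gcs_bucket": _gcs,
    "gcs_object": _gcs,
    "bq_dataset": _bq,
    "bq_table": _bq,
    "mixed": _mixed,
}


def estimate(resource_type: str, params: dict) -> Costs:
    try:
        estimator = ESTIMATORS[resource_type]
    except KeyError:
        raise ValueError(f"Unknown resource_type: {resource_type!r}") from None
    return estimator(params)
```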


notebook_cost_estimator/cost_estimator.ipynb

Comment on lines +312 to +315

+                  "- Resource type: {resource_explanation.get(resource_type.value, resource_type.value)}\n",
+                  "- Storage price: ${storage_price_per_gb_month}/GB/month (Cloud Storage Standard)\"\"\"\n",
+                  "\n",
+                  "if resource_type.value in ['bq_dataset', 'bq_table']:\n",

Copilot AI Feb 12, 2026

The generated explanation always includes the Cloud Storage Standard price, even when the selected resource type is BigQuery. This makes the displayed “Storage price” misleading for bq_dataset/bq_table; adjust the text to show BigQuery storage pricing when applicable (and/or omit GCS pricing for BQ-only flows).

Suggested change

-                  "- Resource type: {resource_explanation.get(resource_type.value, resource_type.value)}\n",
-                  "- Storage price: ${storage_price_per_gb_month}/GB/month (Cloud Storage Standard)\"\"\"\n",
-                  "\n",
-                  "if resource_type.value in ['bq_dataset', 'bq_table']:\n",
+                  "- Resource type: {resource_explanation.get(resource_type.value, resource_type.value)}\"\"\"\n",
+                  "\n",
+                  "if resource_type.value in ['gcs_bucket', 'gcs_object', 'mixed']:\n",
+                  "    explanation += f\"\"\"\n",
+                  "- Storage price: ${storage_price_per_gb_month}/GB/month (Cloud Storage Standard)\"\"\"\n",
+                  "\n",
+                  "if resource_type.value in ['bq_dataset', 'bq_table', 'mixed']:\n",

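
A sketch of the conditional explanation text, assuming a separate BigQuery storage price is available; `bq_storage_price_per_gb_month` is a hypothetical parameter that does not appear in the excerpt, and `build_explanation` is an illustrative name. Building a list of lines and joining them avoids splicing triple-quoted f-strings across branches, and mixed shows both prices because it belongs to both groups.

```python
GCS_TYPES = ("gcs_bucket", "gcs_object", "mixed")
BQ_TYPES = ("bq_dataset", "bq_table", "mixed")


def build_explanation(resource_type: str, resource_label: str,
                      gcs_price_per_gb_month: float,
                      bq_storage_price_per_gb_month: float) -> str:
    """Build the pricing explanation, listing only the storage prices that apply."""
    lines = [f"- Resource type: {resource_label}"]
    if resource_type in GCS_TYPES:
        lines.append(
            f"- Cloud Storage price: ${gcs_price_per_gb_month}/GB/month (Cloud Storage Standard)"
        )
    if resource_type in BQ_TYPES:
        lines.append(
            f"- BigQuery storage price: ${bq_storage_price_per_gb_month}/GB/month"
        )
    return "\n".join(lines)
```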

