Create notebook-cost estimator tool #12

Open
emmarogge wants to merge 3 commits into main from emmarogge/notebook_cost_estimator

@emmarogge
Contributor

Create a tool that enables users to estimate the cost of running their own Jupyter notebooks in their Workbench workspaces.

Copilot AI left a comment

Pull request overview

This PR adds a Jupyter notebook tool that helps users estimate the approximate GCP cost of running Jupyter notebooks inside Verily Workbench. The estimate covers compute and data/storage costs, with optional workspace-resource discovery via the wb CLI.

Changes:

  • Introduces a new cost_estimator.ipynb notebook with ipywidgets-driven inputs and cost calculations.
  • Adds logic to estimate storage/query costs for GCS and BigQuery resources.
  • Adds workspace resource discovery and sizing helpers via wb CLI commands.

Comment on lines +354 to +365
"import json\n",
"import pandas as pd\n",
"from IPython.display import display, HTML\n",
"\n",
"def run_wb_command(command):\n",
"    \"\"\"Run a wb CLI command and return the result\"\"\"\n",
"    result = subprocess.run(command, shell=True, capture_output=True, text=True)\n",
"    if result.returncode == 0:\n",
"        return result.stdout.strip()\n",
"    else:\n",
"        raise Exception(f\"Command failed: {command}\\nError: {result.stderr}\")\n",
"\n",

Copilot AI Feb 12, 2026

run_wb_command() uses subprocess.run(..., shell=True) and later interpolates values into commands (e.g., resource names/datasets). This makes command injection possible if any interpolated value contains shell metacharacters. Prefer shell=False with an argv list and pass arguments as separate list items.
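
A minimal sketch of the argv-list approach, under stated assumptions: `run_argv` is a hypothetical helper name, and a child Python process stands in for the wb CLI because the actual wb arguments are not shown in this thread. It illustrates that a value containing shell metacharacters reaches the child process as one literal argument when no shell is involved.

```python
import subprocess
import sys


def run_argv(argv):
    """Run a command given as a list of arguments, without invoking a shell."""
    result = subprocess.run(list(argv), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Command failed: {argv}\nError: {result.stderr}")
    return result.stdout.strip()


# Hypothetical resource name containing shell metacharacters.
hostile_name = "my-bucket; echo INJECTED"

# Stand-in for a wb call: the child process prints its first argument.
echoed = run_argv(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_name]
)
print(echoed)
```

Passing each interpolated value as its own list item also avoids re-parsing it, which a string passed through `shlex.split` would still do for names containing spaces or quotes.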

Suggested change
"import shlex\n",
"import json\n",
"import pandas as pd\n",
"from IPython.display import display, HTML\n",
"\n",
"def run_wb_command(command):\n",
"    \"\"\"Run a wb CLI command and return the result.\n",
"\n",
"    Accepts either a string command or an iterable of arguments.\n",
"    \"\"\"\n",
"    if isinstance(command, str):\n",
"        cmd = shlex.split(command)\n",
"    else:\n",
"        cmd = list(command)\n",
"\n",
"    result = subprocess.run(cmd, capture_output=True, text=True)\n",
"    if result.returncode == 0:\n",
"        return result.stdout.strip()\n",
"    else:\n",
"        raise Exception(f\"Command failed: {command}\\nError: {result.stderr}\")\n",
"\n",

Copilot uses AI. Check for mistakes.
Comment on lines +437 to +439
"        resource_type = resources_df.iloc[i-1]['Type']\n",
"        stewardship = resources_df.iloc[i-1]['Stewardship']\n",
"        print(f\"   {i}. {name} ({resource_type}, {stewardship})\")\n",

Copilot AI Feb 12, 2026

This loop reuses the name resource_type, which earlier refers to the ipywidgets dropdown. Shadowing it with a string from the dataframe can break subsequent cells that expect resource_type.value. Rename the loop variable (e.g., resource_type_str) to avoid clobbering the widget.

Suggested change
"        resource_type_str = resources_df.iloc[i-1]['Type']\n",
"        stewardship = resources_df.iloc[i-1]['Stewardship']\n",
"        print(f\"   {i}. {name} ({resource_type_str}, {stewardship})\")\n",

Comment on lines +196 to +200
"    elif resource_type == 'mixed':\n",
"        # Combination of storage and BigQuery\n",
"        gcs_storage = storage_price_per_gb_hour * data_gb * runtime_hrs\n",
"        bq_queries = (processed_gb / 1000) * bq_query_price_per_tb * queries\n",
"        storage_cost = gcs_storage + bq_queries\n",

Copilot AI Feb 12, 2026

In the mixed branch, query costs are added into storage_cost and query_cost remains 0, but downstream code expects query costs in query_cost (it prints them only if query_cost > 0). Return the mixed query component via query_cost and keep storage in storage_cost so totals/breakdowns are correct.
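
One way to keep the two components separate, as a sketch only: `estimate_mixed_costs` is a hypothetical function name, the formulas are copied from the excerpt above, and the prices in the usage lines are placeholders rather than real GCP rates.

```python
def estimate_mixed_costs(data_gb, processed_gb, queries, runtime_hrs,
                         storage_price_per_gb_hour, bq_query_price_per_tb):
    """Return (storage_cost, query_cost) for a 'mixed' resource."""
    # Storage component: same formula as gcs_storage in the excerpt.
    storage_cost = storage_price_per_gb_hour * data_gb * runtime_hrs
    # Query component: same formula as bq_queries in the excerpt.
    query_cost = (processed_gb / 1000) * bq_query_price_per_tb * queries
    return storage_cost, query_cost


# Placeholder inputs for illustration only.
storage_cost, query_cost = estimate_mixed_costs(
    data_gb=100, processed_gb=500, queries=4, runtime_hrs=2,
    storage_price_per_gb_hour=0.0001, bq_query_price_per_tb=5.0,
)
print(f"storage={storage_cost:.4f} query={query_cost:.4f}")
```

With the components returned separately, the downstream `if query_cost > 0` check can report query costs for mixed resources as well.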

Comment on lines +50 to +54
"notebook_file = widgets.Text(\n",
"    value='',\n",
"    placeholder='Enter notebook filename (e.g., analysis.ipynb)',\n",
"    description='Notebook:',\n",
"    disabled=False\n",

Copilot AI Feb 12, 2026

The notebook_file widget is collected from the user but never read anywhere in the notebook (it’s only added to the UI). Either use it (e.g., to validate the notebook exists / extract metadata) or remove it to avoid confusing users.
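
If the widget is kept, a validation step might look like the sketch below. `describe_notebook` is a hypothetical helper, and it assumes the nbformat 4 layout, where cells live in a top-level `cells` list.

```python
import json
from pathlib import Path


def describe_notebook(filename):
    """Return basic facts about a notebook file, or raise if it is unusable."""
    path = Path(filename)
    if path.suffix != ".ipynb":
        raise ValueError(f"Not a notebook file: {filename}")
    if not path.is_file():
        raise FileNotFoundError(f"Notebook not found: {filename}")
    with path.open(encoding="utf-8") as fh:
        nb = json.load(fh)
    # Assumes nbformat 4: cells are stored in a top-level "cells" list.
    cells = nb.get("cells", [])
    code_cells = sum(1 for cell in cells if cell.get("cell_type") == "code")
    return {"name": path.name, "cells": len(cells), "code_cells": code_cells}
```

A later cell could then call `describe_notebook(notebook_file.value)` and show the result next to the estimate, so the input visibly affects the output.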

"            \n",
"    except Exception as e:\n",
"        print(f\"   ⚠️ Could not determine size: {str(e)}\")\n",
"    \n",

Copilot AI Feb 12, 2026

total_estimated_cost is incremented here but is never initialized anywhere in the notebook, so this will raise NameError. Initialize it before the loop (and decide whether it’s meant to start at 0 or include previously computed totals).

Suggested change
"    \n",
"    # Initialize total_estimated_cost before accumulating compute costs\n",
"    total_estimated_cost = 0.0\n",

Comment on lines +557 to +560
"    print(f\"   💻 Compute cost ({runtime_hours.value}h): ${compute_cost:.2f}\")\n",
"    print(f\"   💾 Storage cost ({total_storage_gb:.1f}GB): ${total_estimated_cost - compute_cost - total_query_costs:.4f}\")\n",
"    if total_query_costs > 0:\n",
"        print(f\"   🔍 Query costs: ${total_query_costs:.4f}\")\n",

Copilot AI Feb 12, 2026

This print statement references total_storage_gb and total_query_costs, but neither variable is defined in the notebook scope at this point. Aggregate these totals explicitly (e.g., sum per-resource sizes/costs) before printing the final breakdown.
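
A sketch of explicit aggregation, assuming each discovered resource has been reduced to a dict with `size_gb`, `storage_cost`, and `query_cost` keys. That record shape and the `summarize_costs` name are hypothetical; the total names match those used in the print statements.

```python
def summarize_costs(resource_estimates, compute_cost):
    """Aggregate per-resource estimates into the totals used by the breakdown."""
    total_storage_gb = sum(r["size_gb"] for r in resource_estimates)
    total_storage_cost = sum(r["storage_cost"] for r in resource_estimates)
    total_query_costs = sum(r["query_cost"] for r in resource_estimates)
    total_estimated_cost = compute_cost + total_storage_cost + total_query_costs
    return {
        "total_storage_gb": total_storage_gb,
        "total_storage_cost": total_storage_cost,
        "total_query_costs": total_query_costs,
        "total_estimated_cost": total_estimated_cost,
    }


# Placeholder numbers for illustration only.
summary = summarize_costs(
    [
        {"size_gb": 10.0, "storage_cost": 0.25, "query_cost": 0.0},
        {"size_gb": 2.5, "storage_cost": 0.5, "query_cost": 1.0},
    ],
    compute_cost=2.0,
)
print(summary)
```

Keeping the storage total as its own value also means the storage line no longer has to be derived by subtracting compute and query costs from the grand total.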

Comment on lines +99 to +103
"special_resources = widgets.Text(\n",
"    value='',\n",
"    placeholder='e.g., GPU, highmem',\n",
"    description='Special Resources:',\n",
"    disabled=False\n",

Copilot AI Feb 12, 2026

The special_resources widget is never used in any estimation logic. If the estimator is meant to account for GPUs/highmem, wire this input into the compute pricing/model selection; otherwise remove it to avoid misleading output.
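
As a first step toward wiring the input in, the free-text value could be normalized into flags, as sketched below. `parse_special_resources` is a hypothetical helper and the set of known flags is taken only from the widget placeholder. How each flag should change pricing is left open, since no GPU or highmem rates appear in the notebook.

```python
# Flags taken from the widget placeholder ('e.g., GPU, highmem'); an assumption.
KNOWN_SPECIAL_RESOURCES = {"gpu", "highmem"}


def parse_special_resources(text):
    """Split comma-separated input into (known, unknown) sets of lowercase flags."""
    requested = {item.strip().lower() for item in text.split(",") if item.strip()}
    known = requested & KNOWN_SPECIAL_RESOURCES
    unknown = requested - KNOWN_SPECIAL_RESOURCES
    return known, unknown


known, unknown = parse_special_resources("GPU, highmem, tpu")
print(sorted(known), sorted(unknown))
```

Reporting the unknown flags back to the user avoids silently ignoring input the estimator cannot price.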

Comment on lines +139 to +142
"default_machine_type = 'n1-standard-4'\n",
"default_vcpu = 4\n",
"default_ram_gb = 15\n",
"compute_price_per_hour = 0.158  # USD/hr (update if pricing changes)\n",

Copilot AI Feb 12, 2026

This code contains non-ASCII whitespace characters (e.g., NBSP) — visible here after the numeric literal — which can cause Python SyntaxError: invalid non-printable character or indentation errors when executed. Please replace these with normal spaces (U+0020) throughout the notebook and re-save.
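
A sketch of an automated cleanup, assuming the notebook has been loaded as a dict with the standard `json` module and follows the nbformat 4 layout. `clean_notebook` is a hypothetical helper, and the list of suspect characters is a starting point rather than an exhaustive set.

```python
# Non-ASCII whitespace often introduced by copy/paste; extend as needed.
SUSPECT_WHITESPACE = {
    "\u00a0": " ",  # no-break space
    "\u2007": " ",  # figure space
    "\u202f": " ",  # narrow no-break space
}


def clean_source(text):
    """Replace suspect whitespace characters with ordinary spaces."""
    for bad, good in SUSPECT_WHITESPACE.items():
        text = text.replace(bad, good)
    return text


def clean_notebook(nb):
    """Clean every code cell in place; return the number of changed lines."""
    changed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # nbformat allows "source" to be a single string or a list of lines.
        lines = [source] if isinstance(source, str) else source
        cleaned = [clean_source(line) for line in lines]
        changed += sum(1 for old, new in zip(lines, cleaned) if old != new)
        cell["source"] = cleaned[0] if isinstance(source, str) else cleaned
    return changed
```

The cleaned dict can then be written back with `json.dump`. Only code cells are touched, so intentional non-ASCII text in markdown cells is preserved.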

Comment on lines +186 to +190
"    if resource_type in ['gcs_bucket', 'gcs_object', 'mixed']:\n",
"        # Cloud Storage costs for data + output\n",
"        total_storage_gb = data_gb + output_gb\n",
"        storage_cost = storage_price_per_gb_hour * total_storage_gb * runtime_hrs\n",
"        \n",

Copilot AI Feb 12, 2026

mixed is included in this Cloud Storage branch, so a later elif resource_type == 'mixed' branch (below) will never run. Handle mixed in its own branch (or remove the dead branch) so mixed-resource estimates are computed as intended.
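
The ordering issue can be illustrated with a stripped-down dispatch, shown here as a sketch. `select_branch` is a hypothetical function that only reports which branch would run; the resource type strings are the ones that appear in the notebook excerpts.

```python
GCS_TYPES = {"gcs_bucket", "gcs_object"}
BQ_TYPES = {"bq_dataset", "bq_table"}


def select_branch(resource_type):
    """Return the name of the cost branch that handles this resource type."""
    # 'mixed' is checked first and is not a member of GCS_TYPES,
    # so its branch stays reachable.
    if resource_type == "mixed":
        return "mixed"
    if resource_type in GCS_TYPES:
        return "gcs"
    if resource_type in BQ_TYPES:
        return "bigquery"
    raise ValueError(f"Unknown resource type: {resource_type}")


for rt in ("gcs_bucket", "bq_table", "mixed"):
    print(rt, "->", select_branch(rt))
```

Raising on unknown types, rather than falling through to zero cost, is a design choice of this sketch and not something the notebook is known to do.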

Comment on lines +312 to +315
"- Resource type: {resource_explanation.get(resource_type.value, resource_type.value)}\n",
"- Storage price: ${storage_price_per_gb_month}/GB/month (Cloud Storage Standard)\"\"\"\n",
"\n",
"if resource_type.value in ['bq_dataset', 'bq_table']:\n",

Copilot AI Feb 12, 2026

The generated explanation always includes the Cloud Storage Standard price, even when the selected resource type is BigQuery. This makes the displayed “Storage price” misleading for bq_dataset/bq_table; adjust the text to show BigQuery storage pricing when applicable (and/or omit GCS pricing for BQ-only flows).

Suggested change
"- Resource type: {resource_explanation.get(resource_type.value, resource_type.value)}\"\"\"\n",
"\n",
"if resource_type.value in ['gcs_bucket', 'gcs_object', 'mixed']:\n",
"    explanation += f\"\"\"\n",
"- Storage price: ${storage_price_per_gb_month}/GB/month (Cloud Storage Standard)\"\"\"\n",
"\n",
"if resource_type.value in ['bq_dataset', 'bq_table', 'mixed']:\n",

2 participants