
Conversation

@echo-chamber0

Add deployment guide: how to use GCP Marketplace to deploy a custom DC into GKE

@gemini-code-assist

Summary of Changes

Hello @echo-chamber0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, extensive deployment guide for the Data Commons Accelerator, specifically tailored for deployment through the Google Cloud Marketplace into a Google Kubernetes Engine (GKE) cluster. The guide aims to provide users with clear, step-by-step instructions and best practices for setting up, configuring, and managing their Data Commons instance, ensuring a smooth and secure deployment experience from initial setup to post-deployment operations and cleanup.

Highlights

  • New Deployment Guide Added: A comprehensive guide for deploying the Data Commons Accelerator through the Google Cloud Marketplace into a GKE cluster has been added.
  • Detailed Prerequisites: The guide outlines all necessary prerequisites, including GKE cluster version requirements, Workload Identity enablement, Private Service Access configuration, and required IAM permissions.
  • Step-by-Step Deployment: It provides a field-by-field walkthrough of the GCP Marketplace deployment form, covering basic configuration, GKE cluster settings, CloudSQL, Cloud Storage, API, and application configurations.
  • Post-Deployment Management: Sections on accessing the deployed application, verifying its status, key features, troubleshooting common issues, and a detailed guide for deleting the deployment and associated resources are included.

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds a comprehensive deployment guide for using the Data Commons Accelerator on GCP Marketplace. The guide is well-structured and detailed, covering the entire process from prerequisites to deletion. I've provided some feedback to improve clarity, correctness, and adherence to best practices. My comments include fixing a broken link in the table of contents, correcting potentially misleading information about data loss and downtime, improving the robustness of gcloud commands, and fixing a minor formatting issue in a markdown table. Overall, this is an excellent addition to the project's documentation.

- **Field name**: `cloudsql_ha_enabled`
- **What it means**: Enables automatic database replication to a different availability zone
- **How it works**:
- **Disabled (default)**: Single database instance in one zone. If zone fails, data is lost


Severity: high

The statement "If zone fails, data is lost" for a non-HA CloudSQL instance is misleading. Standard Cloud SQL instances have point-in-time recovery and automated backups enabled by default. In a zonal failure, the instance can be restored from a backup in another zone. While this involves downtime and potential loss of data since the last backup, the data is not permanently lost. Please clarify this to provide a more accurate picture of the failure scenario.

Suggested change
- **Disabled (default)**: Single database instance in one zone. If zone fails, data is lost
- **Disabled (default)**: Single database instance in one zone. If a zone fails, the instance is unavailable and must be restored from backup, which can cause downtime and data loss.
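
To make that concrete, the guide could show readers how to confirm their backup posture. A minimal sketch, assuming the Cloud SQL instance created by the deployment is named `INSTANCE_NAME` (a placeholder):

```sh
# Check whether automated backups are enabled on the instance.
gcloud sql instances describe INSTANCE_NAME \
  --format="value(settings.backupConfiguration.enabled)"

# List the backups that would be available to restore from after a zonal failure.
gcloud sql backups list --instance=INSTANCE_NAME
```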

4. [Deployment via GCP Marketplace](#deployment-via-gcp-marketplace)
5. [Accessing Your Deployment](#accessing-your-deployment)
6. [Using Data Commons Accelerator](#using-data-commons-accelerator)
7. [Managing Your Deployment](#managing-your-deployment)


Severity: medium

The Table of Contents includes an entry for "Managing Your Deployment" that links to #managing-your-deployment. However, there is no corresponding section header in the document, which will result in a broken link. Please either add this section or remove the entry from the table of contents.

Comment on lines +196 to +198
gcloud compute addresses list --global \
--filter="purpose=VPC_PEERING AND network~YOUR_VPC_NAME" \
--format="table(name,address,prefixLength,network)"


Severity: medium

The gcloud command uses the ~ operator to filter by network name. This operator performs a regular expression match. For matching a specific VPC name, it is safer and clearer to use the exact match operator =. This prevents unexpected behavior if the VPC name contains characters with special meaning in regular expressions.

Suggested change
gcloud compute addresses list --global \
--filter="purpose=VPC_PEERING AND network~YOUR_VPC_NAME" \
--format="table(name,address,prefixLength,network)"
gcloud compute addresses list --global \
--filter="purpose=VPC_PEERING AND network=YOUR_VPC_NAME" \
--format="table(name,address,prefixLength,network)"
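
To illustrate the difference, here is a hypothetical example with a VPC actually named `my-vpc`; the network names in the filters are placeholders:

```sh
# '~' is a regex substring match: "network~my-vpc" would also match
# networks named "my-vpc-2" or "legacy-my-vpc" (and a '.' in the name
# would act as a regex wildcard).
gcloud compute addresses list --global \
  --filter="purpose=VPC_PEERING AND network~my-vpc"

# '=' only matches the network named exactly "my-vpc".
gcloud compute addresses list --global \
  --filter="purpose=VPC_PEERING AND network=my-vpc"
```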


This suggestion sounds reasonable.

- **What it means**: Enables automatic database replication to a different availability zone
- **How it works**:
- **Disabled (default)**: Single database instance in one zone. If zone fails, data is lost
- **Enabled**: Two instances (primary + replica) in different zones. If one zone fails, automatically switches to replica with zero downtime


Severity: medium

The claim of "zero downtime" for High Availability failover is an overstatement. While Cloud SQL HA failover is very fast, it is not instantaneous. There is typically a brief period of unavailability, usually lasting a few seconds to a minute. It's more accurate to use terms like "minimal downtime" or "near-zero downtime" to set correct expectations.

Suggested change
- **Enabled**: Two instances (primary + replica) in different zones. If one zone fails, automatically switches to replica with zero downtime
- **Enabled**: Two instances (primary + replica) in different zones. If one zone fails, it automatically switches to the replica with minimal downtime.
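
If the guide wants to go further, it could mention that HA maps to the instance's availability type, and that a failover can be rehearsed to observe the actual downtime. A sketch, with `INSTANCE_NAME` as a placeholder:

```sh
# Promote an existing instance to regional (HA) availability.
gcloud sql instances patch INSTANCE_NAME --availability-type=REGIONAL

# Optionally trigger a manual failover to measure the brief unavailability.
gcloud sql instances failover INSTANCE_NAME
```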


- **Field name**: `gcs_bucket_name`
- **What it means**: Name of the Cloud Storage bucket (like a drive on the cloud)
- **Format**: Must be globally unique across all GCP projects. Only lowercase letters, numbers, hyphens, periods


Severity: medium

The guide mentions that periods are allowed in GCS bucket names. While true, it's a GCP best practice to avoid them as they can cause issues with virtual-hosted-style URLs and SSL certificate validation. It would be beneficial to add a note of caution.

Suggested change
- **Format**: Must be globally unique across all GCP projects. Only lowercase letters, numbers, hyphens, periods
- **Format**: Must be globally unique across all GCP projects. Only lowercase letters, numbers, hyphens, periods (use of periods is not recommended).
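
As a concrete illustration, creating the bucket up front validates the name immediately, since naming violations are rejected at creation time. The bucket name and location below are placeholders:

```sh
# A hyphen-only name avoids the virtual-hosted-style URL and SSL
# certificate issues that periods can cause.
gcloud storage buckets create gs://my-unique-dc-bucket --location=us-central1
```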


| Tier | CPU | Memory | Best For |
|------|-----|--------|--------|
| `small` | 1.0 | 2 GB| Light workloads, <10 concurrent users |


Severity: medium

This table row is missing a space before the final pipe (`|`), which will cause it to render incorrectly in some markdown parsers.

Suggested change
| `small` | 1.0 | 2 GB| Light workloads, <10 concurrent users |
| `small` | 1.0 | 2 GB | Light workloads, <10 concurrent users |


# Keep this terminal open; the port-forward will run in the foreground
# In another terminal, test the application:
curl http://localhost:8080


Severity: medium

For verifying the application is running, it's more reliable to query a dedicated health check endpoint like /healthz (which is used later in the troubleshooting section) instead of the root URL. A health check endpoint provides a more accurate status of the application's health.

Suggested change
curl http://localhost:8080
curl http://localhost:8080/healthz
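
Building on that, the verification step could show both terminals end to end. A sketch, where `NAMESPACE` and `SERVICE` are placeholders for the deployment's actual resources:

```sh
# Terminal 1: forward local port 8080 to the application service.
kubectl port-forward -n NAMESPACE svc/SERVICE 8080:8080

# Terminal 2: query the health endpoint; -f makes curl fail on HTTP errors.
curl -f http://localhost:8080/healthz && echo "application is healthy"
```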


**Solution:**
1. Verify your GKE cluster's VPC: `gcloud container clusters describe CLUSTER --format="value(network)"`
2. Find PSA ranges in that VPC: `gcloud compute addresses list --global --filter="purpose=VPC_PEERING AND network~YOUR_VPC"`


Severity: medium

The gcloud command uses the ~ operator to filter by network name. This operator performs a regular expression match. For matching a specific VPC name, it is safer and clearer to use the exact match operator =. This prevents unexpected behavior if the VPC name contains characters with special meaning in regular expressions.

Suggested change
2. Find PSA ranges in that VPC: `gcloud compute addresses list --global --filter="purpose=VPC_PEERING AND network~YOUR_VPC"`
2. Find PSA ranges in that VPC: `gcloud compute addresses list --global --filter="purpose=VPC_PEERING AND network=YOUR_VPC"`
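
The two troubleshooting steps could also be chained so the VPC name is never copied by hand. A sketch, with `CLUSTER` and `LOCATION` as placeholders:

```sh
# Resolve the cluster's VPC, then list the PSA ranges in that VPC.
VPC=$(gcloud container clusters describe CLUSTER --location=LOCATION \
  --format="value(network)")
gcloud compute addresses list --global \
  --filter="purpose=VPC_PEERING AND network=${VPC}" \
  --format="table(name,address,prefixLength)"
```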

@dwnoble self-requested a review on January 15, 2026 at 17:54

### What is Data Commons Accelerator?

Data Commons Accelerator is a ready-to-deploy instance of the Data Commons platform on Google Kubernetes Engine (GKE). Data Commons is an open knowledge repository providing unified access to public datasets and statistics, enabling your organization to explore data without manually aggregating from multiple sources.

Suggested change
Data Commons Accelerator is a ready-to-deploy instance of the Data Commons platform on Google Kubernetes Engine (GKE). Data Commons is an open knowledge repository providing unified access to public datasets and statistics, enabling your organization to explore data without manually aggregating from multiple sources.
Data Commons Accelerator is a ready-to-deploy instance of [Custom Data Commons](https://docs.datacommons.org/custom_dc/) on Google Kubernetes Engine (GKE). [Data Commons](https://datacommons.org) is an open knowledge repository providing unified access to public datasets and statistics, enabling your organization to explore data without manually aggregating from multiple sources.


### What Problems Does It Solve?

Data Commons Accelerator addresses these common data exploration challenges:

Suggested change
Data Commons Accelerator addresses these common data exploration challenges:
Data Commons addresses these common data exploration challenges:


- **Data Commons Accelerator Web Application**: Interactive interface for data exploration and visualization
- **CloudSQL MySQL Database**: Persistent storage for datasets and metadata (with optional high availability)
- **Cloud Storage Bucket**: Scalable storage for custom data imports and exports

Suggested change
- **Cloud Storage Bucket**: Scalable storage for custom data imports and exports
- **Cloud Storage Bucket**: Scalable storage for custom data imports

- **Data Commons Accelerator Web Application**: Interactive interface for data exploration and visualization
- **CloudSQL MySQL Database**: Persistent storage for datasets and metadata (with optional high availability)
- **Cloud Storage Bucket**: Scalable storage for custom data imports and exports
- **Kubernetes Workload**: Application deployed to your existing GKE cluster with Workload Identity authentication
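
For a quick post-deployment spot check of these components, something like the following sketch could be added (`NAMESPACE` is a placeholder for the application namespace):

```sh
# Kubernetes workload: pods and the service fronting the application.
kubectl get pods,svc -n NAMESPACE

# CloudSQL instance and Cloud Storage bucket in the active project.
gcloud sql instances list
gcloud storage ls
```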

Do we create a new GKE cluster or reuse an existing cluster?


### Components

The Data Commons Accelerator solution consists of four primary components:

Also:

  • GKE data ingestion container
  • Redis cache (Memorystore?)


## Using Data Commons Accelerator

### Key Features

I'd remove this section and just point to the doc site


To get started using Data Commons:

- **Official Tutorials**: https://datacommons.org/tutorials

This link looks broken.


- **Official Tutorials**: https://datacommons.org/tutorials
- **API Documentation**: https://docs.datacommons.org/api
- **Knowledge Graph Explorer**: https://datacommons.org/ (official site for learning about available data)

Suggested change
- **Knowledge Graph Explorer**: https://datacommons.org/ (official site for learning about available data)
- **Knowledge Graph Explorer**: https://datacommons.org/browser (official site for learning about available data)
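
To give readers a first taste of the linked API documentation, a sketch of a REST call might help. Parameter names here follow the v2 observation endpoint as documented, but should be verified against docs.datacommons.org/api; `YOUR_API_KEY` is a placeholder:

```sh
# Fetch the latest population count for the USA from the public API.
# curl -G turns the url-encoded data into a GET query string.
curl -G "https://api.datacommons.org/v2/observation" \
  --data-urlencode "key=YOUR_API_KEY" \
  --data-urlencode "date=LATEST" \
  --data-urlencode "variable.dcids=Count_Person" \
  --data-urlencode "entity.dcids=country/USA" \
  --data-urlencode "select=date" --data-urlencode "select=entity" \
  --data-urlencode "select=variable" --data-urlencode "select=value"
```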


**Troubleshooting Steps:**

**1. Verify Application Pods Are Running**
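
Presumably this step expands to something like the following sketch (`NAMESPACE` and `POD_NAME` are placeholders):

```sh
# List pods and confirm STATUS is Running and READY is n/n.
kubectl get pods -n NAMESPACE

# If a pod is not healthy, inspect its events and recent logs.
kubectl describe pod POD_NAME -n NAMESPACE
kubectl logs POD_NAME -n NAMESPACE --tail=50
```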

These steps look the same as the verify deployment section. Is there a good way to consolidate them?


These resources are automatically deleted:

- Kubernetes namespace and all pods

Memorystore instance
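
If the deployment does provision a Memorystore (Redis) instance as suggested here, the deletion section could confirm cleanup with a check like this (`REGION` is a placeholder):

```sh
# An empty result means no Memorystore (Redis) instances remain.
gcloud redis instances list --region=REGION
```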
