
Terraform — GCP

Version: 0.1.0

Provision a production-ready GKE cluster for DecisionBox using the included Terraform module.

What It Creates

| Resource | Description |
|---|---|
| VPC | Dedicated network with subnets for nodes, pods, and services |
| Cloud NAT | Outbound internet access for private nodes |
| Firewall rules | Internal traffic + GCP health check ranges |
| GKE cluster | Private nodes, Dataplane V2, auto-upgrade, shielded nodes |
| Node pool | Auto-scaling with configurable machine type and disk |
| Service accounts | Node SA (logging/monitoring) + Workload Identity SA (API) |
| IAM bindings | Workload Identity, Secret Manager access, BigQuery access (optional) |

Prerequisites

  • Terraform 1.5+
  • gcloud CLI authenticated with a project
  • GCP project with billing enabled
  • Sufficient IAM permissions (Project Owner or Editor)

Quick Start with Setup Wizard

The included setup wizard handles Terraform state, cluster provisioning, and Helm deployment in one flow:

cd terraform
./setup.sh            # Full interactive setup
./setup.sh --dry-run  # Generate config files only
./setup.sh --resume   # Resume from Helm deploy

The wizard prompts for:

  1. Cloud provider (GCP)
  2. Secret namespace prefix
  3. Secret provider (GCP Secret Manager or MongoDB)
  4. GCP project ID, region, cluster name
  5. Terraform state bucket (auto-creates if needed)
  6. Machine type and node scaling
  7. BigQuery IAM (optional)
  8. SECRET_ENCRYPTION_KEY (auto-generates or user-provided)
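If you would rather supply your own SECRET_ENCRYPTION_KEY than let the wizard generate one, a 32-byte random value encoded as base64 is a reasonable sketch. The exact format the wizard accepts is an assumption here; verify against setup.sh before relying on it:

```shell
# Generate a random 256-bit key and base64-encode it.
# Assumption: the wizard accepts any base64 string of this size;
# confirm the expected format in setup.sh.
SECRET_ENCRYPTION_KEY=$(openssl rand -base64 32)
echo "$SECRET_ENCRYPTION_KEY"
```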

After provisioning, it automatically:

  • Configures kubectl credentials
  • Creates the Kubernetes namespace and secrets
  • Deploys API and Dashboard via Helm
  • Waits for ingress and verifies health checks
  • Displays the dashboard URL

Manual Deployment

Step 1: Create a Terraform State Bucket

PROJECT_ID=$(gcloud config get-value project)
gsutil mb -p $PROJECT_ID gs://$PROJECT_ID-terraform-state
gsutil versioning set on gs://$PROJECT_ID-terraform-state
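GCS bucket names must be globally unique, 3-63 characters, and lowercase, so a derived name like `$PROJECT_ID-terraform-state` can occasionally be invalid. A quick local sanity check before running `gsutil mb` can catch this; the helper below is illustrative only, not part of the module:

```shell
# Sanity-check a derived state-bucket name against basic GCS naming
# rules (3-63 chars; lowercase letters, digits, hyphens; must start
# and end with a letter or digit). Illustrative helper, not exhaustive.
valid_bucket_name() {
  local name=$1
  [[ ${#name} -ge 3 && ${#name} -le 63 ]] || return 1
  [[ $name =~ ^[a-z0-9][a-z0-9-]*[a-z0-9]$ ]] || return 1
}

PROJECT_ID="my-gcp-project"   # in practice: $(gcloud config get-value project)
BUCKET="${PROJECT_ID}-terraform-state"
valid_bucket_name "$BUCKET" && echo "ok: $BUCKET"
```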

Step 2: Configure Variables

Create terraform/gcp/prod/terraform.tfvars:

project_id   = "my-gcp-project"
region       = "us-central1"
cluster_name = "decisionbox-prod"

# Networking
create_vpc    = true
subnet_cidr   = "10.0.0.0/20"
pods_cidr     = "10.4.0.0/14"
services_cidr = "10.8.0.0/20"

# Node pool
machine_type   = "e2-standard-2"
min_node_count = 1
max_node_count = 2
disk_size_gb   = 50

# Workload Identity
k8s_namespace       = "decisionbox"
k8s_service_account = "decisionbox-api"

# Optional: GCP Secret Manager
enable_gcp_secrets = true
secret_namespace   = "decisionbox"

# Optional: BigQuery read access
enable_bigquery_iam = true

# Optional: Vertex AI access (for Claude via Vertex or Gemini)
enable_vertex_ai_iam = true
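The default CIDRs are sized generously. When adapting them, a back-of-the-envelope check helps: a CIDR block holds 2^(32 - prefix) addresses. A quick shell sketch:

```shell
# Number of addresses in a CIDR block is 2^(32 - prefix_length).
cidr_size() { echo $(( 2 ** (32 - $1) )); }

echo "nodes (/20):    $(cidr_size 20)"   # 4096 addresses
echo "pods (/14):     $(cidr_size 14)"   # 262144 addresses
echo "services (/20): $(cidr_size 20)"   # 4096 addresses
```

With GKE's default allocation of a /24 pod range per node, a /14 pod CIDR supports up to 1,024 nodes, far more than the default node pool will ever scale to.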

Step 3: Initialize and Apply

cd terraform/gcp/prod

terraform init \
  -backend-config="bucket=$PROJECT_ID-terraform-state" \
  -backend-config="prefix=prod"

terraform plan -out=tfplan
terraform apply tfplan

Step 4: Configure kubectl

gcloud container clusters get-credentials decisionbox-prod \
  --region us-central1 \
  --project $PROJECT_ID

Step 5: Deploy with Helm

Follow the Kubernetes Deployment guide to deploy the API and Dashboard.

When using GCP Secret Manager with Workload Identity, annotate the service account:

# values-prod.yaml
serviceAccountAnnotations:
  iam.gke.io/gcp-service-account: "decisionbox-prod-api@my-gcp-project.iam.gserviceaccount.com"
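In the examples in this guide, the GCP service account email follows the pattern `<cluster_name>-api@<project_id>.iam.gserviceaccount.com`. If you script your values file, the annotation value can be derived; this pattern is inferred from the examples, so confirm it against the module's iam.tf:

```shell
CLUSTER_NAME="decisionbox-prod"
PROJECT_ID="my-gcp-project"
# Naming pattern inferred from the examples in this guide;
# verify against the service account actually created in iam.tf.
GSA_EMAIL="${CLUSTER_NAME}-api@${PROJECT_ID}.iam.gserviceaccount.com"
echo "$GSA_EMAIL"
```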

Module Architecture

terraform/gcp/
├── prod/
│   ├── versions.tf       # Provider versions (Google 5.0-7.0)
│   ├── variables.tf      # Environment-level variables
│   ├── main.tf           # Module instantiation
│   └── outputs.tf        # Cluster outputs
└── modules/decisionbox/
    ├── apis.tf           # GCP API enablement
    ├── networking.tf     # VPC, subnets, NAT, firewalls
    ├── gke.tf            # GKE cluster + node pool
    ├── iam.tf            # Service accounts + Workload Identity
    ├── secrets.tf        # Secret Manager IAM (conditional)
    ├── bigquery.tf       # BigQuery IAM (conditional)
    ├── variables.tf      # 40+ input variables
    └── outputs.tf        # Cluster outputs

Variables Reference

All variables are defined in terraform/gcp/modules/decisionbox/variables.tf.

Required

| Variable | Type | Description |
|---|---|---|
| project_id | string | GCP project ID |

Cluster

| Variable | Type | Default | Description |
|---|---|---|---|
| region | string | us-central1 | GCP region |
| cluster_name | string | decisionbox-prod | GKE cluster name |
| create_cluster | bool | true | Create GKE cluster (false to use existing) |
| deletion_protection | bool | true | Prevent accidental cluster deletion |
| release_channel | string | REGULAR | GKE release channel |
| datapath_provider | string | ADVANCED_DATAPATH | Dataplane V2 for network policy |
| enable_network_policy | bool | true | Enable network policy enforcement |
| network_policy_provider | string | CALICO | Network policy provider (used when not ADVANCED_DATAPATH) |
| enable_binary_authorization | bool | false | Binary Authorization for container images |
| logging_components | list(string) | ["SYSTEM_COMPONENTS", "WORKLOADS"] | GKE logging components |
| monitoring_components | list(string) | ["SYSTEM_COMPONENTS"] | GKE monitoring components |

Networking

| Variable | Type | Default | Description |
|---|---|---|---|
| create_vpc | bool | true | Create VPC (false to use existing) |
| existing_vpc_id | string | "" | Existing VPC self-link (when create_vpc=false) |
| existing_subnet_id | string | "" | Existing subnet self-link |
| subnet_cidr | string | 10.0.0.0/20 | Node subnet CIDR |
| pods_cidr | string | 10.4.0.0/14 | Pod IP range |
| pods_range_name | string | pods | Secondary range name for pods |
| services_cidr | string | 10.8.0.0/20 | Service IP range |
| services_range_name | string | services | Secondary range name for services |
| master_cidr | string | 172.16.0.0/28 | Control plane CIDR |
| enable_private_nodes | bool | true | Nodes have no public IPs |
| enable_private_endpoint | bool | false | Restrict master to private network |
| master_authorized_networks | list(object) | [{cidr_block="0.0.0.0/0", display_name="all"}] | CIDRs allowed to reach the master API |
| enable_flow_logs | bool | true | VPC flow logs |
| flow_log_interval | string | INTERVAL_10_MIN | Flow log aggregation interval |
| flow_log_sampling | number | 0.5 | Flow log sampling rate (0.0-1.0) |
| flow_log_metadata | string | INCLUDE_ALL_METADATA | Flow log metadata inclusion |

NAT

| Variable | Type | Default | Description |
|---|---|---|---|
| nat_ip_allocate_option | string | AUTO_ONLY | NAT IP allocation |
| nat_source_subnetwork_ip_ranges | string | ALL_SUBNETWORKS_ALL_IP_RANGES | NAT source ranges |
| enable_nat_logging | bool | true | Cloud NAT logging |
| nat_log_filter | string | ERRORS_ONLY | NAT log filter |

Firewall

| Variable | Type | Default | Description |
|---|---|---|---|
| internal_tcp_ports | list(string) | ["0-65535"] | Internal TCP ports allowed |
| internal_udp_ports | list(string) | ["0-65535"] | Internal UDP ports allowed |
| health_check_ports | list(string) | ["80","443","3000","8080","10256"] | Health check ports |
| health_check_source_ranges | list(string) | ["35.191.0.0/16","130.211.0.0/22"] | GCP health check IP ranges |

Node Pool

| Variable | Type | Default | Description |
|---|---|---|---|
| machine_type | string | e2-standard-2 | GCE machine type |
| disk_size_gb | number | 50 | Boot disk size (GB) |
| disk_type | string | pd-standard | Boot disk type |
| image_type | string | COS_CONTAINERD | Node image |
| min_node_count | number | 1 | Minimum nodes per zone |
| max_node_count | number | 2 | Maximum nodes per zone |
| enable_secure_boot | bool | true | Shielded VM secure boot |
| enable_integrity_monitoring | bool | true | Shielded VM integrity monitoring |
| enable_auto_repair | bool | true | Auto-repair unhealthy nodes |
| enable_auto_upgrade | bool | true | Auto-upgrade node versions |
| disable_legacy_metadata_endpoints | string | "true" | Disable legacy metadata API |

IAM & Secrets

| Variable | Type | Default | Description |
|---|---|---|---|
| k8s_namespace | string | decisionbox | Kubernetes namespace for Workload Identity |
| k8s_service_account | string | decisionbox-api | K8s service account name (API) |
| k8s_agent_service_account | string | decisionbox-agent | K8s service account name (Agent, read-only) |
| enable_gcp_secrets | bool | false | Create Secret Manager IAM bindings |
| secret_namespace | string | decisionbox | Secret name prefix for IAM conditions |
| enable_bigquery_iam | bool | false | Grant BigQuery read access to the agent SA |
| enable_vertex_ai_iam | bool | false | Grant Vertex AI access to the agent SA (Claude via Vertex, Gemini) |

Labels

| Variable | Type | Default | Description |
|---|---|---|---|
| labels | map(string) | {} | Resource labels applied to all resources |

Outputs

| Output | Sensitive | Description |
|---|---|---|
| cluster_name | No | GKE cluster name |
| cluster_endpoint | Yes | Kubernetes API endpoint |
| cluster_ca_certificate | Yes | CA certificate for kubectl |
| vpc_name | No | VPC network name |
| workload_identity_sa_email | No | GCP service account for API Workload Identity |
| agent_workload_identity_sa_email | No | GCP service account for Agent Workload Identity (read-only) |
| gcp_secrets_iam_enabled | No | Whether Secret Manager IAM was configured |
| bigquery_iam_enabled | No | Whether BigQuery IAM was configured |
| vertex_ai_iam_enabled | No | Whether Vertex AI IAM was configured |

Workload Identity

The module creates a GCP service account and binds it to a Kubernetes service account via Workload Identity. This allows the API pod to authenticate to GCP services (Secret Manager, BigQuery) without storing credentials.

K8s ServiceAccount: decisionbox/decisionbox-api
        ↕ Workload Identity binding
GCP ServiceAccount: decisionbox-prod-api@project.iam.gserviceaccount.com
        ↓ IAM roles
        GCP Secret Manager (namespace-scoped)
        BigQuery (data viewer + job user)
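Under the hood, Workload Identity grants roles/iam.workloadIdentityUser to a member string that names the Kubernetes service account through the project's workload identity pool. This is the standard GKE member format; a sketch of how it is assembled:

```shell
PROJECT_ID="my-gcp-project"
K8S_NAMESPACE="decisionbox"
K8S_SA="decisionbox-api"
# Standard GKE Workload Identity member format:
#   serviceAccount:<project>.svc.id.goog[<namespace>/<ksa>]
MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/${K8S_SA}]"
echo "$MEMBER"
```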

The Helm chart must annotate the K8s service account:

serviceAccountAnnotations:
  iam.gke.io/gcp-service-account: "decisionbox-prod-api@my-project.iam.gserviceaccount.com"

Secret Manager Scoping

When enable_gcp_secrets=true, the module creates IAM bindings with conditions that restrict the API to secrets prefixed with the configured namespace:

  • Allowed: decisionbox-project123-llm-api-key
  • Blocked: other-app-database-password

This ensures multi-tenant isolation when multiple applications share a GCP project.
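The condition behaves like a simple prefix match on the secret name. The real enforcement is a CEL condition evaluated by IAM, not a script, but the rule can be illustrated locally:

```shell
# Local illustration of the prefix rule the IAM condition enforces.
# The actual check is a CEL expression in the IAM binding, not this code.
SECRET_NAMESPACE="decisionbox"
allowed_secret() {
  case "$1" in
    "$SECRET_NAMESPACE"-*) return 0 ;;
    *) return 1 ;;
  esac
}

allowed_secret "decisionbox-project123-llm-api-key" && echo "allowed"
```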

Using an Existing VPC

To deploy into an existing network:

create_vpc         = false
existing_vpc_id    = "projects/my-project/global/networks/my-vpc"
existing_subnet_id = "projects/my-project/regions/us-central1/subnetworks/my-subnet"

The subnet must have secondary IP ranges named pods and services.

Destroying Resources

Use the setup wizard's --destroy flag for a clean teardown:

cd terraform
./setup.sh --destroy

This uninstalls Helm releases, deletes the namespace, disables deletion protection, and runs terraform destroy.

Or manually:

# Remove Helm releases first
helm uninstall decisionbox-dashboard -n decisionbox
helm uninstall decisionbox-api -n decisionbox

# Disable deletion protection
terraform apply -var="deletion_protection=false"

# Destroy infrastructure
terraform destroy

Next Steps