GCP — Theory

  • Heavy data / analytics: BigQuery is best-in-class.
  • Best Kubernetes experience: GKE Autopilot manages both the control plane and the nodes.
  • Need global SQL: Spanner.
  • Multi-cluster / on-prem federation: Anthos.

Don't default to GCP when the organization is already on AWS or needs a presence in regions GCP lacks.

  • IAM and policies inherit Org → Folder → Project → Resource.
  • Each project is an isolation boundary for resources and a billing target.
  • Common pattern: env-per-project (acme-prod, acme-staging).
  • Folders for teams / business units.
  • Billing (budget) alerts at org and project level.
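
The inheritance rule above (Org → Folder → Project → Resource) can be sketched as a walk up a parent chain. This is a toy model, not the IAM API: the `Node` class, the node names, and the member `user:alice@example.com` are hypothetical, and deny policies and IAM conditions are ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of the hierarchy: organization, folder, project, or resource."""
    name: str
    parent: Optional["Node"] = None
    # Bindings set directly on this node: role -> set of members.
    bindings: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, role: str, member: str) -> None:
        self.bindings.setdefault(role, set()).add(member)


def effective_roles(node: Node, member: str) -> set[str]:
    """Union of the roles granted on the node and on every ancestor."""
    roles: set[str] = set()
    current: Optional[Node] = node
    while current is not None:
        for role, members in current.bindings.items():
            if member in members:
                roles.add(role)
        current = current.parent
    return roles


# Hypothetical hierarchy following the env-per-project pattern.
org = Node("acme-org")
folder = Node("platform-team", parent=org)
prod = Node("acme-prod", parent=folder)
staging = Node("acme-staging", parent=folder)

org.grant("roles/viewer", "user:alice@example.com")    # inherited everywhere
prod.grant("roles/editor", "user:alice@example.com")   # prod only
```

In this model a grant at the org level cannot be removed lower down, which is why broad roles near the top of the hierarchy deserve extra scrutiny.
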
  • One VPC spans all regions (subnets are per-region).
  • Shared VPC — host project owns network; service projects attach. Centralized firewall + IAM.
  • Private Google Access — VMs without public IPs reach Google APIs.
  • Private Service Connect — endpoint into managed services.
  • Cloud NAT — managed egress.
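  • Internal vs. external load balancer: internal is reachable only from inside the VPC (or connected networks), external is internet-facing. Both have L7 variants, so choose deliberately.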
  • Pods, VMs, and Cloud Run services run as a service account (SA).
  • Workload Identity in GKE: link K8s SA to GCP SA. Recommended over node SA.
  • Avoid SA keys; use short-lived tokens via metadata server.
  • Audit roles/iam.serviceAccountTokenCreator and roles/iam.serviceAccountUser: they let one identity act as another.
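
As an illustration of "short-lived tokens via metadata server", the sketch below builds the request a workload would send to obtain an access token for its attached service account. The function names are hypothetical. The call only succeeds on GCP compute (VM, GKE pod, Cloud Run), so the request construction is kept separate from the network call.

```python
import json
import urllib.request

METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request() -> urllib.request.Request:
    """Build (but do not send) the request for a short-lived access token."""
    # The Metadata-Flavor header is mandatory; requests without it are rejected.
    return urllib.request.Request(
        METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )


def fetch_access_token(timeout: float = 2.0) -> str:
    """Return an access token for the attached service account.

    Outside GCP the metadata hostname does not resolve and urlopen raises
    URLError, so call this only from a workload running on GCP.
    """
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["access_token"]
```

In practice the Google client libraries do this automatically through Application Default Credentials; the point is that no key file is involved.
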
  • BigQuery is serverless; storage and compute (slots) are decoupled.
  • Pay per query (on-demand) or reserved slots.
  • Avoid SELECT *; partition + cluster tables to reduce scan cost.
  • Streaming inserts vs. batch loads: streaming costs more but makes rows queryable in near real time.
  • BI Engine for in-memory acceleration.
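
To make the partitioning advice concrete, here is a back-of-the-envelope sketch of on-demand cost with and without partition pruning. The table layout (365 daily partitions of 10 GiB) is hypothetical, and the price per TiB is an assumed figure; check current pricing for your region.

```python
from datetime import date, timedelta

TIB = 2**40
GIB = 2**30


def scanned_bytes(partitions: dict[date, int], start: date, end: date) -> int:
    """Bytes read when the query filters the partition column to [start, end]."""
    return sum(size for day, size in partitions.items() if start <= day <= end)


def on_demand_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """On-demand cost is proportional to the bytes the query scans."""
    return bytes_scanned / TIB * usd_per_tib


# Hypothetical table: one 10 GiB partition per day of 2023.
first_day = date(2023, 1, 1)
partitions = {first_day + timedelta(days=i): 10 * GIB for i in range(365)}

PRICE_USD_PER_TIB = 6.25  # assumption, not an authoritative price

full_scan = sum(partitions.values())  # no partition filter
last_week = scanned_bytes(partitions, date(2023, 12, 25), date(2023, 12, 31))

print(f"full scan: ${on_demand_cost(full_scan, PRICE_USD_PER_TIB):.2f}")
print(f"last week: ${on_demand_cost(last_week, PRICE_USD_PER_TIB):.2f}")
```

The same proportionality explains the SELECT * advice: storage is columnar, so selecting fewer columns shrinks the bytes scanned in the same way a partition filter does.
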
  • Cloud Run containers must listen on $PORT (default 8080).
  • Stateless by design: revisions are immutable; every deploy or config change creates a new revision with new container instances.
  • Cold starts exist; min instances > 0 mitigates.
  • Concurrency > 1 means the same container instance handles multiple requests at once, so code must be concurrency-safe.
  • CPU is allocated only while a request is being processed unless you enable always-allocated CPU (instance-based billing).
  • Built-in service-to-service auth via Google-signed identity tokens.
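
A minimal sketch of the $PORT contract, using only the Python standard library. The handler and function names are hypothetical; a real service would more likely use a web framework, but the port handling is the same.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def listen_port(default: int = 8080) -> int:
    """Read the port from $PORT, falling back to 8080 when it is unset."""
    return int(os.environ.get("PORT", default))


def make_server() -> ThreadingHTTPServer:
    # Bind to 0.0.0.0 rather than localhost so traffic from outside the
    # container can reach the process.
    return ThreadingHTTPServer(("0.0.0.0", listen_port()), Handler)
```

Calling `make_server().serve_forever()` starts the service. `ThreadingHTTPServer` serves each request on its own thread, which matches a concurrency setting above 1, so any shared state in the handler needs to be thread-safe.
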
  • GKE Autopilot: Google manages nodes, scaling, and security. You pay for the resources your pods request.
  • GKE Standard: you manage the node pools.
  • Autopilot is now the default choice for most teams.

Anthos: manages multi-cluster, hybrid, and multi-cloud Kubernetes plus service mesh, with federated config and policy. Niche.

  1. GCS classes: when to use each? Standard (frequent access), Nearline (30-day minimum storage), Coldline (90-day minimum), Archive (365-day minimum).
  2. BigQuery cost spike: how to debug? Check the INFORMATION_SCHEMA.JOBS views for top jobs by bytes billed; partition and cluster tables; use authorized views.
  3. Workload Identity vs node SA? WI per-pod; node SA shared across all pods on node — too broad.
  4. Cloud Run vs Cloud Functions? CF is for event-driven snippets (now rebranded as Cloud Run functions); CR runs general containers.
  5. Spanner external consistency? Enabled by TrueTime API and commit-wait — global strong consistency.
  6. VPC Service Controls — what problem? Data exfiltration. Even with valid creds, you can’t move BigQuery data out of perimeter.
  7. Cross-project IAM? Grant a principal from project A roles in project B. Common pattern: shared service projects.
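
For question 1, the storage-class choice can be sketched as a lookup on expected access frequency against each class's minimum storage duration (30, 90, and 365 days). The function name is hypothetical, and this rule of thumb ignores retrieval fees and per-region pricing, which a real decision should include.

```python
# (minimum storage duration in days, class name), coldest first.
CLASS_THRESHOLDS = [
    (365, "ARCHIVE"),
    (90, "COLDLINE"),
    (30, "NEARLINE"),
]


def pick_storage_class(days_between_accesses: int) -> str:
    """Pick the coldest class whose minimum duration the access pattern meets."""
    if days_between_accesses < 0:
        raise ValueError("days_between_accesses must be non-negative")
    for min_days, name in CLASS_THRESHOLDS:
        if days_between_accesses >= min_days:
            return name
    return "STANDARD"
```

For example, nightly backups that are read only during a quarterly restore drill would map to `COLDLINE` under this rule, while data read every day stays in `STANDARD`.
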