
Platform Architecture

Overview

The Internal Developer Platform (IDP) is built on Kubernetes (GKE) and provides a self-service platform for teams to build, ship, and run services with observability, alerting, autoscaling, and security built in by default.

Core Components

1. Kubernetes Cluster (GKE)

  • Purpose: Container orchestration platform providing compute, networking, and storage primitives
  • Configuration: Managed via Terraform/IaC for reproducibility
  • Availability: Single availability zone (1 AZ) for initial deployment
  • Features:
    • Autoscaling node pools
    • Managed control plane
    • Integrated networking and load balancing

2. GitOps (Flux)

  • Purpose: Declarative deployment management ensuring cluster state matches Git
  • Approach: Gitless GitOps using OCI artifacts as deployment vehicles
  • Workflow:
    • Make Kustomize changes locally and test them by running Flux locally with Tilt (tilt.dev)
    • Open a PR; when it is merged, CI publishes an OCI artifact
    • Flux pulls the new artifact from the artifact registry
    • Cluster state reconciles to the desired state
  • Benefits:
    • Runs locally so you can test your changes before creating PRs
    • Immutable audit trail (all changes via Git)
    • Automatic drift detection and correction
    • Rollback via Git history
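The reconcile loop described above can be sketched as follows. This is a simplified model, not how Flux is implemented: desired state is a dict of hypothetical "Kind/name" keys to manifest bodies taken from the artifact, the artifact is identified by a sha256 content digest (OCI artifacts are content-addressed), and the hypothetical `reconcile` helper returns the actions that would correct drift and prune resources removed from Git, similar in spirit to the `prune` option on a Flux Kustomization.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Hypothetical model: "Kind/name" -> manifest body (a plain dict).
Manifests = Dict[str, dict]


def artifact_digest(manifests: Manifests) -> str:
    """sha256 of a canonical serialization, standing in for an OCI artifact digest."""
    canonical = json.dumps(manifests, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


def reconcile(desired: Manifests, live: Manifests, prune: bool = True) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs that converge live state onto desired state."""
    actions: List[Tuple[str, str]] = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != desired[key]:
            # Drift: the live object was changed outside Git (e.g. a manual kubectl edit).
            actions.append(("update", key))
    if prune:
        # Resources deleted from Git are removed from the cluster.
        for key in sorted(live.keys() - desired.keys()):
            actions.append(("delete", key))
    return actions


if __name__ == "__main__":
    desired = {
        "Deployment/worker": {"image": "worker:1.4.0"},
        "Service/worker": {"port": 8080},
    }
    live = {
        "Deployment/worker": {"image": "worker:1.3.9"},  # image changed by hand
        "ConfigMap/legacy": {"key": "value"},  # no longer in Git
    }
    print(artifact_digest(desired))
    print(reconcile(desired, live))
```

For the sample data, `reconcile` reports an update for the drifted Deployment, a create for the missing Service, and a delete for the ConfigMap that was removed from Git. Because the digest depends only on content, an unchanged commit yields the same digest and nothing new to apply.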

3. Secrets Management

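  • Solution: SOPS or Sealed Secrets (two separate tools), or External Secrets Operator
  • Storage: With SOPS or Sealed Secrets, secrets are stored encrypted in Git, versioned and auditable; with External Secrets Operator, Git holds only references and the values live in an external secret store
  • Decryption: In-cluster at deployment time; with SOPS this is KMS-backed (e.g. Flux decrypting with a Google Cloud KMS key), while Sealed Secrets uses the controller's own key pair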
  • Security: Secrets never stored in plaintext in Git
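To enforce "never plaintext in Git", a CI check could reject Secret manifests whose values are not encrypted. The sketch below is hypothetical: it assumes SOPS is the chosen option, that manifests have already been parsed into dicts (e.g. with a YAML library), and it relies on SOPS writing encrypted values in its `ENC[AES256_GCM,...]` form. It does not apply to External Secrets Operator, where Git holds no secret values.

```python
from typing import List

# Prefix SOPS uses for values encrypted with AES256-GCM.
SOPS_PREFIX = "ENC[AES256_GCM,"


def plaintext_secret_fields(manifest: dict) -> List[str]:
    """Return data/stringData keys of a Secret that are NOT SOPS-encrypted.

    Hypothetical CI guard; manifests of other kinds are ignored.
    """
    if manifest.get("kind") != "Secret":
        return []
    offenders = []
    for section in ("data", "stringData"):
        for key, value in (manifest.get(section) or {}).items():
            if not (isinstance(value, str) and value.startswith(SOPS_PREFIX)):
                offenders.append(f"{section}.{key}")
    return offenders


# Hypothetical example manifests.
encrypted = {
    "kind": "Secret",
    "metadata": {"name": "db"},
    "stringData": {"password": "ENC[AES256_GCM,data:abc=,iv:def=,tag:ghi=,type:str]"},
}
leaky = {"kind": "Secret", "metadata": {"name": "db"}, "stringData": {"password": "hunter2"}}

if __name__ == "__main__":
    print(plaintext_secret_fields(encrypted))  # []
    print(plaintext_secret_fields(leaky))  # ['stringData.password']
```

A non-empty result would fail the pipeline before the PR can merge.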

4. Observability Stack (TBD)

  • Metrics:
    • Prometheus running in-cluster (GKE integration)
    • New Relic agents for application metrics & cluster metrics
  • Dashboards: Standardized golden signals dashboards per service (TBD)
  • Logs: Centralized logging via Google Cloud Logging
  • Traces: Distributed tracing via New Relic
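Since the golden signals dashboards are still TBD, the sketch below shows what such a dashboard would compute per service and window: latency, traffic, errors, and saturation. It is illustrative only; the `Request` record shape, treating 5xx responses as errors, and using CPU as the saturation measure are all assumptions. In practice these values would come from Prometheus or New Relic queries rather than raw request lists.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One completed request (hypothetical record shape)."""
    duration_ms: float
    status: int


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def golden_signals(requests: List[Request], window_s: float,
                   cpu_used_cores: float, cpu_limit_cores: float) -> Dict[str, float]:
    """Latency, traffic, errors, and saturation for one service over one window."""
    if not requests:
        raise ValueError("no requests in window")
    server_errors = sum(1 for r in requests if r.status >= 500)  # 5xx counted as errors
    return {
        "latency_p95_ms": percentile([r.duration_ms for r in requests], 95),
        "traffic_rps": len(requests) / window_s,
        "error_rate": server_errors / len(requests),
        "saturation": cpu_used_cores / cpu_limit_cores,  # fraction of CPU limit in use
    }


if __name__ == "__main__":
    # 20 requests of 10..200 ms over 10 s, one of which failed with a 503.
    window = [Request(duration_ms=10.0 * i, status=503 if i == 7 else 200) for i in range(1, 21)]
    print(golden_signals(window, window_s=10.0, cpu_used_cores=0.5, cpu_limit_cores=1.0))
```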

5. Autoscaling

  • HPA (Horizontal Pod Autoscaler): Primary scaling mechanism based on CPU/memory metrics
  • KEDA (Kubernetes Event-Driven Autoscaling): Advanced scaling for queue-based workloads
  • Scaling Targets:
    • Queue depth/lag
    • Request rate (RPS)
    • CPU/memory utilization
  • Reaction Time: ≤2 minutes p95 for scale-up events
  • Scale-to-Zero: Optional for idle workers (via KEDA; a standard HPA keeps at least one replica)
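The core HPA calculation is documented by Kubernetes as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); for example, 4 replicas averaging 90% CPU against a 60% target gives ceil(4 × 90 / 60) = 6. The sketch below reproduces that formula with the default 0.1 tolerance, plus a simplified queue-depth variant of the kind KEDA drives through an average-value target. It ignores stabilization windows, scaling policies, and pod readiness, which the real controller also applies; the function names and replica limits are hypothetical.

```python
import math


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def hpa_desired_replicas(current_replicas: int, current_value: float, target_value: float,
                         min_replicas: int = 1, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """HPA core formula: ceil(current_replicas * current_value / target_value).

    Ratios within the tolerance (Kubernetes default 0.1) leave the count unchanged.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return clamp(desired, min_replicas, max_replicas)


def queue_desired_replicas(queue_length: int, messages_per_replica: int,
                           min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Queue-based target: one replica per `messages_per_replica` messages of backlog.

    min_replicas=0 models scale-to-zero; in KEDA the 0 <-> 1 step is handled by
    KEDA itself, while the HPA that KEDA creates handles 1 <-> N.
    """
    desired = math.ceil(queue_length / messages_per_replica)
    return clamp(desired, min_replicas, max_replicas)


if __name__ == "__main__":
    print(hpa_desired_replicas(4, current_value=90, target_value=60))  # CPU % example
    print(queue_desired_replicas(230, messages_per_replica=50))  # backlog example
```

Clamping to `max_replicas` is what bounds cost during traffic spikes, and the tolerance band prevents flapping when the metric hovers near its target.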

6. Worker Templates & Paved Road

  • Components:
    • Dockerfile templates
    • CI/CD pipeline configuration
    • Helm/Kustomize manifests
    • KEDA/HPA scaling definitions
    • Default SLO/alerts
    • Pre-configured dashboards
  • Purpose: Reduce time-to-production for new workers
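As an illustration of what a paved-road template might generate, the hypothetical helper below renders a KEDA ScaledObject for a new worker. It assumes the worker's Deployment shares the worker's name and that scaling is driven by a Prometheus query through KEDA's `prometheus` scaler; the Prometheus address, worker name, and metric are placeholders, not platform values. JSON is valid YAML, so the output could be committed alongside the Kustomize manifests.

```python
import json
from typing import Any, Dict

# Hypothetical in-cluster Prometheus address; replace with the platform's real endpoint.
DEFAULT_PROMETHEUS = "http://prometheus.monitoring.svc:9090"


def render_scaled_object(worker: str, query: str, threshold: float,
                         min_replicas: int = 0, max_replicas: int = 10,
                         server_address: str = DEFAULT_PROMETHEUS) -> Dict[str, Any]:
    """Render a KEDA ScaledObject targeting the Deployment named `worker`."""
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": worker, "labels": {"app": worker}},
        "spec": {
            "scaleTargetRef": {"name": worker},  # Deployment to scale
            "minReplicaCount": min_replicas,     # 0 enables scale-to-zero
            "maxReplicaCount": max_replicas,
            "triggers": [{
                "type": "prometheus",
                # KEDA trigger metadata values are strings.
                "metadata": {
                    "serverAddress": server_address,
                    "query": query,
                    "threshold": str(threshold),
                },
            }],
        },
    }


if __name__ == "__main__":
    manifest = render_scaled_object(
        worker="invoice-worker",                     # hypothetical worker name
        query='sum(queue_depth{queue="invoices"})',  # hypothetical metric
        threshold=50,
    )
    print(json.dumps(manifest, indent=2))
```

Keeping these defaults in one template means every new worker starts with the same scaling shape, and teams override only the query and limits.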

Architecture Flow

┌─────────────────┐
│    Git Repo     │
│    (Source)     │
└────────┬────────┘
         │ PR Merge
         ▼
┌─────────────────┐
│      CI/CD      │
│   Build Image   │
└────────┬────────┘
         │ Push OCI Artifact
         ▼
┌─────────────────┐
│     GitOps      │
│     (Flux)      │
└────────┬────────┘
         │ Sync & Deploy
         ▼
┌─────────────────┐
│   GKE Cluster   │
│ ┌─────────────┐ │
│ │   Workers   │ │
│ │   (Pods)    │ │
│ └──────┬──────┘ │
│        │        │
│        ▼        │
│ ┌─────────────┐ │
│ │  HPA/KEDA   │ │
│ │   Scaling   │ │
│ └─────────────┘ │
└────────┬────────┘
         │ Metrics/Logs
         ▼
┌─────────────────┐
│  Observability  │
│  (New Relic +   │
│   Prometheus)   │
└─────────────────┘

Key Principles

  1. GitOps First: All changes flow through Git, no manual kubectl operations
  2. Self-Service: Teams can deploy independently with guardrails
  3. Observable by Default: Every service gets dashboards and alerts
  4. Secure by Default: Secrets encrypted, RBAC enforced, policy-driven
  5. Cost-Effective: Autoscaling reduces idle costs and improves resource utilization
  6. Progressive Delivery: Safe rollouts with automatic rollback on failure