Platform Architecture

Overview

The Internal Developer Platform (IDP) runs on Kubernetes (GKE) and gives teams a self-service way to build, ship, and run services, with observability, alerting, autoscaling, and security built in by default.

Core Components

1. Kubernetes Cluster (GKE)

Purpose: Container orchestration platform providing compute, networking, and storage primitives
Configuration: Managed via Terraform/IaC for reproducibility
Availability: Single availability zone (1 AZ) for initial deployment
Features:
- Autoscaling node pools
- Managed control plane
- Integrated networking and load balancing

2. GitOps (Flux)

Purpose: Declarative deployment management ensuring cluster state matches Git
Approach: Gitless GitOps using OCI artifacts as deployment vehicles
Workflow:
- Edit Kustomize manifests locally, then run Flux locally with Tilt (tilt.dev) to test them
- PR merged → CI builds and pushes an OCI artifact
- Flux pulls the artifact from the artifact registry
- Flux reconciles cluster state to the desired state
Benefits:
- Changes can be tested locally before opening a PR
- Immutable audit trail (all changes via Git)
- Automatic drift detection and correction
- Rollback via Git history
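
The drift detection and correction described above can be pictured as a diff between the desired state (unpacked from the OCI artifact) and the state observed in the cluster. The sketch below is a toy model, not Flux's implementation: the `Action` type, the `plan_reconcile` function, and the resource keys such as "Deployment/worker" are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step needed to move the cluster toward the desired state."""
    verb: str  # "create", "update", or "delete"
    name: str  # hypothetical resource key, e.g. "Deployment/worker"


def plan_reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[Action]:
    """Compare desired and actual state and list the corrective actions."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(Action("create", name))  # missing from the cluster
        elif name not in desired:
            actions.append(Action("delete", name))  # no longer declared
        elif desired[name] != actual[name]:
            actions.append(Action("update", name))  # drifted from the declaration
    return actions


if __name__ == "__main__":
    desired = {"Deployment/worker": {"replicas": 3}, "Service/worker": {"port": 80}}
    actual = {"Deployment/worker": {"replicas": 5}, "ConfigMap/old": {"k": "v"}}
    for action in plan_reconcile(desired, actual):
        print(action.verb, action.name)
```

An empty plan means the cluster already matches the artifact. In this model, rolling back amounts to re-running the same diff against an earlier artifact.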

3. Secrets Management

Solution: SOPS (secrets encrypted in Git) or External Secrets Operator
Storage: Encrypted secrets stored in Git, versioned and auditable
Decryption: KMS-backed decryption at deployment time
Security: Secrets never stored in plaintext in Git
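
The "never in plaintext" rule could be enforced with a small pre-merge check. The sketch below assumes SOPS is the chosen tool, that the Secret manifest has already been parsed into a dictionary, and that SOPS was configured to encrypt the `data` and `stringData` fields, in which case each encrypted value is a string starting with `ENC[`. The function name `plaintext_keys` is hypothetical. With External Secrets Operator no secret values live in Git, so this check would not apply.

```python
from __future__ import annotations

# SOPS replaces each encrypted value with a string of the form "ENC[...]".
SOPS_PREFIX = "ENC["


def plaintext_keys(secret_manifest: dict) -> list[str]:
    """Return the data/stringData keys whose values do not look SOPS-encrypted."""
    offenders = []
    for section in ("data", "stringData"):
        for key, value in (secret_manifest.get(section) or {}).items():
            if not (isinstance(value, str) and value.startswith(SOPS_PREFIX)):
                offenders.append(f"{section}.{key}")
    return sorted(offenders)


if __name__ == "__main__":
    manifest = {
        "kind": "Secret",
        "data": {
            "password": "ENC[AES256_GCM,data:abc,type:str]",  # illustrative value
            "token": "cGxhaW4=",  # base64 only, not encrypted
        },
    }
    print(plaintext_keys(manifest))
```

A CI job could fail the PR whenever the returned list is non-empty.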

4. Observability Stack (TBD)

Metrics:
- Prometheus running in-cluster (GKE integration)
- New Relic agents for application and cluster metrics
Dashboards: Standardized golden signals dashboards per service (TBD)
Logs: Centralized logging via Google Cloud Logging
Traces: Distributed tracing via New Relic
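
Since the dashboards are still TBD, the following is only a sketch of what a golden-signals panel might compute for one service over a time window. It covers three of the four golden signals (traffic, errors, latency) and leaves out saturation, which needs resource metrics. The `Request` type, the `golden_signals` function, and the choice to count HTTP 5xx responses as errors are assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    latency_ms: float
    status: int  # HTTP status code


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute traffic, error rate, and p95 latency for one window."""
    if not requests:
        return {"traffic_rps": 0.0, "error_rate": 0.0, "latency_p95_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "latency_p95_ms": percentile([r.latency_ms for r in requests], 95),
    }
```

In practice these values would come from Prometheus or New Relic queries rather than raw request records; the sketch only pins down the definitions a standardized dashboard would need to agree on.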

5. Autoscaling

HPA (Horizontal Pod Autoscaler): Primary scaling mechanism based on CPU/memory metrics
KEDA (Kubernetes Event-Driven Autoscaling): Advanced scaling for queue-based workloads
Scaling Targets:
- Queue depth/lag
- Request rate (RPS)
- CPU/memory utilization
Reaction Time: ≤2 minutes p95 for scale-up events
Scale-to-Zero: Optional for idle workers
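
The two scaling mechanisms can be illustrated with a little arithmetic. The first function follows the formula in the Kubernetes HPA documentation, desired = ceil(current replicas × current metric / target metric), including the default 10% tolerance within which no scaling happens. The second is a simplified model of queue-based scaling with optional scale-to-zero; its name, parameters, and default limits are assumptions for illustration, not KEDA's API.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count per the documented HPA formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    return math.ceil(current_replicas * ratio)


def queue_desired_replicas(queue_depth: int, target_per_replica: int,
                           min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Simplified queue-based scaling: one replica per target_per_replica messages."""
    if queue_depth <= 0:
        return min_replicas  # min_replicas=0 allows scale-to-zero when idle
    wanted = math.ceil(queue_depth / target_per_replica)
    # A non-empty queue needs at least one worker, even if min_replicas is 0.
    return max(max(min_replicas, 1), min(wanted, max_replicas))


if __name__ == "__main__":
    # 4 pods at 90% CPU against a 60% target
    print(hpa_desired_replicas(4, 90, 60))
    # 45 queued messages, 10 messages per worker
    print(queue_desired_replicas(45, 10))
```

For example, 4 pods at 90% CPU against a 60% target scale to 6, and a queue of 45 messages at 10 per worker calls for 5 workers. The real controllers also apply stabilization windows and cooldown periods, which this sketch omits and which affect whether the 2-minute reaction target is met.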

6. Worker Templates & Paved Road

Components:
- Dockerfile templates
- CI/CD pipeline configuration
- Helm/Kustomize manifests
- KEDA/HPA scaling definitions
- Default SLO/alerts
- Pre-configured dashboards
Purpose: Reduce time-to-production for new workers

Architecture Flow

┌─────────────────┐
│   Git Repo      │
│   (Source)      │
└────────┬────────┘
         │
         │ PR Merge
         ▼
┌─────────────────┐
│   CI/CD         │
│   Build Image   │
└────────┬────────┘
         │
         │ Push OCI Artifact
         ▼
┌─────────────────┐
│   GitOps        │
│   (Flux)        │
└────────┬────────┘
         │
         │ Sync & Deploy
         ▼
┌─────────────────┐
│   GKE Cluster   │
│   ┌───────────┐ │
│   │  Workers  │ │
│   │  (Pods)   │ │
│   └─────┬─────┘ │
│         │       │
│         ▼       │
│   ┌───────────┐ │
│   │  HPA/KEDA │ │
│   │  Scaling  │ │
│   └───────────┘ │
└────────┬────────┘
         │
         │ Metrics/Logs
         ▼
┌─────────────────┐
│  Observability  │
│  (New Relic +   │
│   Prometheus)   │
└─────────────────┘

Key Principles

GitOps First: All changes flow through Git; no manual kubectl operations
Self-Service: Teams can deploy independently with guardrails
Observable by Default: Every service gets dashboards and alerts
Secure by Default: Secrets encrypted, RBAC enforced, policy-driven
Cost-Effective: Autoscaling reduces idle cost and improves resource utilization
Progressive Delivery: Safe rollouts with automatic rollback on failure
