AI Gateway

This document describes the intended architecture and operational model for the AI Gateway component. Configuration values such as instance sizes, replica counts, engine versions, scaling limits, retention periods, and other implementation details may change over time. The Terraform and Helm configuration in modernisation-platform-environments is the source of truth.

The ai-gateway Terraform component builds the AI Gateway platform within the Data Platform. In modernisation-platform-environments, this includes:

Creation of the ai-gateway component (replacing llm-gateway)
Updates to the cluster component to support networking requirements

The component follows the Terraform approach used throughout the Data Platform. Where possible, community-maintained terraform-aws-modules are used for consistency with existing Data Platform infrastructure.

Architecture overview

The AI Gateway provides a centrally managed LiteLLM deployment running on the Data Platform EKS cluster.

The platform consists of:

LiteLLM API deployment for model inference traffic
LiteLLM Admin deployment for platform administration
Aurora PostgreSQL database for persistent application state
ElastiCache (Valkey) for cross-replica coordination
AWS Application Load Balancer (ALB) provisioned through the Kubernetes Gateway API
AWS WAF for ingress protection
Microsoft Entra ID for administrator authentication
AWS Secrets Manager and External Secrets Operator for secret distribution

Aurora PostgreSQL

Aurora PostgreSQL serves as the primary database for LiteLLM.

Purpose

Aurora stores:

LiteLLM configuration
Virtual key management
User and team metadata
Audit and operational data

Security

Encryption at rest using a dedicated KMS key
Database credentials generated and managed through Terraform
Deployment into private EKS data subnets
Network access restricted to workloads within the platform VPC

Resilience

Environment-specific backup, deletion protection and recovery settings are applied according to platform requirements.

Observability

CloudWatch log exports are enabled to support operational troubleshooting and monitoring.

Credentials

Connection details are stored in AWS Secrets Manager and synchronised into Kubernetes using External Secrets Operator.
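
As an illustration of how the synchronised connection details might be consumed, the sketch below assembles a PostgreSQL connection URL from a JSON secret payload. The field names (username, password, host, port, dbname) and the sample values are assumptions made for this example; the actual secret layout is defined in the Terraform configuration.

```python
import json
from urllib.parse import quote


def build_database_url(secret_json: str) -> str:
    """Assemble a PostgreSQL URL from a JSON secret payload.

    The key names read here are hypothetical; check the real secret layout.
    """
    secret = json.loads(secret_json)
    # Percent-encode the credentials so that characters such as '@' or '/'
    # in a generated password cannot break URL parsing.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    host = secret["host"]
    port = int(secret.get("port", 5432))  # 5432 is the PostgreSQL default
    dbname = secret["dbname"]
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


# Sample payload with made-up values, shaped like the assumed secret.
example_secret = json.dumps({
    "username": "litellm",
    "password": "p@ss/word",
    "host": "db.example.internal",
    "port": 5432,
    "dbname": "litellm",
})
database_url = build_database_url(example_secret)
```

LiteLLM reads its database connection string from the DATABASE_URL environment variable, so a value of this shape is what the Kubernetes secret would ultimately need to expose to the pods.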

IAM database authentication

IAM database authentication is currently out of scope, as it offers limited benefit over the existing approach of storing credentials in Secrets Manager and synchronising them through External Secrets Operator.

ElastiCache (Valkey)

LiteLLM uses ElastiCache (Valkey) as a shared coordination layer between application replicas.

Purpose

Valkey is used for:

Cross-replica router coordination
Shared runtime state
Internal LiteLLM coordination requirements

Response caching is intentionally disabled.

Security

Encryption at rest
TLS encryption in transit
Authentication using credentials managed in Secrets Manager
Deployment into private EKS data subnets
Access restricted to workloads within the platform VPC

Secrets

Connection information is published to AWS Secrets Manager and synchronised into Kubernetes using External Secrets Operator.
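
Because Valkey is reached over TLS with authentication, clients need a connection string that reflects both. The sketch below builds one using the rediss:// scheme, which Redis-compatible clients conventionally treat as "connect with TLS". The host name, port and password are placeholder values, and the function name is hypothetical.

```python
from urllib.parse import quote, urlsplit


def build_valkey_url(host: str, port: int, password: str, db: int = 0) -> str:
    """Build a TLS connection URL for a Redis-compatible endpoint."""
    # 'rediss' (double s) signals TLS; plain 'redis' would be unencrypted.
    # The user name is left empty and only a password is supplied.
    return f"rediss://:{quote(password, safe='')}@{host}:{port}/{db}"


# Placeholder values for illustration only.
valkey_url = build_valkey_url("cache.example.internal", 6379, "s3cret!")
parts = urlsplit(valkey_url)
```

Parsing the result back with urlsplit is a cheap way to confirm that the scheme, host and port survived encoding before the value is written to a secret.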

LiteLLM deployments

LiteLLM is deployed as two independent Helm releases within the same namespace.

API deployment

Responsible for:

Model inference requests
Customer application traffic
Database migrations

Admin deployment

Responsible for:

Platform administration
User and key management
Operational configuration

The Admin deployment shares the same database as the API deployment but does not perform schema migrations.
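
The split of migration responsibility can be expressed as a simple invariant: exactly one release may change the schema. The sketch below models that check. The release names are hypothetical, and the use of a DISABLE_SCHEMA_UPDATE environment variable is an assumption about how the setting is expressed; the Helm values in modernisation-platform-environments remain the source of truth.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Release:
    name: str
    runs_migrations: bool


def migration_env(release: Release) -> Dict[str, str]:
    """Environment override controlling whether a release updates the schema."""
    # Assumption: schema updates are switched off via DISABLE_SCHEMA_UPDATE.
    disabled = "false" if release.runs_migrations else "true"
    return {"DISABLE_SCHEMA_UPDATE": disabled}


def migration_owner(releases: List[Release]) -> str:
    """Return the single release allowed to migrate, or raise if ambiguous."""
    owners = [r.name for r in releases if r.runs_migrations]
    if len(owners) != 1:
        raise ValueError(f"expected exactly one migrating release, found {owners}")
    return owners[0]


# Hypothetical release names mirroring the API/Admin split described above.
api = Release(name="litellm-api", runs_migrations=True)
admin = Release(name="litellm-admin", runs_migrations=False)
owner = migration_owner([api, admin])
```

A check of this kind could run in CI against rendered values, so that a change enabling migrations on both releases is caught before two deployments attempt to alter the shared database at the same time.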

Scaling

The API deployment supports horizontal scaling through Kubernetes Horizontal Pod Autoscaling (HPA).

The Admin deployment is intentionally kept as a low-volume administrative workload.
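
To make the API scaling behaviour concrete, the sketch below implements the core calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). It is a simplification that ignores stabilisation windows, scaling policies and pod readiness. The replica bounds and metric values used in the examples are illustrative, not the platform's actual settings.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 3 replicas at 90% average CPU against a 60% target scale out to 5.
scale_out = desired_replicas(3, 90, 60, min_replicas=2, max_replicas=10)
# 63% against 60% is within tolerance, so nothing changes.
steady = desired_replicas(3, 63, 60, min_replicas=2, max_replicas=10)
# Very low load is clamped at the minimum replica count.
scale_in = desired_replicas(4, 10, 60, min_replicas=2, max_replicas=10)
```

The clamp at the end is why the minimum and maximum replica counts in the Helm values matter as much as the target metric: they bound the result regardless of load.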

Networking

The AI Gateway uses the Kubernetes Gateway API with the aws-alb GatewayClass to provision an internet-facing AWS Application Load Balancer.

Traffic flow

Client
  ↓
Route 53
  ↓
Application Load Balancer (HTTPS)
  ↓
AWS WAF
  ↓
Kubernetes Services
  ↓
LiteLLM Pods

Route 53 resolves the hostname to the ALB and is not itself in the request path. AWS WAF is associated with the ALB and evaluates each request before the ALB forwards it to the backend.

DNS and TLS

Dedicated Route 53 hosted zone
Wildcard ACM certificate
TLS termination at the ALB
Modern TLS security policies enforced

Gateway API

Ingress is managed through Kubernetes Gateway API resources:

Gateway
HTTPRoute
GatewayClass

This provides a Kubernetes-native approach to ALB management.

Routing

Separate routes are maintained for:

Public API traffic
Administrative traffic

Administrative traffic is isolated onto its own hostname.
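
The hostname-based separation can be sketched as two HTTPRoute resources attached to one Gateway, each binding a hostname to a backend Service. The example below builds the manifests as Python dictionaries and resolves a request's Host header against them. The hostnames, Service names, port and Gateway name are all hypothetical, and the matcher handles exact hostnames only, not wildcards.

```python
from typing import Dict, List, Optional


def http_route(name: str, hostname: str, service: str, port: int, gateway: str) -> Dict:
    """Build a minimal Gateway API HTTPRoute manifest as a dictionary."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            "parentRefs": [{"name": gateway}],
            "hostnames": [hostname],
            "rules": [{"backendRefs": [{"name": service, "port": port}]}],
        },
    }


def backend_for(host_header: str, routes: List[Dict]) -> Optional[str]:
    """Return the backend Service name for a Host header, if any route matches."""
    host = host_header.split(":", 1)[0].lower()  # drop any port, ignore case
    for route in routes:
        if host in route["spec"]["hostnames"]:
            return route["spec"]["rules"][0]["backendRefs"][0]["name"]
    return None


# Hypothetical names used for illustration only.
routes = [
    http_route("api", "api.ai-gateway.example.com", "litellm-api", 4000, "ai-gateway"),
    http_route("admin", "admin.ai-gateway.example.com", "litellm-admin", 4000, "ai-gateway"),
]
api_backend = backend_for("API.ai-gateway.example.com:443", routes)
admin_backend = backend_for("admin.ai-gateway.example.com", routes)
```

Keeping one route per hostname means a request can only reach the Admin Service when it arrives on the administrative hostname, which in turn lets other controls key on the hostname.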

WAF protection

AWS WAF protects all ingress traffic.

Controls include:

Platform allowlists
Administrative allowlists
AWS managed protection rules
Explicit blocking of non-public operational endpoints
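
AWS WAF evaluates rules in priority order and stops at the first terminating action. The sketch below models how the listed controls might combine under that model. The CIDR ranges are documentation-only addresses, the blocked path prefixes and hostname are hypothetical, the rule ordering is an assumption, and AWS managed rule groups are omitted entirely.

```python
from ipaddress import ip_address, ip_network

# Placeholder values; real allowlists and paths live in the Terraform config.
PLATFORM_ALLOWLIST = [ip_network("203.0.113.0/24")]
ADMIN_ALLOWLIST = [ip_network("198.51.100.0/28")]
BLOCKED_PATH_PREFIXES = ("/metrics", "/health/readiness")
ADMIN_HOST = "admin.ai-gateway.example.com"


def evaluate(source_ip: str, host: str, path: str) -> str:
    """Return 'ALLOW' or 'BLOCK' for a request under the assumed rule order."""
    # Rule 1: operational endpoints are never exposed publicly.
    if any(path.startswith(prefix) for prefix in BLOCKED_PATH_PREFIXES):
        return "BLOCK"
    # Rule 2: pick the allowlist by hostname; admin traffic uses a tighter list.
    allowlist = ADMIN_ALLOWLIST if host.lower() == ADMIN_HOST else PLATFORM_ALLOWLIST
    address = ip_address(source_ip)
    if any(address in network for network in allowlist):
        return "ALLOW"
    # Default action: anything not explicitly allowed is blocked.
    return "BLOCK"


api_ok = evaluate("203.0.113.10", "api.ai-gateway.example.com", "/v1/chat/completions")
ops_blocked = evaluate("203.0.113.10", "api.ai-gateway.example.com", "/metrics")
admin_denied = evaluate("203.0.113.10", ADMIN_HOST, "/ui")
admin_ok = evaluate("198.51.100.5", ADMIN_HOST, "/ui")
```

Placing the endpoint block ahead of the allowlists means an allowlisted client still cannot reach non-public operational paths through the load balancer.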

Backend services

Aurora PostgreSQL and ElastiCache are deployed within private network boundaries and are only accessible from authorised workloads.

Karpenter scheduling

LiteLLM workloads are scheduled onto dedicated Karpenter-managed node pools.
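
Dedicated node pools are commonly enforced with a node selector on the pod plus a taint on the nodes, so that only matching workloads land there. The sketch below models that check in simplified form. Whether the AI Gateway node pools use taints is an assumption, as are the pool name and the taint key; kubernetes.io/arch and karpenter.sh/nodepool are standard node labels.

```python
from typing import Dict


def tolerates(toleration: Dict, taint: Dict) -> bool:
    """Simplified toleration match supporting the Equal and Exists operators."""
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return toleration.get("key") in (None, taint["key"])
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value") == taint.get("value")
    )


def can_schedule(pod: Dict, node: Dict) -> bool:
    """True if the node satisfies the pod's nodeSelector and hard taints."""
    labels_match = all(
        node["labels"].get(key) == value
        for key, value in pod.get("nodeSelector", {}).items()
    )
    # Only NoSchedule and NoExecute taints prevent scheduling outright.
    hard_taints = [
        t for t in node.get("taints", []) if t["effect"] in ("NoSchedule", "NoExecute")
    ]
    taints_tolerated = all(
        any(tolerates(tol, taint) for tol in pod.get("tolerations", []))
        for taint in hard_taints
    )
    return labels_match and taints_tolerated


# Hypothetical pool name and taint; arm64 corresponds to Graviton instances.
gateway_node = {
    "labels": {"kubernetes.io/arch": "arm64", "karpenter.sh/nodepool": "ai-gateway"},
    "taints": [{"key": "dedicated", "value": "ai-gateway", "effect": "NoSchedule"}],
}
general_node = {
    "labels": {"kubernetes.io/arch": "amd64", "karpenter.sh/nodepool": "general"},
    "taints": [],
}
litellm_pod = {
    "nodeSelector": {"kubernetes.io/arch": "arm64", "karpenter.sh/nodepool": "ai-gateway"},
    "tolerations": [
        {"key": "dedicated", "operator": "Equal", "value": "ai-gateway", "effect": "NoSchedule"}
    ],
}
other_pod = {"nodeSelector": {}, "tolerations": []}
```

With this arrangement the selector keeps LiteLLM pods on the dedicated pool, while the taint keeps unrelated workloads off it; either mechanism alone gives only half of the isolation.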

Benefits

Automatic infrastructure scaling
Automatic node consolidation
Efficient utilisation of AWS Graviton processors
Reduced operational overhead compared with fixed node groups

Microsoft Entra ID single sign-on (SSO)

The AI Gateway integrates with Microsoft Entra ID for administrator authentication.

Benefits

Centralised identity management
Existing organisational access controls
Reduced credential management overhead
Consistent administrative experience

Secrets management

Secrets are managed using:

AWS Secrets Manager
External Secrets Operator

This approach provides:

Centralised secret storage
Automated Kubernetes secret synchronisation
Reduced manual secret handling
Consistent secret lifecycle management