An AWS CDK application, written in Java, that provisions an Apache Druid deployment on Amazon EKS (Elastic Kubernetes Service) with integrated AWS managed services for real-time OLAP analytics at scale.

This CDK application provisions a production-ready Apache Druid cluster on Amazon EKS, fully integrated with AWS managed services. Druid is a high-performance, real-time analytics database designed for workloads where fast queries and fast ingest are critical. The architecture follows the EKS Best Practices Guide and the AWS Well-Architected Analytics Lens.
| Feature | Description | Reference |
|---|---|---|
| EKS Cluster | Managed Kubernetes control plane with RBAC configuration | EKS User Guide |
| AWS Managed Addons | VPC CNI, EBS CSI, CoreDNS, Kube Proxy, Pod Identity Agent, CloudWatch Container Insights | EKS Add-ons |
| Helm Chart Addons | cert-manager, AWS Load Balancer Controller, Karpenter, CSI Secrets Store | Helm |
| Apache Druid | Real-time OLAP database with sub-second query latency | Druid Documentation |
| RDS PostgreSQL | Managed database for Druid metadata storage | Amazon RDS |
| S3 Deep Storage | Scalable object storage for Druid segments | S3 Deep Storage |
| MSK (Kafka) | Managed streaming for real-time data ingestion | Amazon MSK |
| Grafana Cloud Integration | Full observability stack with metrics, logs, and traces | Grafana Cloud |
| Managed Node Groups | Bottlerocket AMIs for enhanced security | Managed Node Groups |
```mermaid
flowchart TB
    subgraph "Data Sources"
        KAFKA[MSK Kafka]
        S3IN[S3 Ingestion]
        BATCH[Batch Files]
    end
    subgraph "EKS Cluster"
        subgraph "Druid Cluster"
            COORD[Coordinator]
            OVER[Overlord]
            BROKER[Broker]
            ROUTER[Router]
            HIST[Historical]
            MM[MiddleManager]
        end
    end
    subgraph "AWS Managed Services"
        RDS[(RDS PostgreSQL)]
        S3DEEP[S3 Deep Storage]
        S3MSQ[S3 MSQ Storage]
    end
    subgraph "Query Clients"
        CONSOLE[Web Console]
        JDBC[JDBC Clients]
        API[REST API]
    end
    KAFKA --> MM
    S3IN --> MM
    BATCH --> MM
    MM --> OVER
    OVER --> COORD
    COORD --> RDS
    MM --> S3DEEP
    HIST --> S3DEEP
    CONSOLE --> ROUTER
    JDBC --> BROKER
    API --> BROKER
    ROUTER --> BROKER
    BROKER --> HIST
```
```mermaid
sequenceDiagram
    participant Source as Data Source
    participant Kafka as MSK Kafka
    participant MM as MiddleManager
    participant Overlord
    participant S3 as S3 Deep Storage
    participant Historical
    participant Broker
    Source->>Kafka: Publish Events
    Kafka->>MM: Consume Batch
    MM->>MM: Parse & Index
    MM->>Overlord: Report Progress
    MM->>S3: Push Segment
    S3->>Historical: Load Segment
    Historical->>Historical: Cache in Memory
    Note over Broker,Historical: Query Path
    Broker->>Historical: Query Segment
    Historical-->>Broker: Results
```
The Druid infrastructure uses a layered architecture with CloudFormation nested stacks:
```mermaid
flowchart TB
    subgraph "DeploymentStack (main)"
        MAIN[Main Stack]
    end
    subgraph "Nested Stacks"
        VPC[VpcNestedStack]
        EKS[EksNestedStack]
        SETUP[DruidSetupNestedStack]
        DRUID[DruidNestedStack]
    end
    MAIN --> VPC
    MAIN --> EKS
    MAIN --> SETUP
    MAIN --> DRUID
    EKS -.->|depends on| VPC
    SETUP -.->|depends on| VPC
    DRUID -.->|depends on| EKS
    DRUID -.->|depends on| SETUP
```
Dependency Chain:
- VPC is created first (network foundation)
- EKS cluster is provisioned (independent of Druid setup)
- Druid setup creates supporting resources (RDS, S3, MSK) that depend on VPC
- Druid Helm chart is deployed after both EKS and setup are ready
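The ordering above can be sketched as a tiny topological check. This is plain, illustrative Java (not the CDK code itself, which expresses the same ordering through stack dependencies):

```java
import java.util.*;

public class DeployOrder {
    // Edges point from each nested stack to the stacks it depends on,
    // mirroring the dependency chain described above.
    static final Map<String, List<String>> DEPENDS_ON = Map.of(
        "VpcNestedStack", List.of(),
        "EksNestedStack", List.of("VpcNestedStack"),
        "DruidSetupNestedStack", List.of("VpcNestedStack"),
        "DruidNestedStack", List.of("EksNestedStack", "DruidSetupNestedStack"));

    /** Returns an order in which every stack appears after its dependencies. */
    static List<String> deployOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(DEPENDS_ON.keySet());
        while (!pending.isEmpty()) {
            String stack = pending.poll();
            if (done.containsAll(DEPENDS_ON.get(stack))) {
                order.add(stack);
                done.add(stack);
            } else {
                pending.add(stack); // revisit once its dependencies are placed
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(deployOrder());
    }
}
```

Whatever order CloudFormation chooses at deploy time, `VpcNestedStack` always lands first and `DruidNestedStack` always lands last.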
```mermaid
flowchart LR
    subgraph "Master Nodes"
        COORD[Coordinator<br/>Segment Management]
        OVER[Overlord<br/>Task Management]
    end
    subgraph "Query Nodes"
        BROKER[Broker<br/>Query Routing]
        ROUTER[Router<br/>API Gateway]
    end
    subgraph "Data Nodes"
        HIST[Historical<br/>Segment Storage]
        MM[MiddleManager<br/>Ingestion Tasks]
    end
    ROUTER --> BROKER
    ROUTER --> COORD
    ROUTER --> OVER
    BROKER --> HIST
    OVER --> MM
    COORD --> HIST
```
Apache Druid consists of several specialized node types:
| Node Type | Purpose | Reference |
|---|---|---|
| Coordinator | Manages data availability and segment distribution | Coordinator Process |
| Overlord | Controls data ingestion workload assignment | Overlord Process |
| Broker | Handles queries from external clients | Broker Process |
| Router | Routes requests to Brokers, Coordinators, and Overlords | Router Process |
| Historical | Stores and queries historical data segments | Historical Process |
| MiddleManager | Executes submitted ingestion tasks | MiddleManager Process |
| Service | Druid Component | Purpose | Reference |
|---|---|---|---|
| RDS PostgreSQL | Metadata Storage | Stores segment metadata, rules, and configuration | Metadata Storage |
| S3 | Deep Storage | Long-term segment storage for Historical nodes | Deep Storage |
| S3 | Multi-Stage Query | Intermediate storage for MSQ engine | MSQ |
| MSK (Kafka) | Real-time Ingestion | Streaming data source for Druid supervisors | Kafka Ingestion |
The cluster integrates with Grafana Cloud for comprehensive observability:
| Component | Purpose | Reference |
|---|---|---|
| Prometheus | Druid and Kubernetes metrics collection | Grafana Mimir |
| Loki | Log aggregation from all Druid processes | Grafana Loki |
| Tempo | Distributed tracing for query analysis | Grafana Tempo |
| Pyroscope | Continuous profiling for performance optimization | Grafana Pyroscope |
| OpenTelemetry Collector | Telemetry data collection and export | OpenTelemetry |
When deployed through the Fastish platform, this infrastructure integrates with internal platform services:
| Platform Component | Integration Point | Purpose |
|---|---|---|
| Orchestrator | Release pipeline automation | Automated CDK synthesis and deployment via CodePipeline |
| Portal | Subscriber management | Tenant provisioning, cluster access control |
| Network | Shared VPC infrastructure | Cross-stack connectivity for platform services |
| Reporting | Usage metering | Pipeline execution tracking and cost attribution |
These integrations are managed automatically when deploying via the platform's release workflows.
| Requirement | Version | Installation |
|---|---|---|
| Java | 21+ | SDKMAN |
| Maven | 3.8+ | Maven Download |
| AWS CLI | 2.x | AWS CLI Install |
| AWS CDK CLI | 2.221.0+ | CDK Getting Started |
| kubectl | 1.28+ | kubectl Install |
| Helm | 3.x | Helm Install |
| Docker | Latest | Docker Install |
| GitHub CLI | Latest | GitHub CLI |
| Grafana Cloud Account | - | Grafana Cloud |
AWS CDK Bootstrap:

```bash
cdk bootstrap aws://<account-id>/<region>
```

Replace `<account-id>` with your AWS account ID and `<region>` with your target AWS region (e.g., `us-west-2`). Bootstrapping creates the resources CDK needs to deploy, including an S3 bucket for assets and CloudFormation execution roles. See: CDK Bootstrapping | Bootstrap CLI Reference
```bash
gh repo clone fast-ish/cdk-common
gh repo clone fast-ish/aws-druid-infra

mvn -f cdk-common/pom.xml clean install
mvn -f aws-druid-infra/pom.xml clean install
```

If using custom Druid images or Helm charts, prepare the artifacts in Amazon ECR:
```bash
# Authenticate to ECR
aws ecr get-login-password --region <region> | \
  docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com

# Create repository
aws ecr create-repository \
  --repository-name fasti.sh/v1/docker/druid \
  --region <region> \
  --image-scanning-configuration scanOnPush=true

# Build and push image
docker buildx build --provenance=false --platform linux/amd64 -f Dockerfile.druid \
  -t <account-id>.dkr.ecr.<region>.amazonaws.com/fasti.sh/v1/docker/druid:$(date +'%Y%m%d') \
  -t <account-id>.dkr.ecr.<region>.amazonaws.com/fasti.sh/v1/docker/druid:v1 \
  -t <account-id>.dkr.ecr.<region>.amazonaws.com/fasti.sh/v1/docker/druid:latest \
  --push .
```

See: ECR User Guide
```bash
# Authenticate Helm to ECR
aws ecr get-login-password --region <region> | \
  helm registry login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com

# Create repository for Helm charts
aws ecr create-repository \
  --repository-name fasti.sh/v1/helm/druid \
  --region <region> \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256

# Package and push chart
helm package ./helm/chart/druid
helm push druid-<version>.tgz oci://<account-id>.dkr.ecr.<region>.amazonaws.com/fasti.sh/v1/helm
```

See: Helm OCI Support
Update the Docker image reference in `src/main/resources/prototype/v1/druid/values.mustache`:

| Parameter | Description | Example |
|---|---|---|
| `image.repository` | ECR repository for the Druid Docker image | `000000000000.dkr.ecr.us-west-2.amazonaws.com/fasti.sh/v1/docker/druid` |
| `image.tag` | Tag of the Druid Docker image | `v1`, `latest`, or a date tag |
| `image.pullPolicy` | Pull policy for the Docker image | `IfNotPresent` |
Update the Helm chart reference in `src/main/resources/prototype/v1/conf.mustache`:

| Parameter | Description | Example |
|---|---|---|
| `chart.repository` | ECR repository for the Druid Helm chart | `oci://000000000000.dkr.ecr.us-west-2.amazonaws.com/fasti.sh/v1/helm` |
| `chart.name` | Name of the Druid Helm chart | `druid` |
| `chart.version` | Version of the Druid Helm chart | `34.0.0` |
Create `aws-druid-infra/cdk.context.json` from `aws-druid-infra/cdk.context.template.json`:

Required Configuration Parameters:

| Parameter | Description | Example |
|---|---|---|
| `:account` | AWS account ID (12-digit number) | `123456789012` |
| `:region` | AWS region for deployment | `us-west-2` |
| `:domain` | Registered domain name (optional) | `example.com` |
| `:environment` | Environment name (do not change) | `prototype` |
| `:version` | Resource version identifier | `v1` |
Notes:
- `:environment` and `:version` map to the resource files at `aws-druid-infra/src/main/resources/prototype/v1`
- These values determine which configuration templates are loaded during CDK synthesis
Add Grafana Cloud configuration for observability:
```json
{
  "hosted:eks:grafana:instanceId": "000000",
  "hosted:eks:grafana:key": "glc_xyz",
  "hosted:eks:grafana:lokiHost": "https://logs-prod-000.grafana.net",
  "hosted:eks:grafana:lokiUsername": "000000",
  "hosted:eks:grafana:prometheusHost": "https://prometheus-prod-000-prod-us-west-0.grafana.net",
  "hosted:eks:grafana:prometheusUsername": "0000000",
  "hosted:eks:grafana:tempoHost": "https://tempo-prod-000-prod-us-west-0.grafana.net/tempo",
  "hosted:eks:grafana:tempoUsername": "000000",
  "hosted:eks:grafana:pyroscopeHost": "https://profiles-prod-000.grafana.net:443"
}
```

Grafana Cloud Setup:
- Create Account: Sign up at grafana.com
- Create Stack: Navigate to your stack settings
- Generate API Key: Create key with required permissions
| Parameter | Location | Description |
|---|---|---|
| `instanceId` | Stack details page | Unique identifier for your Grafana instance |
| `key` | API keys section | API key with all permissions (starts with `glc_`) |
| `lokiHost` | Logs > Data Sources > Loki | Endpoint URL for logs |
| `lokiUsername` | Logs > Data Sources > Loki | Account identifier for Loki |
| `prometheusHost` | Metrics > Data Sources > Prometheus | Endpoint URL for metrics |
| `prometheusUsername` | Metrics > Data Sources > Prometheus | Account identifier for Prometheus |
| `tempoHost` | Traces > Data Sources > Tempo | Endpoint URL for traces |
| `tempoUsername` | Traces > Data Sources > Tempo | Account identifier for Tempo |
| `pyroscopeHost` | Profiles > Connect a Data Source | Endpoint URL for profiling |
Required API Key Permissions:
| Permission | Access | Purpose |
|---|---|---|
| `metrics` | Read/Write | Prometheus metrics ingestion |
| `logs` | Read/Write | Loki log ingestion |
| `traces` | Read/Write | Tempo trace ingestion |
| `profiles` | Read/Write | Pyroscope profiling data |
| `alerts` | Read/Write | Alerting configuration |
| `rules` | Read/Write | Recording and alerting rules |
See: Grafana Cloud Kubernetes Monitoring
Add IAM role mappings in cdk.context.json for EKS access entries:
```json
{
  "hosted:eks:administrators": [
    {
      "username": "administrator",
      "role": "arn:aws:iam::000000000000:role/AWSReservedSSO_AdministratorAccess_abc",
      "email": "admin@example.com"
    }
  ],
  "hosted:eks:users": [
    {
      "username": "user",
      "role": "arn:aws:iam::000000000000:role/AWSReservedSSO_DeveloperAccess_abc",
      "email": "user@example.com"
    }
  ]
}
```

| Parameter | Description | Reference |
|---|---|---|
| `administrators` | IAM roles with full cluster admin access | Cluster Admin |
| `users` | IAM roles with read-only cluster access | RBAC Authorization |
| `username` | Identifier for the user in Kubernetes RBAC | User Mapping |
| `role` | AWS IAM role ARN (typically from AWS IAM Identity Center) | IAM Roles |
| `email` | Contact email for identification and traceability | - |
```bash
cd aws-druid-infra

# Preview changes
cdk synth

# Deploy all stacks
cdk deploy
```

See: CDK Deploy Command | CDK Synth Command
What Gets Deployed:
| Resource Type | Count | Description | Reference |
|---|---|---|---|
| CloudFormation Stacks | 5 | 1 main + 4 nested stacks | Nested Stacks |
| VPC | 1 | Multi-AZ with public/private subnets | VPC Documentation |
| EKS Cluster | 1 | Kubernetes 1.28+ control plane | EKS Clusters |
| RDS PostgreSQL | 1 | Druid metadata database | RDS PostgreSQL |
| S3 Buckets | 2 | Deep storage + MSQ intermediate | S3 User Guide |
| MSK Cluster | 1 | Kafka for real-time ingestion | Amazon MSK |
| Managed Node Groups | 1+ | Bottlerocket-based worker nodes | Managed Node Groups |
| Druid Deployment | 1 | All Druid node types via Helm | Druid Helm Chart |
```bash
# Update kubeconfig
aws eks update-kubeconfig --name <cluster-name> --region <region>

# Verify cluster connectivity
kubectl get nodes
kubectl get pods -A

# Check Druid pods
kubectl get pods -n druid
```

See: Connecting to EKS
The build process uses Mustache templating to inject context variables into configuration files. See cdk-common for the complete build process documentation.
| Variable | Type | Description |
|---|---|---|
| `{{account}}` | String | AWS account ID |
| `{{region}}` | String | AWS region |
| `{{environment}}` | String | Environment name |
| `{{version}}` | String | Resource version |
| `{{hosted:id}}` | String | Unique deployment identifier |
```
src/main/resources/
└── prototype/
    └── v1/
        ├── conf.mustache              # Main configuration
        ├── eks/
        │   ├── cluster.mustache       # EKS cluster configuration
        │   ├── addons.mustache        # Managed addons
        │   └── nodegroups.mustache    # Node group configuration
        ├── druid/
        │   ├── values.mustache        # Druid Helm chart values
        │   ├── rds.mustache           # RDS configuration
        │   ├── s3.mustache            # S3 bucket configuration
        │   └── msk.mustache           # MSK cluster configuration
        ├── helm/
        │   ├── karpenter.mustache     # Karpenter values
        │   └── monitoring.mustache    # Grafana stack values
        └── iam/
            └── roles.mustache         # IAM role definitions
```
The Druid Router provides a web console for administration:
```bash
# Port-forward to Druid Router
kubectl port-forward svc/druid-router 8888:8888 -n druid

# Access the console at http://localhost:8888
```

See: Druid Web Console
Create a Kafka ingestion supervisor to stream data:
```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "my-datasource",
      "timestampSpec": {
        "column": "timestamp",
        "format": "auto"
      },
      "dimensionsSpec": {
        "dimensions": ["dimension1", "dimension2"]
      }
    },
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": {
        "bootstrap.servers": "<msk-bootstrap-servers>"
      },
      "topic": "my-topic"
    }
  }
}
```

See: Kafka Ingestion
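The spec is submitted to Druid's supervisor API (`POST /druid/indexer/v1/supervisor`). A sketch in Java using the JDK's `HttpClient` — the `supervisor.json` filename and the `localhost:8888` Router address (via the port-forward shown earlier) are assumptions for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class SubmitSupervisor {
    /** Builds the POST request for Druid's supervisor endpoint. */
    static HttpRequest buildRequest(String routerUrl, String specJson) {
        return HttpRequest.newBuilder()
            .uri(URI.create(routerUrl + "/druid/indexer/v1/supervisor"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(specJson))
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Supervisor spec from the example above, saved locally (assumed path).
        String spec = Files.readString(Path.of("supervisor.json"));
        HttpResponse<String> resp = HttpClient.newHttpClient()
            .send(buildRequest("http://localhost:8888", spec),
                  HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

On success Druid responds with the supervisor's ID; the same payload works with a plain `curl -X POST`.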
Druid supports multiple query languages:
| Method | Description | Reference |
|---|---|---|
| Druid SQL | SQL-compatible queries via Broker | Druid SQL |
| Native Queries | JSON-based query format | Native Queries |
| JDBC | Standard JDBC driver connectivity | JDBC Driver |
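Druid SQL queries go over HTTP as a small JSON body posted to `/druid/v2/sql` on the Broker (or via the Router). A minimal sketch, again assuming the port-forwarded Router at `localhost:8888` and an illustrative datasource name:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DruidSql {
    /** JSON body for Druid's SQL endpoint; escapes embedded backslashes and quotes. */
    static String payload(String sql) {
        return "{\"query\":\""
            + sql.replace("\\", "\\\\").replace("\"", "\\\"")
            + "\"}";
    }

    /** POST request against the (assumed) port-forwarded Router. */
    static HttpRequest request(String sql) {
        return HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8888/druid/v2/sql"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload(sql)))
            .build();
    }

    public static void main(String[] args) {
        System.out.println(payload("SELECT COUNT(*) FROM \"my-datasource\""));
    }
}
```

Send it with `HttpClient.newHttpClient().send(...)` as in the supervisor example; JDBC clients instead connect through Druid's Avatica-based driver.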
| Aspect | Implementation | Reference |
|---|---|---|
| Node AMI | Bottlerocket for minimal attack surface | Bottlerocket |
| Pod Identity | IAM roles for service accounts | Pod Identity |
| Network Policies | VPC CNI for pod-level network isolation | Network Policies |
| Secrets Management | CSI Secrets Store with AWS Secrets Manager | Secrets Store |
| RDS Encryption | Encryption at rest with KMS | RDS Encryption |
| S3 Encryption | Server-side encryption (SSE-S3) | S3 Encryption |
| MSK Encryption | TLS in transit, KMS at rest | MSK Encryption |
See: EKS Best Practices Guide - Security
```bash
# Check CDK synthesis
cdk synth --quiet 2>&1 | head -20

# Verify CloudFormation stack status
aws cloudformation describe-stacks --stack-name <stack-name> \
  --query 'Stacks[0].StackStatus'

# Check EKS cluster status
aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.status'

# Verify Druid pods
kubectl get pods -n druid
kubectl describe pod <druid-pod> -n druid

# Check Druid logs
kubectl logs -l app=druid-coordinator -n druid --tail=50
kubectl logs -l app=druid-broker -n druid --tail=50

# Verify RDS connectivity
kubectl run pg-test --rm -it --image=postgres:15 -- \
  psql -h <rds-endpoint> -U druid -d druid -c "SELECT 1"

# Check MSK cluster status
aws kafka describe-cluster --cluster-arn <msk-arn> \
  --query 'ClusterInfo.State'

# Test Kafka connectivity from a pod
kubectl exec -it <druid-middlemanager-pod> -n druid -- \
  kafka-broker-api-versions --bootstrap-server <msk-bootstrap>:9092
```

| Issue | Symptom | Resolution |
|---|---|---|
| RDS connection timeout | Druid coordinator fails to start | Verify security group allows port 5432 from EKS nodes |
| MSK authentication failure | MiddleManager ingestion errors | Check IAM role permissions for MSK access |
| S3 deep storage errors | Segment handoff failures | Verify S3 bucket policy and IAM permissions |
| Druid OOM | Historical/MiddleManager pod restarts | Increase memory limits in values.yaml |
| Grafana no metrics | Empty dashboards | Verify Grafana Cloud credentials in cdk.context.json |
For detailed troubleshooting procedures, see the Troubleshooting Guide.
| Resource | Description |
|---|---|
| Fastish Documentation | Platform documentation home |
| cdk-common | Shared CDK constructs library |
| Troubleshooting Guide | Common issues and solutions |
| Validation Guide | Deployment validation procedures |
| Upgrade Guide | Upgrade and rollback procedures |
| Capacity Planning | Sizing and cost guidance |
| IAM Permissions | Minimum required permissions |
| Network Requirements | CIDR, ports, and security groups |
| Glossary | Platform terminology |
| Changelog | Version history |
| Resource | Description |
|---|---|
| EKS User Guide | Official EKS documentation |
| EKS Best Practices | AWS EKS best practices guide |
| Analytics Lens | Analytics architecture guidance |
| Amazon MSK Developer Guide | MSK documentation |
| MSK Best Practices | MSK configuration guidance |
| Amazon RDS User Guide | RDS documentation |
| S3 Best Practices | S3 performance optimization |
| Resource | Description |
|---|---|
| Apache Druid Documentation | Official Druid documentation |
| Druid Architecture | Druid design and components |
| Druid Tuning Guide | Performance optimization |
| Resource | Description |
|---|---|
| Grafana Cloud Docs | Grafana Cloud documentation |
| OpenTelemetry Documentation | Telemetry collection framework |
The full MIT license text is available at:
- https://opensource.org/license/mit/ (Official OSI website)
- https://choosealicense.com/licenses/mit/ (Choose a License website)