Nomad on AWS: Container Orchestration Without Kubernetes Complexity

Let's get something out of the way: Kubernetes won the container orchestration wars. It's the default choice, the industry standard, the thing everyone puts on their resume.
But winning doesn't mean it's the right choice for every workload. Kubernetes brings enormous capability - and enormous complexity. For teams that don't need its massive feature set, or don't want to manage its operational overhead, HashiCorp Nomad offers a compelling alternative.
Nomad handles containers, VMs, Java apps, and batch jobs with a single scheduler. It's simpler to operate, easier to understand, and integrates natively with Vault and Consul.

Let me walk you through deploying a production-ready Nomad cluster on AWS with Terraform.
_____
Why Consider Nomad?
- Simpler mental model - Nomad has one binary, one configuration language, and fewer concepts to learn. Jobs, tasks, groups - that covers most of what you need to know.
- Multi-workload scheduling - Containers, raw binaries, Java JARs, batch jobs, and system services. One scheduler for everything. No need to containerize your legacy Java app just to run it (see the sketch after this list).
- Easier operations - No etcd cluster to manage. Raft consensus is built-in. Upgrades are straightforward. I've seen teams run Nomad in production with a fraction of the operational overhead of Kubernetes.
- HashiCorp ecosystem integration - First-class Vault integration for secrets. Consul for service discovery and mesh. Terraform for provisioning. If you're already in the HashiCorp world, everything just works together.
- Federation built-in - Multi-region, multi-datacenter deployments without bolt-on solutions.
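To make the multi-workload point concrete, here's a rough sketch of a job running a plain JAR with Nomad's built-in java task driver - no image build, no registry. The artifact URL, jar name, and JVM settings are placeholders, not from any real deployment:

```hcl
# A legacy JAR scheduled directly on a client node via the java driver
job "legacy-billing" {
  datacenters = ["dc1"]

  group "app" {
    task "billing" {
      driver = "java"

      # Download the JAR into the task's local/ directory
      artifact {
        source = "https://artifacts.example.com/billing.jar"
      }

      config {
        jar_path    = "local/billing.jar"
        jvm_options = ["-Xmx512m"]
      }

      resources {
        cpu    = 500
        memory = 768
      }
    }
  }
}
```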
______
When Kubernetes is the Better Choice
Let's be honest about when Nomad isn't the right fit:

- You need the massive CNCF ecosystem of tools and integrations
- Your team already has Kubernetes expertise
- You're running on managed Kubernetes (EKS, GKE) and don't want to manage clusters yourself
- You need advanced scheduling features like custom schedulers or complex operators
When Nomad Shines
- Mixed workloads where containers live alongside legacy apps
- Simpler infrastructure requirements
- Teams without dedicated platform engineers
- Edge computing and smaller deployments
- Batch processing workloads
______
Architecture Overview
A production deployment has four pieces:

- Servers run the scheduler and maintain cluster state via Raft consensus. Always run three or five so the cluster keeps quorum through a node failure.
- Clients run the actual workloads. Scale based on demand via Auto Scaling Groups.
- Consul provides service discovery and health checking. Optional but highly recommended.
- Vault handles secrets management for jobs. Also optional but makes secrets so much easier.
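Once everything is up, each layer can be verified from any node that can reach the local agents:

```bash
# Raft peers and current leader
nomad server members

# Client nodes registered and ready for work
nomad node status

# Consul gossip pool across servers and clients
consul members
```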
______
Terraform Infrastructure
Project Structure
```
nomad-cluster/
├── modules/
│   ├── nomad-servers/
│   ├── nomad-clients/
│   ├── consul/
│   └── networking/
├── environments/
│   ├── prod/
│   └── dev/
└── packer/
    ├── nomad-server.pkr.hcl
    └── nomad-client.pkr.hcl
```
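The Packer templates bake Nomad and Consul into the AMIs that the Terraform modules look up by name and tag. As a minimal sketch (the base AMI, region, and install script are assumptions, not from this repo), nomad-server.pkr.hcl might look like:

```hcl
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.0"
    }
  }
}

source "amazon-ebs" "nomad_server" {
  ami_name      = "nomad-server-{{timestamp}}"
  instance_type = "t3.small"
  region        = "us-east-1"
  ssh_username  = "ubuntu"

  # Assumed Ubuntu base image; swap in whatever your fleet standardizes on
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
      virtualization-type = "hvm"
    }
    owners      = ["099720109477"]
    most_recent = true
  }

  # The Terraform AMI lookup filters on this tag
  tags = {
    Component = "nomad-server"
  }
}

build {
  sources = ["source.amazon-ebs.nomad_server"]

  provisioner "shell" {
    script = "scripts/install-hashicorp.sh" # hypothetical install script
  }
}
```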
Nomad Server Module
```hcl
# modules/nomad-servers/main.tf
data "aws_ami" "nomad_server" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["nomad-server-*"]
  }

  filter {
    name   = "tag:Component"
    values = ["nomad-server"]
  }
}

resource "aws_launch_template" "server" {
  name_prefix   = "${var.cluster_name}-server-"
  image_id      = data.aws_ami.nomad_server.id
  instance_type = var.server_instance_type

  iam_instance_profile {
    name = aws_iam_instance_profile.server.name
  }

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.server.id]
  }

  user_data = base64encode(templatefile("${path.module}/templates/server-userdata.sh.tpl", {
    cluster_name       = var.cluster_name
    region             = var.region
    datacenter         = var.datacenter
    server_count       = var.server_count
    consul_encrypt_key = var.consul_encrypt_key
    vault_addr         = var.vault_addr
  }))

  tag_specifications {
    resource_type = "instance"
    tags = merge(var.common_tags, {
      Name = "${var.cluster_name}-server"
      Role = "nomad-server"
    })
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "server" {
  name                = "${var.cluster_name}-server"
  desired_capacity    = var.server_count
  min_size            = var.server_count
  max_size            = var.server_count
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [aws_lb_target_group.server_http.arn]

  launch_template {
    id      = aws_launch_template.server.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 66
    }
  }

  tag {
    key                 = "NomadAutoJoin"
    value               = var.cluster_name
    propagate_at_launch = true
  }
}
```
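One piece the module references but doesn't show is the IAM instance profile. Cloud auto-join (the `provider=aws` retry_join used in the user data below) discovers peers by querying the EC2 API, so the instance role needs at least `ec2:DescribeInstances`. A sketch, assuming a role named `server` backs the instance profile:

```hcl
# Minimal policy for go-discover's AWS tag lookup
data "aws_iam_policy_document" "auto_join" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:DescribeInstances"]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "auto_join" {
  name   = "${var.cluster_name}-auto-join"
  role   = aws_iam_role.server.id # assumed role behind aws_iam_instance_profile.server
  policy = data.aws_iam_policy_document.auto_join.json
}
```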
Server Configuration Template
The user data script configures both Consul and Nomad:
```bash
#!/bin/bash
set -e

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Configure Consul agent
cat <<EOF > /etc/consul.d/consul.hcl
datacenter = "${datacenter}"
data_dir   = "/opt/consul/data"
log_level  = "INFO"

bind_addr   = "$${PRIVATE_IP}"
client_addr = "0.0.0.0"

retry_join = ["provider=aws tag_key=NomadAutoJoin tag_value=${cluster_name}"]
encrypt    = "${consul_encrypt_key}"

connect {
  enabled = true
}

ports {
  grpc = 8502
}
EOF

systemctl enable consul
systemctl start consul

# Wait for Consul to be ready
until consul members; do
  sleep 2
done

# Configure Nomad server
cat <<EOF > /etc/nomad.d/nomad.hcl
datacenter = "${datacenter}"
region     = "${region}"
data_dir   = "/opt/nomad/data"

bind_addr = "0.0.0.0"

advertise {
  http = "$${PRIVATE_IP}"
  rpc  = "$${PRIVATE_IP}"
  serf = "$${PRIVATE_IP}"
}

server {
  enabled          = true
  bootstrap_expect = ${server_count}

  server_join {
    retry_join = ["provider=aws tag_key=NomadAutoJoin tag_value=${cluster_name}"]
  }

  default_scheduler_config {
    scheduler_algorithm = "spread"

    preemption_config {
      batch_scheduler_enabled   = true
      system_scheduler_enabled  = true
      service_scheduler_enabled = true
    }
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "${vault_addr}"
}

acl {
  enabled = true
}

telemetry {
  collection_interval        = "10s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
EOF

systemctl enable nomad
systemctl start nomad
```
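One operational note: because the config above enables ACLs (`acl { enabled = true }`), a fresh cluster comes up with no tokens at all. Once a leader is elected, bootstrap the ACL system exactly once - the command returns the initial management token, which is your root credential, so store it somewhere safe (Vault is the obvious place):

```bash
# Run once, against any server, after the cluster elects a leader.
# Prints the accessor and secret of the initial management token.
nomad acl bootstrap
```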
_____
Deploying Applications
Service Job Specification
Here's what a typical web application looks like in Nomad:
```hcl
# jobs/web-app.nomad.hcl

# Supplied at deploy time: nomad job run -var="version=2.0.0" web-app.nomad.hcl
variable "version" {
  type    = string
  default = "1.0.0"
}

job "web-app" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 3

    spread {
      attribute = "${attr.platform.aws.placement.availability-zone}"
      weight    = 100
    }

    network {
      port "http" {
        to = 8080
      }
    }

    service {
      name = "web-app"
      port = "http"

      tags = [
        "traefik.enable=true",
        "traefik.http.routers.web.rule=Host(`app.example.com`)"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "myregistry.com/web-app:${var.version}"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 512
      }

      env {
        PORT        = "${NOMAD_PORT_http}"
        ENVIRONMENT = "production"
      }

      # Vault secrets integration - this is where Nomad shines
      vault {
        policies = ["web-app"]
      }

      template {
        data        = <<-EOT
          {{ with secret "database/creds/web-app" }}
          DB_USERNAME={{ .Data.username }}
          DB_PASSWORD={{ .Data.password }}
          {{ end }}
          {{ with secret "secret/data/web-app/config" }}
          API_KEY={{ .Data.data.api_key }}
          {{ end }}
        EOT
        destination = "secrets/env.txt"
        env         = true
      }
    }

    update {
      max_parallel     = 1
      min_healthy_time = "30s"
      healthy_deadline = "5m"
      auto_revert      = true
      canary           = 1
    }
  }
}
```
Notice the Vault integration - Nomad templates pull secrets directly from Vault and inject them as environment variables. No sidecar containers needed.
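For those templates to render, the `web-app` Vault policy has to grant read on the exact paths they reference. A minimal sketch - the paths match the template above, but everything else about your Vault layout is an assumption:

```hcl
# vault/policies/web-app.hcl (hypothetical location)
path "database/creds/web-app" {
  capabilities = ["read"]
}

path "secret/data/web-app/config" {
  capabilities = ["read"]
}
```

Load it with `vault policy write web-app vault/policies/web-app.hcl` and the job's `vault { policies = ["web-app"] }` stanza takes care of the rest.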
__
Batch Job Example
Nomad handles batch jobs elegantly:
1job "data-processor" {
2 datacenters = ["dc1"]
3 type = "batch"
4
5 periodic {
6 cron = "0 2 * * *" # Daily at 2 AM
7 prohibit_overlap = true
8 time_zone = "Asia/Kolkata"
9 }
10
11 group "processor" {
12 count = 1
13
14 task "etl" {
15 driver = "docker"
16
17 config {
18 image = "myregistry.com/etl-processor:latest"
19 command = "python"
20 args = ["run_etl.py", "--date", "${NOMAD_META_date}"]
21 }
22
23 resources {
24 cpu = 2000
25 memory = 4096
26 }
27
28 vault {
29 policies = ["etl-processor"]
30 }
31
32 template {
33 data = <<-EOT
34 {{ with secret "secret/data/etl/config" }}
35 S3_BUCKET={{ .Data.data.bucket }}
36 REDSHIFT_HOST={{ .Data.data.redshift_host }}
37 {{ end }}
38 {{ with secret "database/creds/etl-role" }}
39 DB_USERNAME={{ .Data.username }}
40 DB_PASSWORD={{ .Data.password }}
41 {{ end }}
42 EOT
43 destination = "secrets/env.txt"
44 env = true
45 }
46
47 restart {
48 attempts = 2
49 interval = "30m"
50 delay = "15s"
51 mode = "fail"
52 }
53 }
54 }
55}_____
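To test a periodic job without waiting for the 2 AM window, you can force an immediate launch:

```bash
# Kicks off a run of the periodic job right now
nomad job periodic force data-processor
```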
_____
Operations
Deploying Jobs
```bash
# Plan the job (dry run)
nomad job plan web-app.nomad.hcl

# Deploy
nomad job run web-app.nomad.hcl

# Check status
nomad job status web-app

# View allocations
nomad alloc status <alloc-id>

# Stream logs
nomad alloc logs -f <alloc-id> app
```
Rolling Updates
```bash
# Update with new version
nomad job run -var="version=2.0.0" web-app.nomad.hcl

# Monitor deployment
nomad job status web-app

# Promote canary if healthy
nomad deployment promote <deployment-id>

# Rollback if needed
nomad job revert web-app 1
```
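The revert command takes a job version number. If you're not sure which version to go back to, `nomad job history` lists the stored versions of a job:

```bash
# Show job versions; -p includes a diff between each version and its predecessor
nomad job history -p web-app
```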
______
Nomad vs EKS: When to Choose What
Here's my honest take after running both:
Choose Nomad when:
- You have mixed workloads (containers + non-containerized apps)
- Your team is small and can't dedicate engineers to platform work
- You want simpler operations and fewer moving parts
- You're already invested in the HashiCorp ecosystem
- You need multi-datacenter federation
Choose EKS when:
- You need the vast Kubernetes ecosystem of tools
- Your team has Kubernetes expertise
- You want managed control plane (less operational burden)
- You're building on CNCF standards for portability
______
Wrapping Up
Nomad isn't trying to replace Kubernetes - it's an alternative for teams who want container orchestration without the complexity overhead. If you're already in the HashiCorp ecosystem, the integration with Vault and Consul makes it particularly attractive.
For mixed workloads, batch processing, or smaller teams without dedicated platform engineers, Nomad deserves serious consideration. It won't give you the vast Kubernetes ecosystem, but it will give you a scheduler that's easier to understand, operate, and troubleshoot.
Start with a dev cluster, deploy a few jobs, and see if the operational simplicity outweighs the ecosystem tradeoff for your use case. You might be surprised.
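The lowest-friction way to do that is Nomad's dev mode - a single binary running server and client together with in-memory state:

```bash
# Single-node dev agent (server + client in one process; state is discarded on exit)
nomad agent -dev

# In another terminal:
nomad job run web-app.nomad.hcl
nomad ui # opens the web UI at http://localhost:4646
```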
