Nomad on AWS: Container Orchestration Without Kubernetes Complexity

Let's get something out of the way: Kubernetes won the container orchestration wars. It's the default choice, the industry standard, the thing everyone puts on their resume.
But winning doesn't mean it's the right choice for every workload. Kubernetes brings enormous capability - and enormous complexity. For teams that don't need its massive feature set, or don't want to manage its operational overhead, HashiCorp Nomad offers a compelling alternative.
Nomad handles containers, VMs, Java apps, and batch jobs with a single scheduler. It's simpler to operate, easier to understand, and integrates natively with Vault and Consul.

Let me walk you through deploying a production-ready Nomad cluster on AWS with Terraform.
_____
Why Consider Nomad?
- Simpler mental model - Nomad has one binary, one configuration language, and fewer concepts to learn. Jobs, tasks, groups - that covers most of what you need to know.
- Multi-workload scheduling - Containers, raw binaries, Java JARs, batch jobs, and system services. One scheduler for everything. No need to containerize your legacy Java app just to run it (see the sketch after this list).
- Easier operations - No etcd cluster to manage. Raft consensus is built-in. Upgrades are straightforward. I've seen teams run Nomad in production with a fraction of the operational overhead of Kubernetes.
- HashiCorp ecosystem integration - First-class Vault integration for secrets. Consul for service discovery and mesh. Terraform for provisioning. If you're already in the HashiCorp world, everything just works together.
- Federation built-in - Multi-region, multi-datacenter deployments without bolt-on solutions.
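To make the multi-workload point concrete, here's a rough sketch of a job running a plain JAR with Nomad's built-in java task driver - no image build, no registry. The artifact URL, jar name, and JVM settings are placeholders, not from any real deployment:

```hcl
# A legacy JAR scheduled directly on a client node via the java driver
job "legacy-billing" {
  datacenters = ["dc1"]

  group "app" {
    task "billing" {
      driver = "java"

      # Download the JAR into the task's local/ directory
      artifact {
        source = "https://artifacts.example.com/billing.jar"
      }

      config {
        jar_path    = "local/billing.jar"
        jvm_options = ["-Xmx512m"]
      }

      resources {
        cpu    = 500
        memory = 768
      }
    }
  }
}
```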
______
When Kubernetes is the Better Choice
Let's be honest about when Nomad isn't the right fit:

- You need the massive CNCF ecosystem of tools and integrations
- Your team already has Kubernetes expertise
- You're running on managed Kubernetes (EKS, GKE) and don't want to manage clusters yourself
- You need advanced scheduling features like custom schedulers or complex operators
When Nomad Shines
- Mixed workloads where containers live alongside legacy apps
- Simpler infrastructure requirements
- Teams without dedicated platform engineers
- Edge computing and smaller deployments
- Batch processing workloads
______
Architecture Overview
A production deployment has four pieces:

- Servers run the scheduler and maintain cluster state via Raft consensus. Always run three or five so the cluster keeps quorum through a node failure.
- Clients run the actual workloads. Scale based on demand via Auto Scaling Groups.
- Consul provides service discovery and health checking. Optional but highly recommended.
- Vault handles secrets management for jobs. Also optional but makes secrets so much easier.
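Once everything is up, each layer can be verified from any node that can reach the local agents:

```bash
# Raft peers and current leader
nomad server members

# Client nodes registered and ready for work
nomad node status

# Consul gossip pool across servers and clients
consul members
```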
______
Terraform Infrastructure
Project Structure
```
nomad-cluster/
├── modules/
│   ├── nomad-servers/
│   ├── nomad-clients/
│   ├── consul/
│   └── networking/
├── environments/
│   ├── prod/
│   └── dev/
└── packer/
    ├── nomad-server.pkr.hcl
    └── nomad-client.pkr.hcl
```
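The Packer templates bake Nomad and Consul into the AMIs that the Terraform modules look up by name and tag. As a minimal sketch (the base AMI, region, and install script are assumptions, not from this repo), nomad-server.pkr.hcl might look like:

```hcl
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.0"
    }
  }
}

source "amazon-ebs" "nomad_server" {
  ami_name      = "nomad-server-{{timestamp}}"
  instance_type = "t3.small"
  region        = "us-east-1"
  ssh_username  = "ubuntu"

  # Assumed Ubuntu base image; swap in whatever your fleet standardizes on
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
      virtualization-type = "hvm"
    }
    owners      = ["099720109477"]
    most_recent = true
  }

  # The Terraform AMI lookup filters on this tag
  tags = {
    Component = "nomad-server"
  }
}

build {
  sources = ["source.amazon-ebs.nomad_server"]

  provisioner "shell" {
    script = "scripts/install-hashicorp.sh" # hypothetical install script
  }
}
```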
Nomad Server Module
```hcl
# modules/nomad-servers/main.tf
data "aws_ami" "nomad_server" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["nomad-server-*"]
  }

  filter {
    name   = "tag:Component"
    values = ["nomad-server"]
  }
}

resource "aws_launch_template" "server" {
  name_prefix   = "${var.cluster_name}-server-"
  image_id      = data.aws_ami.nomad_server.id
  instance_type = var.server_instance_type

  iam_instance_profile {
    name = aws_iam_instance_profile.server.name
  }

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.server.id]
  }

  user_data = base64encode(templatefile("${path.module}/templates/server-userdata.sh.tpl", {
    cluster_name       = var.cluster_name
    region             = var.region
    datacenter         = var.datacenter
    server_count       = var.server_count
    consul_encrypt_key = var.consul_encrypt_key
    vault_addr         = var.vault_addr
  }))

  tag_specifications {
    resource_type = "instance"
    tags = merge(var.common_tags, {
      Name = "${var.cluster_name}-server"
      Role = "nomad-server"
    })
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "server" {
  name                = "${var.cluster_name}-server"
  desired_capacity    = var.server_count
  min_size            = var.server_count
  max_size            = var.server_count
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [aws_lb_target_group.server_http.arn]

  launch_template {
    id      = aws_launch_template.server.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 66
    }
  }

  tag {
    key                 = "NomadAutoJoin"
    value               = var.cluster_name
    propagate_at_launch = true
  }
}
```
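One piece the module references but doesn't show is the IAM instance profile. Cloud auto-join (the `provider=aws` retry_join used in the user data below) discovers peers by querying the EC2 API, so the instance role needs at least `ec2:DescribeInstances`. A sketch, assuming a role named `server` backs the instance profile:

```hcl
# Minimal policy for go-discover's AWS tag lookup
data "aws_iam_policy_document" "auto_join" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:DescribeInstances"]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "auto_join" {
  name   = "${var.cluster_name}-auto-join"
  role   = aws_iam_role.server.id # assumed role behind aws_iam_instance_profile.server
  policy = data.aws_iam_policy_document.auto_join.json
}
```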
Server Configuration Template
The user data script configures both Consul and Nomad:
```bash
#!/bin/bash
set -e

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Configure Consul agent
cat <<EOF > /etc/consul.d/consul.hcl
datacenter = "${datacenter}"
data_dir   = "/opt/consul/data"
log_level  = "INFO"

bind_addr   = "$${PRIVATE_IP}"
client_addr = "0.0.0.0"

retry_join = ["provider=aws tag_key=NomadAutoJoin tag_value=${cluster_name}"]
encrypt    = "${consul_encrypt_key}"

connect {
  enabled = true
}

ports {
  grpc = 8502
}
EOF

systemctl enable consul
systemctl start consul

# Wait for Consul to be ready
until consul members; do
  sleep 2
done

# Configure Nomad server
cat <<EOF > /etc/nomad.d/nomad.hcl
datacenter = "${datacenter}"
region     = "${region}"
data_dir   = "/opt/nomad/data"

bind_addr = "0.0.0.0"

advertise {
  http = "$${PRIVATE_IP}"
  rpc  = "$${PRIVATE_IP}"
  serf = "$${PRIVATE_IP}"
}

server {
  enabled          = true
  bootstrap_expect = ${server_count}

  server_join {
    retry_join = ["provider=aws tag_key=NomadAutoJoin tag_value=${cluster_name}"]
  }

  default_scheduler_config {
    scheduler_algorithm = "spread"

    preemption_config {
      batch_scheduler_enabled   = true
      system_scheduler_enabled  = true
      service_scheduler_enabled = true
    }
  }
}

consul {
  address = "127.0.0.1:8500"
}

vault {
  enabled = true
  address = "${vault_addr}"
}

acl {
  enabled = true
}

telemetry {
  collection_interval        = "10s"
  disable_hostname           = true
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
EOF

systemctl enable nomad
systemctl start nomad
```
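One operational note: because the config above enables ACLs (`acl { enabled = true }`), a fresh cluster comes up with no tokens at all. Once a leader is elected, bootstrap the ACL system exactly once - the command returns the initial management token, which is your root credential, so store it somewhere safe (Vault is the obvious place):

```bash
# Run once, against any server, after the cluster elects a leader.
# Prints the accessor and secret of the initial management token.
nomad acl bootstrap
```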
_____
Deploying Applications
Service Job Specification
Here's what a typical web application looks like in Nomad:
```hcl
# jobs/web-app.nomad.hcl

# Supplied at deploy time: nomad job run -var="version=2.0.0" web-app.nomad.hcl
variable "version" {
  type    = string
  default = "1.0.0"
}

job "web-app" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 3

    spread {
      attribute = "${attr.platform.aws.placement.availability-zone}"
      weight    = 100
    }

    network {
      port "http" {
        to = 8080
      }
    }

    service {
      name = "web-app"
      port = "http"

      tags = [
        "traefik.enable=true",
        "traefik.http.routers.web.rule=Host(`app.example.com`)"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "myregistry.com/web-app:${var.version}"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 512
      }

      env {
        PORT        = "${NOMAD_PORT_http}"
        ENVIRONMENT = "production"
      }

      # Vault secrets integration - this is where Nomad shines
      vault {
        policies = ["web-app"]
      }

      template {
        data        = <<-EOT
          {{ with secret "database/creds/web-app" }}
          DB_USERNAME={{ .Data.username }}
          DB_PASSWORD={{ .Data.password }}
          {{ end }}
          {{ with secret "secret/data/web-app/config" }}
          API_KEY={{ .Data.data.api_key }}
          {{ end }}
        EOT
        destination = "secrets/env.txt"
        env         = true
      }
    }

    update {
      max_parallel     = 1
      min_healthy_time = "30s"
      healthy_deadline = "5m"
      auto_revert      = true
      canary           = 1
    }
  }
}
```
Notice the Vault integration - Nomad templates pull secrets directly from Vault and inject them as environment variables. No sidecar containers needed.
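For those templates to render, the `web-app` Vault policy has to grant read on the exact paths they reference. A minimal sketch - the paths match the template above, but everything else about your Vault layout is an assumption:

```hcl
# vault/policies/web-app.hcl (hypothetical location)
path "database/creds/web-app" {
  capabilities = ["read"]
}

path "secret/data/web-app/config" {
  capabilities = ["read"]
}
```

Load it with `vault policy write web-app vault/policies/web-app.hcl` and the job's `vault { policies = ["web-app"] }` stanza takes care of the rest.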
__
Batch Job Example
Nomad handles batch jobs elegantly:
1job "data-processor" {
2 datacenters = ["dc1"]
3 type = "batch"
4
5 periodic {
6 cron = "0 2 * * *" # Daily at 2 AM
7 prohibit_overlap = true
8 time_zone = "Asia/Kolkata"
9 }
10
11 group "processor" {
12 count = 1
13
14 task "etl" {
15 driver = "docker"
16
17 config {
18 image = "myregistry.com/etl-processor:latest"
19 command = "python"
20 args = ["run_etl.py", "--date", "${NOMAD_META_date}"]
21 }
22
23 resources {
24 cpu = 2000
25 memory = 4096
26 }
27
28 vault {
29 policies = ["etl-processor"]
30 }
31
32 template {
33 data = <<-EOT
34 {{ with secret "secret/data/etl/config" }}
35 S3_BUCKET={{ .Data.data.bucket }}
36 REDSHIFT_HOST={{ .Data.data.redshift_host }}
37 {{ end }}
38 {{ with secret "database/creds/etl-role" }}
39 DB_USERNAME={{ .Data.username }}
40 DB_PASSWORD={{ .Data.password }}
41 {{ end }}
42 EOT
43 destination = "secrets/env.txt"
44 env = true
45 }
46
47 restart {
48 attempts = 2
49 interval = "30m"
50 delay = "15s"
51 mode = "fail"
52 }
53 }
54 }
55}_____
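To test a periodic job without waiting for the 2 AM window, you can force an immediate launch:

```bash
# Kicks off a run of the periodic job right now
nomad job periodic force data-processor
```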
_____
Operations
Deploying Jobs
```bash
# Plan the job (dry run)
nomad job plan web-app.nomad.hcl

# Deploy
nomad job run web-app.nomad.hcl

# Check status
nomad job status web-app

# View allocations
nomad alloc status <alloc-id>

# Stream logs
nomad alloc logs -f <alloc-id> app
```
Rolling Updates
```bash
# Update with new version
nomad job run -var="version=2.0.0" web-app.nomad.hcl

# Monitor deployment
nomad job status web-app

# Promote canary if healthy
nomad deployment promote <deployment-id>

# Rollback if needed
nomad job revert web-app 1
```
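The revert command takes a job version number. If you're not sure which version to go back to, `nomad job history` lists the stored versions of a job:

```bash
# Show job versions; -p includes a diff between each version and its predecessor
nomad job history -p web-app
```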
______
Nomad vs EKS: When to Choose What
Here's my honest take after running both:
Choose Nomad when:
- You have mixed workloads (containers + non-containerized apps)
- Your team is small and can't dedicate engineers to platform work
- You want simpler operations and fewer moving parts
- You're already invested in the HashiCorp ecosystem
- You need multi-datacenter federation
Choose EKS when:
- You need the vast Kubernetes ecosystem of tools
- Your team has Kubernetes expertise
- You want managed control plane (less operational burden)
- You're building on CNCF standards for portability
______
Wrapping Up
Nomad isn't trying to replace Kubernetes - it's an alternative for teams who want container orchestration without the complexity overhead. If you're already in the HashiCorp ecosystem, the integration with Vault and Consul makes it particularly attractive.
For mixed workloads, batch processing, or smaller teams without dedicated platform engineers, Nomad deserves serious consideration. It won't give you the vast Kubernetes ecosystem, but it will give you a scheduler that's easier to understand, operate, and troubleshoot.
Start with a dev cluster, deploy a few jobs, and see if the operational simplicity outweighs the ecosystem tradeoff for your use case. You might be surprised.
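The lowest-friction way to do that is Nomad's dev mode - a single binary running server and client together with in-memory state:

```bash
# Single-node dev agent (server + client in one process; state is discarded on exit)
nomad agent -dev

# In another terminal:
nomad job run web-app.nomad.hcl
nomad ui # opens the web UI at http://localhost:4646
```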
