Building AI-Ready Infrastructure: Terraform Modules for GraphRAG on AWS

RAG (Retrieval-Augmented Generation) has become the go-to pattern for grounding LLMs with enterprise data. Ask a question, retrieve relevant documents, generate an answer based on those documents. It works remarkably well for many use cases.
But traditional vector-only RAG has limitations. It struggles with relationships between entities, loses context across document boundaries, and can't handle multi-hop reasoning where answering one question requires answering several intermediate questions first.
Enter GraphRAG: combining knowledge graphs with vector search for smarter retrieval. Instead of just finding similar text, you can traverse relationships. "Who reported to the CEO in Q3?" becomes answerable because you have entity relationships, not just document embeddings.
The infrastructure for GraphRAG is more complex than basic RAG - you need both a graph database and a vector store. But Terraform makes it manageable and repeatable.
Let me walk you through building production-ready GraphRAG infrastructure on AWS.
______
GraphRAG Architecture Overview

- Neptune stores entities and relationships as a knowledge graph. Think of it as the "who knows whom" and "what relates to what" layer.
- OpenSearch handles vector embeddings for semantic search. This is the "find similar content" layer.
Together, they enable hybrid retrieval - combining graph traversal with similarity search. The graph tells you what's connected; the vectors tell you what's similar.
______
Module Structure
modules/
├── graphrag/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── neptune.tf
│   ├── opensearch.tf
│   ├── s3.tf
│   ├── lambda.tf
│   └── iam.tf
______
Core Infrastructure Module
Variables Definition
# modules/graphrag/variables.tf
variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

variable "project_name" {
  description = "Project identifier"
  type        = string
}

variable "vpc_id" {
  description = "VPC ID for deploying resources"
  type        = string
}

variable "private_subnet_ids" {
  description = "Private subnet IDs"
  type        = list(string)
}

variable "neptune_instance_class" {
  description = "Neptune instance type"
  type        = string
  default     = "db.r5.large"
}

variable "opensearch_instance_type" {
  description = "OpenSearch instance type"
  type        = string
  default     = "r6g.large.search"
}

variable "opensearch_volume_size" {
  description = "OpenSearch EBS volume size in GB"
  type        = number
  default     = 100
}
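
One addition worth considering (not in the module as written): since environment gates instance counts, multi-AZ, and snapshot behavior throughout the module, a validation block turns a typo into a plan-time error instead of a single-instance "prod" cluster:

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
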
Neptune Graph Database
# modules/graphrag/neptune.tf
resource "aws_neptune_cluster" "graph" {
  cluster_identifier                  = "${var.project_name}-${var.environment}"
  engine                              = "neptune"
  engine_version                      = "1.3.1.0"
  backup_retention_period             = 7
  preferred_backup_window             = "02:00-03:00"
  skip_final_snapshot                 = var.environment != "prod"
  # Required when skip_final_snapshot is false (prod), or destroy will fail
  final_snapshot_identifier           = "${var.project_name}-${var.environment}-final"
  iam_database_authentication_enabled = true
  storage_encrypted                   = true
  kms_key_arn                         = aws_kms_key.graphrag.arn

  vpc_security_group_ids    = [aws_security_group.neptune.id]
  neptune_subnet_group_name = aws_neptune_subnet_group.main.name

  tags = local.common_tags
}

resource "aws_neptune_cluster_instance" "graph" {
  count              = var.environment == "prod" ? 2 : 1
  cluster_identifier = aws_neptune_cluster.graph.id
  instance_class     = var.neptune_instance_class
  engine             = "neptune"

  tags = local.common_tags
}

resource "aws_neptune_subnet_group" "main" {
  name       = "${var.project_name}-${var.environment}"
  subnet_ids = var.private_subnet_ids

  tags = local.common_tags
}

resource "aws_security_group" "neptune" {
  name        = "${var.project_name}-neptune-${var.environment}"
  description = "Security group for Neptune cluster"
  vpc_id      = var.vpc_id

  ingress {
    description     = "Neptune from application"
    from_port       = 8182
    to_port         = 8182
    protocol        = "tcp"
    security_groups = [aws_security_group.application.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = local.common_tags
}
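
The Neptune resources above already reference aws_kms_key.graphrag, aws_security_group.application, and local.common_tags, which live in the module's main.tf (listed in the tree but not shown in this walkthrough). A minimal sketch of what that file plausibly contains, with the names taken from those references:

# modules/graphrag/main.tf -- sketch of the shared pieces referenced elsewhere
data "aws_caller_identity" "current" {}

locals {
  common_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# One KMS key shared by Neptune, OpenSearch, and S3
resource "aws_kms_key" "graphrag" {
  description         = "Encryption key for ${var.project_name}-${var.environment} GraphRAG stack"
  enable_key_rotation = true

  tags = local.common_tags
}

# Attach this to anything (Lambda, ECS, EC2) that needs to reach Neptune or OpenSearch
resource "aws_security_group" "application" {
  name        = "${var.project_name}-application-${var.environment}"
  description = "Security group for GraphRAG client workloads"
  vpc_id      = var.vpc_id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = local.common_tags
}
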
OpenSearch Vector Store
# modules/graphrag/opensearch.tf
resource "aws_opensearch_domain" "vectors" {
  domain_name    = "${var.project_name}-${var.environment}"
  engine_version = "OpenSearch_2.11"

  cluster_config {
    instance_type          = var.opensearch_instance_type
    instance_count         = var.environment == "prod" ? 3 : 1
    zone_awareness_enabled = var.environment == "prod"

    dynamic "zone_awareness_config" {
      for_each = var.environment == "prod" ? [1] : []
      content {
        availability_zone_count = 3
      }
    }
  }

  ebs_options {
    ebs_enabled = true
    volume_size = var.opensearch_volume_size
    volume_type = "gp3"
  }

  encrypt_at_rest {
    enabled    = true
    kms_key_id = aws_kms_key.graphrag.key_id
  }

  node_to_node_encryption {
    enabled = true
  }

  vpc_options {
    subnet_ids         = var.environment == "prod" ? var.private_subnet_ids : [var.private_subnet_ids[0]]
    security_group_ids = [aws_security_group.opensearch.id]
  }

  advanced_security_options {
    enabled                        = true
    internal_user_database_enabled = false
    master_user_options {
      master_user_arn = aws_iam_role.opensearch_master.arn
    }
  }

  domain_endpoint_options {
    enforce_https       = true
    tls_security_policy = "Policy-Min-TLS-1-2-2019-07"
  }

  tags = local.common_tags
}

resource "aws_security_group" "opensearch" {
  name        = "${var.project_name}-opensearch-${var.environment}"
  description = "Security group for OpenSearch domain"
  vpc_id      = var.vpc_id

  ingress {
    description     = "HTTPS from application"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.application.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = local.common_tags
}
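
Fine-grained access control needs aws_iam_role.opensearch_master, which this module keeps in iam.tf (not shown above). A plausible minimal version; the trust policy here is an assumption, so scope it to whichever principals should actually hold master access:

# modules/graphrag/iam.tf -- hypothetical master-user role for fine-grained access control
resource "aws_iam_role" "opensearch_master" {
  name = "${var.project_name}-opensearch-master-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      # Assumption: principals in this account administer the domain
      Principal = { AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root" }
    }]
  })

  tags = local.common_tags
}
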
Document Storage and Processing Trigger
# modules/graphrag/s3.tf
resource "aws_s3_bucket" "documents" {
  bucket = "${var.project_name}-documents-${var.environment}-${data.aws_caller_identity.current.account_id}"

  tags = local.common_tags
}

resource "aws_s3_bucket_versioning" "documents" {
  bucket = aws_s3_bucket.documents.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "documents" {
  bucket = aws_s3_bucket.documents.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.graphrag.arn
      sse_algorithm     = "aws:kms"
    }
  }
}

# Trigger Lambda when documents are uploaded
resource "aws_s3_bucket_notification" "document_upload" {
  bucket = aws_s3_bucket.documents.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.document_processor.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "uploads/"
    filter_suffix       = ".pdf"
  }

  depends_on = [aws_lambda_permission.s3_invoke]
}
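
Since this bucket holds enterprise documents, it's worth pairing it with a public access block. This isn't in the module as written, but it's a standard companion resource:

resource "aws_s3_bucket_public_access_block" "documents" {
  bucket = aws_s3_bucket.documents.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
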
Document Processing Lambda
# modules/graphrag/lambda.tf
resource "aws_lambda_function" "document_processor" {
  function_name = "${var.project_name}-doc-processor-${var.environment}"
  role          = aws_iam_role.lambda_processor.arn
  handler       = "handler.process_document"
  runtime       = "python3.11"
  timeout       = 300
  memory_size   = 1024

  filename         = data.archive_file.lambda_package.output_path
  source_code_hash = data.archive_file.lambda_package.output_base64sha256

  vpc_config {
    subnet_ids         = var.private_subnet_ids
    security_group_ids = [aws_security_group.application.id]
  }

  environment {
    variables = {
      NEPTUNE_ENDPOINT    = aws_neptune_cluster.graph.endpoint
      OPENSEARCH_ENDPOINT = aws_opensearch_domain.vectors.endpoint
      ENVIRONMENT         = var.environment
    }
  }

  tags = local.common_tags
}

resource "aws_lambda_permission" "s3_invoke" {
  statement_id  = "AllowS3Invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.document_processor.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.documents.arn
}
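
Two more referenced-but-not-shown pieces: data.archive_file.lambda_package and aws_iam_role.lambda_processor. Minimal sketches follow, assuming the handler source sits in src/ inside the module; both the path and the role's permissions are assumptions, and the real role would also need Neptune, OpenSearch, S3, and KMS access for the processing itself:

# Hypothetical packaging of the handler code
data "archive_file" "lambda_package" {
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/build/document_processor.zip"
}

# modules/graphrag/iam.tf -- execution role for the processor
resource "aws_iam_role" "lambda_processor" {
  name = "${var.project_name}-doc-processor-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })

  tags = local.common_tags
}

# CloudWatch Logs plus the ENI permissions a VPC-attached Lambda needs
resource "aws_iam_role_policy_attachment" "lambda_vpc" {
  role       = aws_iam_role.lambda_processor.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}
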
Module Outputs
# modules/graphrag/outputs.tf
output "neptune_endpoint" {
  description = "Neptune cluster endpoint"
  value       = aws_neptune_cluster.graph.endpoint
}

output "neptune_reader_endpoint" {
  description = "Neptune cluster reader endpoint"
  value       = aws_neptune_cluster.graph.reader_endpoint
}

output "opensearch_endpoint" {
  description = "OpenSearch domain endpoint"
  value       = aws_opensearch_domain.vectors.endpoint
}

output "documents_bucket" {
  description = "S3 bucket for document uploads"
  value       = aws_s3_bucket.documents.id
}

output "application_security_group_id" {
  description = "Security group ID for applications needing GraphRAG access"
  value       = aws_security_group.application.id
}
______
Using the Module
# environments/dev/main.tf
module "graphrag" {
  source = "../../modules/graphrag"

  environment        = "dev"
  project_name       = "enterprise-search"
  vpc_id             = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnets

  neptune_instance_class   = "db.r5.large"
  opensearch_instance_type = "r6g.large.search"
  opensearch_volume_size   = 50
}
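
Promoting to prod is the same module call with bigger knobs; the module's environment conditionals take care of multi-AZ and instance counts. The sizing below is hypothetical:

# environments/prod/main.tf
module "graphrag" {
  source = "../../modules/graphrag"

  environment        = "prod"
  project_name       = "enterprise-search"
  vpc_id             = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnets

  neptune_instance_class   = "db.r5.xlarge"
  opensearch_instance_type = "r6g.xlarge.search"
  opensearch_volume_size   = 200
}
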
______
Cost Optimization Tips
- Dev/Test environments - Use smaller instance types and single-AZ deployments. The module handles this via the environment variable.
- Neptune Serverless - For variable workloads, consider Neptune Serverless instead of provisioned instances.
- OpenSearch UltraWarm - For older vector data that's queried less frequently, enable the UltraWarm storage tier.
- S3 Lifecycle policies - Archive processed documents to Glacier after 90 days; a sketch follows this list.
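
That last tip is a one-resource change. A sketch, assuming processed documents land under a processed/ prefix (the module as written only defines an uploads/ prefix, so the prefix here is an assumption):

resource "aws_s3_bucket_lifecycle_configuration" "documents" {
  bucket = aws_s3_bucket.documents.id

  rule {
    id     = "archive-processed-documents"
    status = "Enabled"

    filter {
      prefix = "processed/"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }
  }
}
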
______
Wrapping Up
GraphRAG infrastructure is inherently complex - you're running multiple databases, managing networking between them, handling document processing, and ensuring security across all components. Terraform modules bring sanity to this complexity by giving you repeatable, version-controlled deployments.
Start with dev, validate that your retrieval patterns work as expected, then scale to production confident that the infrastructure is identical. The module approach means you can iterate on improvements and roll them out consistently across environments.
AI infrastructure is evolving rapidly. Having your foundation in Terraform means you can adapt as new services and patterns emerge - without rebuilding from scratch each time.
