DATAVILLAS
Dagster · AWS · Terraform · Data Engineering

Deploying Dagster on AWS: A Production-Ready Guide

November 15, 2025 · 4 min read


Dagster has become one of the most compelling choices for data orchestration — it brings software engineering rigor to data pipelines with its asset-centric model, rich type system, and excellent developer experience. But deploying it to production on AWS requires careful thought.

This guide walks through a battle-tested architecture for running Dagster on AWS, using ECS for compute, RDS for the event log, and Terraform for infrastructure-as-code.

Why Dagster?

Before diving into deployment, it's worth understanding what makes Dagster different from traditional orchestrators like Airflow:

  • Asset-centric thinking: You define data assets (tables, ML models, reports) rather than tasks. Dagster tracks lineage and freshness automatically.
  • Strong typing: Dagster's type system catches pipeline issues at definition time, not at 3am.
  • Unified observability: The Dagit UI gives you a real-time view of your entire data platform — assets, runs, schedules, sensors.

Architecture Overview

Our production setup uses:

┌─────────────────────────────────────────────┐
│                    AWS VPC                   │
│                                              │
│  ┌──────────┐    ┌───────────────────────┐  │
│  │  ALB     │───▶│  ECS Fargate          │  │
│  │ (HTTPS)  │    │  (Dagit + Daemon)     │  │
│  └──────────┘    └───────────────────────┘  │
│                           │                  │
│                  ┌────────▼────────┐         │
│                  │   RDS Postgres  │         │
│                  │  (Event Store)  │         │
│                  └─────────────────┘         │
│                                              │
│  ┌──────────────────────────────────────┐   │
│  │  ECS Run Launcher (auto-scaling)     │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘

Setting Up the Infrastructure with Terraform

VPC and Networking

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "dagster-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true  # Cost optimization for non-critical envs
}

RDS for the Event Store

Dagster stores all run history and event logs in PostgreSQL. Use RDS for managed, durable storage:

resource "aws_db_instance" "dagster" {
  identifier        = "dagster-event-store"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.small"
  allocated_storage = 20

  db_name  = "dagster"
  username = "dagster"
  password = var.db_password

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.dagster.name

  backup_retention_period = 7
  skip_final_snapshot     = false
  deletion_protection     = true
}

ECS Fargate for the Dagit Webserver

resource "aws_ecs_service" "dagit" {
  name            = "dagit"
  cluster         = aws_ecs_cluster.dagster.id
  task_definition = aws_ecs_task_definition.dagit.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.dagit.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.dagit.arn
    container_name   = "dagit"
    container_port   = 3000
  }
}

Configuring Dagster for Production

The dagster.yaml instance configuration for production (workspace.yaml, which points at your code locations, is configured separately):

storage:
  postgres:
    postgres_db:
      username: dagster
      password:
        env: DAGSTER_POSTGRES_PASSWORD
      hostname:
        env: DAGSTER_POSTGRES_HOST
      db_name: dagster
      port: 5432

run_launcher:
  module: dagster_aws.ecs
  class: EcsRunLauncher
  config:
    task_definition:
      env: DAGSTER_ECS_TASK_DEFINITION_ARN
    container_name: run

telemetry:
  enabled: false

Lessons Learned

1. Separate the daemon from Dagit. The Dagster daemon handles schedules and sensors; if it crashes, your schedules stop firing. Run it as its own ECS service so a Dagit deploy or crash never takes it down, and so the two can be debugged and scaled independently.
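A service for the daemon mirrors the Dagit one; a minimal sketch, assuming the cluster, task definition, and security group resources follow the naming of the earlier snippets:

```hcl
resource "aws_ecs_service" "daemon" {
  name            = "dagster-daemon"
  cluster         = aws_ecs_cluster.dagster.id
  task_definition = aws_ecs_task_definition.daemon.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.daemon.id]
    assign_public_ip = false
  }

  # No load_balancer block: the daemon serves no HTTP traffic.
}
```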

2. Use SSM Parameter Store for secrets. Never put secrets in environment variables directly in your task definitions. Use AWS SSM Parameter Store and reference them in Terraform.
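ECS can inject SSM parameters into containers at launch via the `secrets` field of a container definition; a sketch, where the parameter name and resource labels are assumptions:

```hcl
resource "aws_ssm_parameter" "db_password" {
  name  = "/dagster/db_password"
  type  = "SecureString"
  value = var.db_password
}

# Then, inside the task definition's container_definitions JSON:
#   secrets = [{
#     name      = "DAGSTER_POSTGRES_PASSWORD"
#     valueFrom = aws_ssm_parameter.db_password.arn
#   }]
# The value never appears in the task definition; ECS resolves it at launch.
```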

3. Set memory limits carefully. ECS tasks have strict memory limits. Start with 2GB for Dagit and monitor with CloudWatch. Complex asset graphs with many ops can spike memory during graph traversal.
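On Fargate, cpu and memory are fixed at the task level, so the sizing above lives in the task definition; a sketch, with the image variable as an assumption:

```hcl
resource "aws_ecs_task_definition" "dagit" {
  family                   = "dagit"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024 # 1 vCPU
  memory                   = 2048 # 2 GB; a task exceeding this is OOM-killed

  container_definitions = jsonencode([{
    name         = "dagit"
    image        = var.dagit_image
    portMappings = [{ containerPort = 3000 }]
  }])
}
```

Note that Fargate only accepts certain cpu/memory combinations (e.g. 1 vCPU pairs with 2–8 GB), so bump both together when scaling up.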

4. Enable S3 for I/O Manager. For any production Dagster deployment, configure the S3 I/O manager for intermediate storage. This makes your runs reproducible and allows you to debug individual steps.

from dagster import Definitions, define_asset_job
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource

defs = Definitions(
    assets=all_assets,
    jobs=[define_asset_job("daily_job")],
    resources={
        "io_manager": s3_pickle_io_manager.configured({
            "s3_bucket": "my-dagster-bucket",
            "s3_prefix": "dagster-io",
        }),
        "s3": s3_resource,
    },
)

Monitoring and Alerting

Set up CloudWatch alarms for:

  • ECS task stop events (both Dagit and Daemon)
  • RDS CPU and storage
  • ALB 5xx errors
  • Dagster run failures (via Dagster's built-in alerting or a custom sensor)
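As one concrete example, the ALB 5xx alarm in Terraform; the SNS topic and load-balancer resource names here are assumptions:

```hcl
resource "aws_cloudwatch_metric_alarm" "alb_5xx" {
  alarm_name          = "dagit-alb-5xx"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_ELB_5XX_Count"
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 5
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching" # no 5xx datapoints means healthy

  dimensions = {
    LoadBalancer = aws_lb.dagit.arn_suffix
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
```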

Cost Optimization

For a small team, this architecture runs comfortably on:

  • ECS Fargate: ~$30/month (Dagit + Daemon, 0.5 vCPU / 1GB each)
  • RDS t3.small: ~$25/month
  • ALB: ~$20/month
  • NAT Gateway: ~$35/month

Total: ~$110/month for a solid production Dagster deployment.

Next Steps

Once your Dagster deployment is stable, consider:

  • Adding a software-defined asset check pipeline to monitor data quality
  • Setting up Dagster Cloud Hybrid if you want Dagster to manage the orchestration layer while keeping compute on-prem
  • Exploring Dagster's dbt integration for seamless analytics engineering workflows

This architecture has been used in production deployments handling millions of daily events. If you have questions about adapting it to your use case, get in touch.

Edson Nogueira
Founder & Principal Engineer

AWS Certified Solutions Architect and Dagster contributor. Get in touch.
