# Kitsu AWS Infrastructure
Terraform configuration to deploy Kitsu (production management for animation/VFX studios) on AWS.
## Architecture

```
        Internet
            │
    ┌───────▼───────┐
    │  Application  │
    │ Load Balancer │
    │  (HTTPS:443)  │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │ EC2 (public)  │
    │   t3.medium   │
    │ ┌───────────┐ │
    │ │  Docker   │ │
    │ │  cgwire/  │ │
    │ │  cgwire   │ │
    │ └─────┬─────┘ │
    │       │       │
    │ ┌─────▼─────┐ │
    │ │EBS Volume │ │
    │ │  (50GB)   │ │
    │ └───────────┘ │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │   S3 Bucket   │
    │  (Previews)   │
    └───────────────┘
```
| Component | AWS Service | Details |
|---|---|---|
| Compute | EC2 t3.medium | Ubuntu 22.04, public subnet (SG-protected) |
| Container | Docker | cgwire/cgwire all-in-one image |
| Storage | EBS gp3 | 50GB data volume (Docker data-root) |
| Previews | S3 | Zou preview files (FS_BACKEND=s3) |
| Load Balancer | ALB | HTTPS termination, HTTP→HTTPS redirect |
| Certificate | ACM | DNS validation (manual) |
| Backups | AWS Backup | Daily snapshots, 7-day retention |
| Access | SSM Session Manager | No SSH required |
| Monitoring | New Relic | Infrastructure agent + log forwarding |
## Prerequisites
- AWS CLI configured with appropriate credentials
- Terraform >= 1.5.0
- Existing VPC with at least 2 public subnets in different AZs
- Domain name with DNS access (for ACM validation)
## Deployment

### 1. Create secrets in SSM Parameter Store
```bash
aws ssm put-parameter \
  --name "/kitsu/secret_key" \
  --value "$(openssl rand -hex 32)" \
  --type SecureString

aws ssm put-parameter \
  --name "/kitsu/newrelic_ingest_license_key" \
  --value "YOUR_NEWRELIC_INGEST_LICENSE_KEY" \
  --type SecureString
```
The New Relic ingest license key can be found in New Relic under Account settings > API keys > INGEST - LICENSE.
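For reference, Terraform (or anything with the instance's IAM role) can read these parameters back. A sketch using the `aws_ssm_parameter` data source — the data-source labels are illustrative, and the repo's actual `.tf` files may wire this differently:

```hcl
# Sketch: reading the secrets back in Terraform. Labels are illustrative.
data "aws_ssm_parameter" "kitsu_secret_key" {
  name            = "/kitsu/secret_key"
  with_decryption = true
}

data "aws_ssm_parameter" "newrelic_license" {
  name            = "/kitsu/newrelic_ingest_license_key"
  with_decryption = true
}
```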
### 2. Configure variables

```bash
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values
```
Required variables:
- `vpc_id` - Your VPC ID
- `public_subnet_ids` - At least 2 public subnets in different AZs for the ALB
- `ec2_subnet_id` - Subnet for EC2 (must share an AZ with one of the ALB subnets)
- `domain_name` - Domain for HTTPS (e.g., `kitsu.example.com`)
- `s3_bucket_prefix` - Prefix for S3 preview buckets (e.g., `kitsu-ACCOUNT_ID-`)
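A hypothetical `terraform.tfvars` with the required variables filled in — every value here is a placeholder, not a real ID:

```hcl
# Placeholder values; substitute your own IDs and domain.
vpc_id            = "vpc-0123456789abcdef0"
public_subnet_ids = ["subnet-aaa1111111111111", "subnet-bbb2222222222222"]
ec2_subnet_id     = "subnet-bbb2222222222222"   # shares an AZ with an ALB subnet
domain_name       = "kitsu.example.com"
s3_bucket_prefix  = "kitsu-123456789012-"
```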
### 3. Deploy infrastructure

```bash
terraform init
terraform apply
```
The first apply creates the ACM certificate, but the HTTPS listener will fail until you validate the certificate.
### 4. Validate ACM certificate
Get the DNS validation record:
```bash
terraform state show aws_acm_certificate.kitsu
```
Add a CNAME record to your DNS provider:
- Name: `_xxxxx.kitsu` (the part before your domain)
- Value: `_xxxxx.acm-validations.aws.`
- Proxy: Disabled (if using Cloudflare)
Check validation status:
```bash
aws acm describe-certificate \
  --certificate-arn "$(terraform output -raw acm_certificate_arn)" \
  --query 'Certificate.Status' \
  --output text
```
Once the status is `ISSUED`, re-run:

```bash
terraform apply
```
### 5. Configure DNS
Point your domain to the ALB:
```bash
terraform output alb_dns_name
```
Add a CNAME record: `kitsu.example.com` → `kitsu-alb-xxxxx.us-east-1.elb.amazonaws.com`
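If the zone happens to be hosted in Route 53, the record could instead be managed in Terraform. A sketch only — the `route53_zone_id` variable and the `aws_lb.kitsu` resource label are assumptions, not part of this repo:

```hcl
# Hypothetical: only applies if your DNS zone is hosted in Route 53.
resource "aws_route53_record" "kitsu" {
  zone_id = var.route53_zone_id      # assumed variable, not defined in this repo
  name    = "kitsu.example.com"
  type    = "CNAME"
  ttl     = 300
  records = [aws_lb.kitsu.dns_name]  # assumed ALB resource label
}
```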
### 6. Access Kitsu

- Navigate to `https://kitsu.example.com`
- Log in with the default credentials: `admin@example.com` / `mysecretpassword`
- Change the admin password immediately
## Files

| File | Purpose |
|---|---|
| `terraform/main.tf` | Provider, backend, data sources |
| `terraform/variables.tf` | Input variable definitions |
| `terraform/outputs.tf` | Output values |
| `terraform/ec2.tf` | EC2 instance and EBS volumes |
| `terraform/s3.tf` | S3 buckets for preview storage |
| `terraform/alb.tf` | Load balancer, listeners, target group |
| `terraform/security_groups.tf` | Network security rules |
| `terraform/iam.tf` | IAM role for EC2 (SSM + secrets + S3) |
| `terraform/acm.tf` | ACM certificate |
| `terraform/backup.tf` | AWS Backup vault and plan |
| `terraform/user_data.sh` | EC2 bootstrap script |
| `terraform/terraform.tfvars` | Production variable values |
| `terraform/staging.tfvars` | Staging variable values |
| `scripts/deploy-config-update.sh` | Live config update (no instance replace) |
| `scripts/migrate-previews-to-s3.sh` | Migrate previews from EBS to S3 |
## Operations

### Connect to EC2

```bash
aws ssm start-session --target "$(terraform output -raw ec2_instance_id)"
```
### View container logs

```bash
# After connecting via SSM
sudo docker logs kitsu

# Application-level logs (also shipped to New Relic)
sudo cat /var/log/kitsu/zou/gunicorn_error.log
sudo cat /var/log/kitsu/nginx/error.log
sudo cat /var/log/kitsu/postgresql/postgresql-*.log
```
### Restart Kitsu

```bash
# After connecting via SSM
sudo systemctl restart kitsu-compose
```
### Deploy config changes without replacing the instance

Terraform's `user_data.sh` only runs at instance creation. To update configuration (Nginx, PostgreSQL, New Relic, docker-compose) on a running instance without downtime:
```bash
# Option 1: Copy script via SSM and run it
INSTANCE_ID=$(aws ec2 describe-instances \
  --filters "Name=tag:Project,Values=kitsu" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text)

aws ssm send-command \
  --instance-ids "$INSTANCE_ID" \
  --document-name "AWS-RunShellScript" \
  --parameters "commands=[\"$(cat infra/scripts/deploy-config-update.sh)\"]" \
  --output text --query 'Command.CommandId'

# Option 2: SSM in and run manually
aws ssm start-session --target "$INSTANCE_ID"
# Then: sudo bash (paste script contents or upload the file)
```
The deploy script is at `scripts/deploy-config-update.sh`. Update it when you change configuration, and keep it in sync with `terraform/user_data.sh`.
### Data locations on EC2

- `/opt/kitsu/docker-compose.yml` - Docker Compose configuration
- `/opt/kitsu/.env` - Environment variables (`SECRET_KEY`, `NEWRELIC_LICENSE_KEY`)
- `/opt/kitsu/nginx/` - Nginx config overrides
- `/opt/kitsu/postgresql/` - PostgreSQL config overrides
- `/opt/kitsu/newrelic/` - New Relic log forwarding config
- `/opt/kitsu-data/docker/` - Docker data-root (images, named volumes)
- `/opt/kitsu-data/docker/volumes/` - PostgreSQL data (named volumes)
- `/var/log/kitsu/` - Application logs (Zou, Nginx, PostgreSQL) mounted from the container
## Data Persistence

- Docker's data-root is on the EBS volume (`/opt/kitsu-data/docker`)
- PostgreSQL uses Docker named volumes, stored under the data-root
- Preview files are stored in S3 (`FS_BACKEND=s3`), not on the EBS volume
- EBS data volume is separate from the EC2 instance
- Volume persists if the instance is terminated (standalone `aws_ebs_volume` resource)
- Daily backups at 3 AM UTC via AWS Backup
- 7-day retention (configurable via `backup_retention_days`)
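The standalone-volume pattern looks roughly like this in Terraform — resource labels and the device name are assumptions for illustration; the real definitions live in `terraform/ec2.tf`:

```hcl
# Sketch: a volume defined separately from the instance survives termination,
# unlike a root block device. Labels and device name are illustrative.
resource "aws_ebs_volume" "kitsu_data" {
  availability_zone = aws_instance.kitsu.availability_zone
  size              = 50
  type              = "gp3"
}

resource "aws_volume_attachment" "kitsu_data" {
  device_name = "/dev/xvdf"
  volume_id   = aws_ebs_volume.kitsu_data.id
  instance_id = aws_instance.kitsu.id
}
```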
## Cost Estimate (us-east-1)
| Resource | Monthly |
|---|---|
| EC2 t3.medium | ~$30 |
| EBS 70GB gp3 | ~$6 |
| ALB | ~$22 |
| S3 | ~$1 |
| AWS Backup | ~$4 |
| Total | ~$63 |
## Staging Environment

A separate staging environment runs alongside prod using the same Terraform config with a different tfvars and state file. Staging shares the VPC, subnets, and SSM parameters with prod but has its own EC2, ALB, S3 bucket, and other resources (namespaced via `project_name = "kitsu-staging"`).
### Deploy staging

```bash
cd infra/terraform
terraform plan -var-file=staging.tfvars -state=staging.tfstate
terraform apply -var-file=staging.tfvars -state=staging.tfstate
```
After the first apply, validate the ACM certificate for `kitsu-staging.scenarix.ai` (same process as prod — add the CNAME record, wait for `ISSUED`, re-apply).
### Key differences from prod

| Setting | Prod | Staging |
|---|---|---|
| `project_name` | `kitsu` | `kitsu-staging` |
| `domain_name` | `kitsu.scenarix.ai` | `kitsu-staging.scenarix.ai` |
| `instance_type` | `t3.medium` | `t3.small` |
| `root_volume_size_gb` | 20 | 10 |
| `data_volume_size_gb` | 50 | 10 |
| `s3_bucket_prefix` | `kitsu-408921634255-` | `kitsu-staging-` |
### Destroy staging

```bash
terraform destroy -var-file=staging.tfvars -state=staging.tfstate
```
## Notes

### Why EC2 instead of Fargate?

The `cgwire/cgwire` image is an all-in-one container (PostgreSQL, Redis, Nginx, Zou) that requires persistent disk storage. EC2 + EBS is simpler and cheaper for this use case than Fargate + EFS.
### SECRET_KEY configuration

The cgwire image uses supervisord, which overrides `DB_PASSWORD` and `DB_USERNAME` for the Zou process, but `SECRET_KEY` is inherited from the container environment. It is used for auth token encryption.
### Availability Zone requirements

**Important:** The EC2 subnet must be in the same AZ as one of the ALB subnets; otherwise the ALB cannot route traffic to the instance (you'll see `Target.NotInUse` errors).
The EBS data volume is created in the same AZ as the EC2 instance. Both are in a public subnet protected by security groups (only port 80 from ALB SG allowed inbound).
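The ALB-only ingress rule described above can be expressed roughly as follows — the security group labels are assumptions, and the actual rules live in `terraform/security_groups.tf`:

```hcl
# Sketch: allow HTTP to the instance only from the ALB's security group.
# Security group labels are illustrative, not the repo's real names.
resource "aws_security_group_rule" "ec2_http_from_alb" {
  type                     = "ingress"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.ec2.id   # assumed label
  source_security_group_id = aws_security_group.alb.id   # assumed label
}
```

Scoping the source to the ALB's security group (rather than a CIDR) means the instance accepts traffic only from the load balancer, even though it sits in a public subnet.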
When selecting subnets:
- `public_subnet_ids` - At least 2 subnets in different AZs (ALB requirement)
- `ec2_subnet_id` - Must be in the same AZ as one of the public subnets
Example valid configuration:

```hcl
public_subnet_ids = ["subnet-aaa (us-east-1a)", "subnet-bbb (us-east-1c)"]
ec2_subnet_id     = "subnet-ccc (us-east-1c)"  # Same AZ as subnet-bbb
```
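Since Terraform >= 1.5.0 is already required, the AZ constraint could also be asserted at plan time with a `check` block. A sketch, not part of the repo:

```hcl
# Sketch: fail fast when ec2_subnet_id shares no AZ with the ALB subnets.
data "aws_subnet" "ec2" {
  id = var.ec2_subnet_id
}

data "aws_subnet" "alb" {
  for_each = toset(var.public_subnet_ids)
  id       = each.value
}

check "ec2_az_matches_alb" {
  assert {
    condition = contains(
      [for s in data.aws_subnet.alb : s.availability_zone],
      data.aws_subnet.ec2.availability_zone
    )
    error_message = "ec2_subnet_id must share an AZ with one of public_subnet_ids."
  }
}
```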
## Monitoring
Error logs from Zou, Nginx, and PostgreSQL are shipped to New Relic via the infrastructure agent running as a Docker sidecar.
Logs available in New Relic (query with NRQL):
```sql
-- All Kitsu errors
SELECT * FROM Log WHERE service = 'kitsu'

-- Zou API errors only
SELECT * FROM Log WHERE service = 'kitsu' AND component = 'zou-api'

-- PostgreSQL errors
SELECT * FROM Log WHERE service = 'kitsu' AND component = 'postgresql'
```
Host metrics (CPU, memory, disk) are also reported by the infrastructure agent.
## Future improvements
- Separate containers with RDS/ElastiCache for larger scale
- New Relic alerting policies for error rate / CPU spikes