# Kitsu AWS Infrastructure
Terraform configuration to deploy Kitsu (production management for animation/VFX studios) on AWS.
## Architecture

```
        Internet
            │
    ┌───────▼───────┐
    │  Application  │
    │ Load Balancer │
    │  (HTTPS:443)  │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │ EC2 (public)  │
    │   t3.medium   │
    │ ┌───────────┐ │
    │ │  Docker   │ │
    │ │  cgwire/  │ │
    │ │  cgwire   │ │
    │ └─────┬─────┘ │
    │       │       │
    │ ┌─────▼─────┐ │
    │ │EBS Volume │ │
    │ │  (50GB)   │ │
    │ └───────────┘ │
    └───────┬───────┘
            │
    ┌───────▼───────┐
    │   S3 Bucket   │
    │  (Previews)   │
    └───────────────┘
```
| Component | AWS Service | Details |
|---|---|---|
| Compute | EC2 t3.medium | Ubuntu 22.04, public subnet (SG-protected) |
| Container | Docker | cgwire/cgwire all-in-one image |
| Storage | EBS gp3 | 50GB data volume (Docker data-root) |
| Previews | S3 | Zou preview files (FS_BACKEND=s3) |
| Load Balancer | ALB | HTTPS termination, HTTP→HTTPS redirect |
| Certificate | ACM | DNS validation (manual) |
| Backups | AWS Backup | Daily snapshots, 7-day retention |
| Access | SSM Session Manager | No SSH required |
| Monitoring | New Relic | Infrastructure agent + log forwarding |
## Prerequisites
- AWS CLI configured with appropriate credentials
- Terraform >= 1.5.0
- Existing VPC with at least 2 public subnets in different AZs
- Domain name with DNS access (for ACM validation)
## Deployment

### 1. Create secrets in SSM Parameter Store
```bash
aws ssm put-parameter \
  --name "/kitsu/secret_key" \
  --value "$(openssl rand -hex 32)" \
  --type SecureString

aws ssm put-parameter \
  --name "/kitsu/newrelic_ingest_license_key" \
  --value "YOUR_NEWRELIC_INGEST_LICENSE_KEY" \
  --type SecureString
```
The New Relic ingest license key can be found in New Relic under Account settings > API keys > INGEST - LICENSE.
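For reference, Terraform (or anything with the instance's IAM role) can read these parameters back. A sketch using the `aws_ssm_parameter` data source — the data-source labels are illustrative, and the repo's actual `.tf` files may wire this differently:

```hcl
# Sketch: reading the secrets back in Terraform. Labels are illustrative.
data "aws_ssm_parameter" "kitsu_secret_key" {
  name            = "/kitsu/secret_key"
  with_decryption = true
}

data "aws_ssm_parameter" "newrelic_license" {
  name            = "/kitsu/newrelic_ingest_license_key"
  with_decryption = true
}
```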
### 2. Configure variables

```bash
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values
```
Required variables:
- `vpc_id` - Your VPC ID
- `public_subnet_ids` - At least 2 public subnets in different AZs for the ALB
- `ec2_subnet_id` - Subnet for EC2 (must share an AZ with one of the ALB subnets)
- `domain_name` - Domain for HTTPS (e.g., `kitsu.example.com`)
- `s3_bucket_prefix` - Prefix for S3 preview buckets (e.g., `kitsu-ACCOUNT_ID-`)
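A hypothetical `terraform.tfvars` with the required variables filled in — every value here is a placeholder, not a real ID:

```hcl
# Placeholder values; substitute your own IDs and domain.
vpc_id            = "vpc-0123456789abcdef0"
public_subnet_ids = ["subnet-aaa1111111111111", "subnet-bbb2222222222222"]
ec2_subnet_id     = "subnet-bbb2222222222222"   # shares an AZ with an ALB subnet
domain_name       = "kitsu.example.com"
s3_bucket_prefix  = "kitsu-123456789012-"
```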
### 3. Deploy infrastructure

```bash
terraform init
terraform apply
```
The first apply creates the ACM certificate, but the HTTPS listener will fail until you validate the certificate.
### 4. Validate ACM certificate
Get the DNS validation record:
```bash
terraform state show aws_acm_certificate.kitsu
```
Add a CNAME record to your DNS provider:
- Name: `_xxxxx.kitsu` (the part before your domain)
- Value: `_xxxxx.acm-validations.aws.`
- Proxy: Disabled (if using Cloudflare)
Check validation status:
```bash
aws acm describe-certificate \
  --certificate-arn "$(terraform output -raw acm_certificate_arn)" \
  --query 'Certificate.Status' \
  --output text
```
Once the status is `ISSUED`, re-run:

```bash
terraform apply
```
### 5. Configure DNS
Point your domain to the ALB:
```bash
terraform output alb_dns_name
```
Add a CNAME record: `kitsu.example.com` → `kitsu-alb-xxxxx.us-east-1.elb.amazonaws.com`
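If the zone happens to be hosted in Route 53, the record could instead be managed in Terraform. A sketch only — the `route53_zone_id` variable and the `aws_lb.kitsu` resource label are assumptions, not part of this repo:

```hcl
# Hypothetical: only applies if your DNS zone is hosted in Route 53.
resource "aws_route53_record" "kitsu" {
  zone_id = var.route53_zone_id      # assumed variable, not defined in this repo
  name    = "kitsu.example.com"
  type    = "CNAME"
  ttl     = 300
  records = [aws_lb.kitsu.dns_name]  # assumed ALB resource label
}
```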
### 6. Access Kitsu

- Navigate to `https://kitsu.example.com`
- Log in with the default credentials: `admin@example.com` / `mysecretpassword`
- Change the admin password immediately
## Files

| File | Purpose |
|---|---|
| `terraform/main.tf` | Provider, backend, data sources |
| `terraform/variables.tf` | Input variable definitions |
| `terraform/outputs.tf` | Output values |
| `terraform/ec2.tf` | EC2 instance and EBS volumes |
| `terraform/s3.tf` | S3 buckets for preview storage |
| `terraform/alb.tf` | Load balancer, listeners, target group |
| `terraform/security_groups.tf` | Network security rules |
| `terraform/iam.tf` | IAM role for EC2 (SSM + secrets + S3) |
| `terraform/acm.tf` | ACM certificate |
| `terraform/backup.tf` | AWS Backup vault and plan |
| `terraform/user_data.sh` | EC2 bootstrap script |
| `terraform/terraform.tfvars` | Production variable values |
| `terraform/staging.tfvars` | Staging variable values |
| `scripts/deploy-config-update.sh` | Live config update (no instance replace) |
| `scripts/migrate-previews-to-s3.sh` | Migrate previews from EBS to S3 |
## Operations

### Connect to EC2

```bash
aws ssm start-session --target "$(terraform output -raw ec2_instance_id)"
```
### View container logs

```bash
# After connecting via SSM
sudo docker logs kitsu

# Application-level logs (also shipped to New Relic)
sudo cat /var/log/kitsu/zou/gunicorn_error.log
sudo cat /var/log/kitsu/nginx/error.log
sudo cat /var/log/kitsu/postgresql/postgresql-*.log
```
### Restart Kitsu

```bash
# After connecting via SSM
sudo systemctl restart kitsu-compose
```
### Deploy config changes without replacing the instance

Terraform's `user_data.sh` only runs at instance creation. To update configuration (Nginx, PostgreSQL, New Relic, docker-compose) on a running instance without downtime:
```bash
# Option 1: Copy script via SSM and run it
INSTANCE_ID=$(aws ec2 describe-instances \
  --filters "Name=tag:Project,Values=kitsu" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text)

aws ssm send-command \
  --instance-ids "$INSTANCE_ID" \
  --document-name "AWS-RunShellScript" \
  --parameters "commands=[\"$(cat infra/scripts/deploy-config-update.sh)\"]" \
  --output text --query 'Command.CommandId'

# Option 2: SSM in and run manually
aws ssm start-session --target "$INSTANCE_ID"
# Then: sudo bash (paste script contents or upload the file)
```
The deploy script is at `scripts/deploy-config-update.sh`. Update it when you change configuration, and keep it in sync with `terraform/user_data.sh`.
### Data locations on EC2

- `/opt/kitsu/docker-compose.yml` - Docker Compose configuration
- `/opt/kitsu/.env` - Environment variables (`SECRET_KEY`, `NEWRELIC_LICENSE_KEY`)
- `/opt/kitsu/nginx/` - Nginx config overrides
- `/opt/kitsu/postgresql/` - PostgreSQL config overrides
- `/opt/kitsu/newrelic/` - New Relic log forwarding config
- `/opt/kitsu-data/docker/` - Docker data-root (images, named volumes)
- `/opt/kitsu-data/docker/volumes/` - PostgreSQL data (named volumes)
- `/var/log/kitsu/` - Application logs (Zou, Nginx, PostgreSQL) mounted from the container
## Data Persistence

- Docker's data-root is on the EBS volume (`/opt/kitsu-data/docker`)
- PostgreSQL uses Docker named volumes, stored under the data-root
- Preview files are stored in S3 (`FS_BACKEND=s3`), not on the EBS volume
- EBS data volume is separate from the EC2 instance
- Volume persists if the instance is terminated (standalone `aws_ebs_volume` resource)
- Daily backups at 3 AM UTC via AWS Backup
- 7-day retention (configurable via `backup_retention_days`)
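The standalone-volume pattern looks roughly like this in Terraform — resource labels and the device name are assumptions for illustration; the real definitions live in `terraform/ec2.tf`:

```hcl
# Sketch: a volume defined separately from the instance survives termination,
# unlike a root block device. Labels and device name are illustrative.
resource "aws_ebs_volume" "kitsu_data" {
  availability_zone = aws_instance.kitsu.availability_zone
  size              = 50
  type              = "gp3"
}

resource "aws_volume_attachment" "kitsu_data" {
  device_name = "/dev/xvdf"
  volume_id   = aws_ebs_volume.kitsu_data.id
  instance_id = aws_instance.kitsu.id
}
```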
## Cost Estimate (us-east-1)
| Resource | Monthly |
|---|---|
| EC2 t3.medium | ~$30 |
| EBS 70GB gp3 | ~$6 |
| ALB | ~$22 |
| S3 | ~$1 |
| AWS Backup | ~$4 |
| Total | ~$63 |
## Staging Environment

A separate staging environment runs alongside prod using the same Terraform config with a different tfvars and state file. Staging shares the VPC, subnets, and SSM parameters with prod but has its own EC2, ALB, S3 bucket, and other resources (namespaced via `project_name = "kitsu-staging"`).
### Deploy staging

```bash
cd infra/terraform
terraform plan -var-file=staging.tfvars -state=staging.tfstate
terraform apply -var-file=staging.tfvars -state=staging.tfstate
```
After the first apply, validate the ACM certificate for `kitsu-staging.scenarix.ai` (same process as prod — add the CNAME record, wait for `ISSUED`, re-apply).
### Key differences from prod

| Setting | Prod | Staging |
|---|---|---|
| `project_name` | `kitsu` | `kitsu-staging` |
| `domain_name` | `kitsu.scenarix.ai` | `kitsu-staging.scenarix.ai` |
| `instance_type` | `t3.medium` | `t3.small` |
| `root_volume_size_gb` | 20 | 10 |
| `data_volume_size_gb` | 50 | 10 |
| `s3_bucket_prefix` | `kitsu-408921634255-` | `kitsu-staging-` |
### Destroy staging

```bash
terraform destroy -var-file=staging.tfvars -state=staging.tfstate
```
## Notes

### Why EC2 instead of Fargate?

The `cgwire/cgwire` image is an all-in-one container (PostgreSQL, Redis, Nginx, Zou) that requires persistent disk storage. EC2 + EBS is simpler and cheaper for this use case than Fargate + EFS.
### SECRET_KEY configuration

The cgwire image uses supervisord, which overrides `DB_PASSWORD` and `DB_USERNAME` for the Zou process, but `SECRET_KEY` is inherited from the container environment. It is used for auth token encryption.
### Availability Zone requirements

**Important:** The EC2 subnet must be in the same AZ as one of the ALB subnets; otherwise the ALB cannot route traffic to the instance (you'll see `Target.NotInUse` errors).
The EBS data volume is created in the same AZ as the EC2 instance. Both are in a public subnet protected by security groups (only port 80 from ALB SG allowed inbound).
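The ALB-only ingress rule described above can be expressed roughly as follows — the security group labels are assumptions, and the actual rules live in `terraform/security_groups.tf`:

```hcl
# Sketch: allow HTTP to the instance only from the ALB's security group.
# Security group labels are illustrative, not the repo's real names.
resource "aws_security_group_rule" "ec2_http_from_alb" {
  type                     = "ingress"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.ec2.id   # assumed label
  source_security_group_id = aws_security_group.alb.id   # assumed label
}
```

Scoping the source to the ALB's security group (rather than a CIDR) means the instance accepts traffic only from the load balancer, even though it sits in a public subnet.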
When selecting subnets:
- `public_subnet_ids` - At least 2 subnets in different AZs (ALB requirement)
- `ec2_subnet_id` - Must be in the same AZ as one of the public subnets
Example valid configuration:

```hcl
public_subnet_ids = ["subnet-aaa (us-east-1a)", "subnet-bbb (us-east-1c)"]
ec2_subnet_id     = "subnet-ccc (us-east-1c)"  # Same AZ as subnet-bbb
```
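Since Terraform >= 1.5.0 is already required, the AZ constraint could also be asserted at plan time with a `check` block. A sketch, not part of the repo:

```hcl
# Sketch: fail fast when ec2_subnet_id shares no AZ with the ALB subnets.
data "aws_subnet" "ec2" {
  id = var.ec2_subnet_id
}

data "aws_subnet" "alb" {
  for_each = toset(var.public_subnet_ids)
  id       = each.value
}

check "ec2_az_matches_alb" {
  assert {
    condition = contains(
      [for s in data.aws_subnet.alb : s.availability_zone],
      data.aws_subnet.ec2.availability_zone
    )
    error_message = "ec2_subnet_id must share an AZ with one of public_subnet_ids."
  }
}
```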
## Monitoring
Error logs from Zou, Nginx, and PostgreSQL are shipped to New Relic via the infrastructure agent running as a Docker sidecar.
Logs available in New Relic (query with NRQL):
```sql
-- All Kitsu errors
SELECT * FROM Log WHERE service = 'kitsu'

-- Zou API errors only
SELECT * FROM Log WHERE service = 'kitsu' AND component = 'zou-api'

-- PostgreSQL errors
SELECT * FROM Log WHERE service = 'kitsu' AND component = 'postgresql'
```
Host metrics (CPU, memory, disk) are also reported by the infrastructure agent.
## Future improvements
- Separate containers with RDS/ElastiCache for larger scale
- New Relic alerting policies for error rate / CPU spikes