Reducto

Install Reducto on EKS using Terraform.

Overview

This project creates a Helm release for Reducto on EKS in the reducto namespace, along with the following required dependencies:

RDS instance
S3 bucket
KEDA (for autoscaling of Reducto workers in-cluster)
Autoscaling of cluster nodes (Karpenter is configured, but you can use any cluster autoscaling tool)
AWS Load Balancer Controller or Ingress Nginx (but you can use any ingress controller)

This project demonstrates a fully working cluster with everything needed to run Reducto. Cloudflare is not a requirement; it is used here to set up TLS along with cert-manager.

Upgrades

For upgrade instructions and release notes, see MIGRATION_GUIDE.md.

Terraform Documentation

Requirements

Name	Version
terraform	>= 1.2.0
aws	6.28.0
helm	3.1.1
kubectl	1.19.0
kubernetes	3.0.1
null	3.2.4
random	3.8.0

Providers

Name	Version
aws	6.28.0
helm	3.1.1
kubectl	1.19.0
kubernetes	3.0.1
random	3.8.0

Modules

Name	Source	Version
ebs_csi_irsa_role	terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts	v6.4.0
eks	terraform-aws-modules/eks/aws	21.15.1
karpenter	terraform-aws-modules/eks/aws//modules/karpenter	21.12.0
load_balancer_controller_irsa_role	terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts	v6.4.0
rds	terraform-aws-modules/rds/aws	7.1.0
rds_proxy	terraform-aws-modules/rds-proxy/aws	4.2.1
rds_proxy_sg	terraform-aws-modules/security-group/aws	5.2
rds_sg	terraform-aws-modules/security-group/aws	5.2.0
vpc	terraform-aws-modules/vpc/aws	6.6.0
vpc_cni_irsa_role	terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts	v6.4.0

Resources

Name	Type
aws_db_subnet_group.default	resource
aws_iam_role.rds_enhanced_monitoring	resource
aws_iam_role.reducto	resource
aws_iam_role_policy.reducto	resource
aws_iam_role_policy_attachment.rds_enhanced_monitoring	resource
aws_s3_bucket.reducto_storage	resource
aws_s3_bucket_lifecycle_configuration.reducto_storage_lifecycle	resource
aws_s3_bucket_public_access_block.reducto_storage_public_access_block	resource
aws_secretsmanager_secret.superuser	resource
aws_secretsmanager_secret_version.superuser	resource
aws_security_group_rule.allow_all_cluster_and_nodes_traffic	resource
aws_security_group_rule.allow_all_cluster_and_nodes_traffic_ingress	resource
aws_security_group_rule.allow_all_intra_node_traffic	resource
aws_security_group_rule.allow_eks_cluster_access_from_vpc	resource
aws_security_group_rule.webhook_admission_inbound	resource
aws_security_group_rule.webhook_admission_outbound	resource
helm_release.aws_load_balancer_controller	resource
helm_release.cert_manager	resource
helm_release.datadog	resource
helm_release.ingress_nginx	resource
helm_release.karpenter	resource
helm_release.karpenter-crd	resource
helm_release.keda	resource
helm_release.kube_prometheus_stack	resource
helm_release.nvidia_device_plugin	resource
helm_release.opentelemetry_collector	resource
helm_release.prometheus_crds	resource
helm_release.reducto	resource
helm_release.telegraf	resource
helm_release.vllm_stack	resource
kubectl_manifest.cloudflare_api_secret	resource
kubectl_manifest.cluster_issuer	resource
kubectl_manifest.cluster_issuer_staging	resource
kubectl_manifest.cluster_manifests	resource
kubectl_manifest.datadog_secret	resource
kubectl_manifest.karpenter_node_class	resource
kubectl_manifest.karpenter_node_pool	resource
kubectl_manifest.monitoring_ns	resource
kubectl_manifest.otel_auth_secret	resource
kubectl_manifest.otel_datadog_secret	resource
kubectl_manifest.prometheus_rules	resource
kubectl_manifest.telegraf	resource
kubectl_manifest.telegraf_sm	resource
kubernetes_secret_v1.hf_token	resource
random_password.db_password	resource
random_string.secret_suffix	resource
aws_availability_zones.available	data source
aws_eks_cluster_auth.eks	data source
aws_iam_policy_document.rds_enhanced_monitoring	data source
aws_iam_policy_document.reducto	data source
kubectl_filename_list.cluster_manifests	data source
kubectl_filename_list.prometheus_rules	data source

Inputs

Name	Description	Type	Default	Required
cloudflare_api_token	Cloudflare API token for Cert Manager to use DNS solver for issuing TLS certificates	`string`	n/a	yes
cluster_endpoint_public_access	Enable public access to the EKS cluster API endpoint	`bool`	`true`	no
cluster_endpoint_public_access_cidrs	List of CIDR blocks allowed to access the public EKS API endpoint	`list(string)`	`[ "0.0.0.0/0" ]`	no
cluster_name	Name of the EKS cluster and prefix for related resources	`string`	`"reducto-ai"`	no
datadog_api_key	Datadog API key	`string`	`""`	no
datadog_site	Datadog site	`string`	`"us3.datadoghq.com"`	no
db_deletion_protection	Enable deletion protection for RDS database to prevent accidental deletion	`bool`	`true`	no
db_instance_class	Instance class for Reducto Postgres database	`string`	`"db.t4g.medium"`	no
db_multi_az	Enable Multi-AZ deployment for RDS database for high availability	`bool`	`true`	no
db_username	Postgres DB username	`string`	`"reducto"`	no
enable_gpu_managed_node_group	Whether to create the GPU managed node group (system_gpu) for GPU workloads	`bool`	`false`	no
enable_nvidia_device_plugin	Whether to install the NVIDIA device plugin for GPU support	`bool`	`false`	no
enable_otel_collector	Whether to deploy the OpenTelemetry Collector on the cluster	`bool`	`false`	no
enable_reducto	Whether to deploy the Reducto application via Helm	`bool`	`true`	no
enable_vllm_stack	Whether to deploy the vLLM stack on the cluster	`bool`	`false`	no
helm_release_timeout	Timeout in seconds for Helm release operations	`number`	`900`	no
otel_auth_token	Auth token used by the OpenTelemetry collector	`string`	`""`	no
otel_datadog_api_key	Datadog API key used by the OpenTelemetry collector exporter	`string`	`"admin"`	no
otel_host	FQDN for exposing the OpenTelemetry Collector	`string`	`""`	no
private_subnets	List of private subnets CIDRs	`list(string)`	`[]`	no
public_subnets	List of public subnets CIDRs	`list(string)`	`[]`	no
reducto_helm_chart	Path to Helm Chart on OCI registry	`string`	`"oci://registry.reducto.ai/reducto-api/reducto"`	no
reducto_helm_chart_version	Reducto Helm Chart version	`string`	`"1.11.32"`	no
reducto_helm_repo_password	Password for Helm Registry for Reducto Helm Chart	`string`	n/a	yes
reducto_helm_repo_username	Username for Helm Registry for Reducto Helm Chart	`string`	n/a	yes
reducto_host	Full host DNS for Reducto (Example: reducto.mydomain.com)	`string`	n/a	yes
region	AWS region where resources will be created	`string`	`"us-east-1"`	no
slack_webhook_url	Slack Webhook URL for Alertmanager	`string`	n/a	yes
vllm_stack_hf_token	Hugging Face API token used by the vLLM stack for model access	`string`	`""`	no
vpc_cidr	CIDR block for the VPC	`string`	`"10.125.0.0/16"`	no

Outputs

Name	Description
cluster_certificate_authority_data	Base64 encoded certificate data required to communicate with the cluster
cluster_endpoint	Endpoint for EKS control plane
cluster_name	Name of the EKS cluster
cluster_security_group_id	Security group ID attached to the EKS cluster
configure_kubectl	Command to configure kubectl for the EKS cluster
db_instance_endpoint	Connection endpoint for the RDS instance
db_instance_name	Name of the RDS database
db_proxy_arn	ARN of the RDS Proxy
db_proxy_endpoint	Connection endpoint for the RDS Proxy
oidc_provider_arn	ARN of the OIDC Provider for EKS
private_subnets	List of IDs of private subnets
public_subnets	List of IDs of public subnets
reducto_host	Hostname where Reducto is accessible
reducto_iam_role_arn	ARN of the IAM role for Reducto service account
region	AWS region where resources are deployed
s3_bucket_arn	ARN of the S3 bucket for Reducto storage
s3_bucket_name	Name of the S3 bucket for Reducto storage
vpc_id	ID of the VPC

Helm Chart

To obtain or inspect the Helm chart and the configuration options available in values.yaml:

```
# Log in to the registry
helm registry login registry.reducto.ai \
    --username <your-username> \
    --password <your-password>

# Pull the latest Helm chart
helm pull oci://registry.reducto.ai/reducto-api/reducto
```

Security

All workloads are created only in private subnets, including the NLB for ingress-nginx.

Both the public and private cluster endpoints are enabled so that the cluster can be bootstrapped. Public endpoint access can be restricted or removed after provisioning:

Remove the public endpoint: cluster_endpoint_public_access = false
Restrict the public endpoint: cluster_endpoint_public_access_cidrs = [ vpc_cidr ]
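
As a minimal sketch, the two options above could be set in terraform.tfvars once the initial apply has succeeded. The variable names come from the Inputs table; the CIDR value is a placeholder and is not taken from this project.

```hcl
# terraform.tfvars (after the cluster has been bootstrapped)

# Option 1: remove the public endpoint entirely.
cluster_endpoint_public_access = false

# Option 2: keep the public endpoint but limit who can reach it.
# The CIDR below is a placeholder; replace it with your own range.
# cluster_endpoint_public_access       = true
# cluster_endpoint_public_access_cidrs = ["198.51.100.0/24"]
```

With the public endpoint removed, later terraform and kubectl runs need network access to the private endpoint, for example from a host inside the VPC.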

Terraform State

To use a bucket for Terraform state, create a bucket and update backend.tf.

Alternatively, skip this step and run terraform plan and terraform apply against a locally managed terraform.tfstate file for quick testing.
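
For the remote-state option, backend.tf might look roughly like the sketch below. It uses Terraform's standard S3 backend; the bucket name and key are hypothetical placeholders, and the bucket must exist before terraform init is run.

```hcl
# backend.tf (sketch; bucket and key are placeholder values)
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket"  # hypothetical: an S3 bucket you created beforehand
    key    = "reducto/terraform.tfstate"  # hypothetical: path of the state object in the bucket
    region = "us-east-1"                  # region of the state bucket
  }
}
```

After changing the backend, run terraform init again so that Terraform picks up the new state location.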

Configuration

Make sure variables.tf reflects the configuration you want, such as restricting the EKS public endpoint, avoiding VPC CIDR collisions, or choosing the database instance class.

Create terraform.tfvars with the following contents:

```
reducto_helm_repo_username = "todo"
reducto_helm_repo_password = "todo"
reducto_host = "reducto.example.com"
cloudflare_api_token = "token"

# For alerting
slack_webhook_url = "todo"
```
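
Optional inputs can be overridden in the same terraform.tfvars file rather than by editing the defaults in variables.tf. The sketch below uses variable names from the Inputs table; the values are illustrative assumptions, not recommendations from this project.

```hcl
# terraform.tfvars (optional overrides; all values are illustrative)

# Pick a VPC range that does not collide with networks you peer with.
vpc_cidr = "10.200.0.0/16"

# Subnet CIDRs should fall inside vpc_cidr.
private_subnets = ["10.200.0.0/20", "10.200.16.0/20", "10.200.32.0/20"]
public_subnets  = ["10.200.48.0/24", "10.200.49.0/24", "10.200.50.0/24"]

# Database sizing and availability.
db_instance_class = "db.m6g.large"
db_multi_az       = true
```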

Provisioning

Apply Terraform

```
terraform init
terraform plan
terraform apply
```

Configure Cloudflare DNS

Cloudflare DNS is used to obtain a TLS certificate from Let's Encrypt via cert-manager using the DNS-01 solver.

Look up the private load balancer hostname that the cluster created for the Ingress Nginx controller, then create a CNAME record on Cloudflare that points the name set in reducto_host to that hostname.

Access Reducto

Reducto will be accessible on the ingress-nginx NLB via the hostname configured in reducto_host.

To check Reducto service health without a public endpoint, forward local port 4567 to the Reducto service:

```
kubectl port-forward service/reducto-reducto-http 4567:80 -n reducto

# Access Reducto
curl localhost:4567
```

New AWS account

For Karpenter to request Spot Instances, create the service-linked role:

```
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
```

Notes on Destroy

Before running terraform destroy, comment out the lifecycle block in reducto-bucket.tf and remove deletion protection from the database.

You can remove deletion protection by setting var.db_deletion_protection = false and running terraform apply.
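
As a small sketch, the override can live in terraform.tfvars. The variable name is from the Inputs table; placing it in terraform.tfvars is one option among several.

```hcl
# terraform.tfvars
# Apply this change first (terraform apply), then run terraform destroy.
db_deletion_protection = false
```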

terraform destroy may not finish because the VPC will still contain resources created outside of Terraform management:

NLB for the nginx controller, created by the AWS Load Balancer Controller
EKS nodes launched by Karpenter autoscaling
Objects left in the S3 bucket (a non-empty bucket cannot be deleted)

Alongside terraform destroy, you will need to delete the above resources manually from the AWS console.

Notes on NLB for Nginx

To customize NLB configuration:

See the AWS Load Balancer Controller annotations for Service and the Ingress Nginx Helm chart configuration.
For NLB TLS termination with an ACM SSL certificate (without cert-manager), configure the target port in values/ingress-nginx-controller.yaml.
```
service:
  targetPorts:
    https: http
```

Monitoring

The length of Reducto's internal job queue is a good indicator of overall worker health, and the 5xx metric from the Reducto ingress is a good indicator of API health.

The PrometheusRule in manifests/prometheus/rules/01-reducto.yaml monitors the internal queue length and the 5xx metrics. When the queue does not shrink for a long time, or the API returns 5xx responses for a long time, alerts are sent to the configured Slack channel.

