Amazon Robotics · Infrastructure Simulation Science

Run GPU-accelerated workflows
on production-ready infrastructure
in your account.

RDE Workflow Engine is a managed platform for Amazon Robotics science teams. Write Python, decorate it with compute requirements, and run anything from RL training to large-scale simulation. The infrastructure is handled for you.

3
Compute Backend Choices
Local → Cloud
Seamless Graduation
You Own It
Your Account, Your Data

Everything you need to ship experiments

Focus on science. The platform handles compute, storage, orchestration, and observability.

GPU Compute on Demand

Request GPUs with a simple decorator. From L4s for evaluation to H100s and B200s for large-scale training, Karpenter provisions matching nodes in seconds.

Artifact Storage

Every step's output is automatically versioned and stored in S3. Resume, reproduce, and share results across runs. Metadata tracked in Aurora PostgreSQL.

Workflow Orchestration

Define DAGs with Python decorators. Argo Workflows handles scheduling, retries, and parallel execution. Fan out steps with foreach for hyperparameter sweeps.

Enterprise Auth

Amazon SSO via Federate. No shared credentials: IRSA for workload identity, Cognito for user auth, OIDC for browser access. Just Midway in.

Metaflow UI

A built-in web UI to browse runs, inspect artifacts, debug failures, and monitor training curves in real time with live-updating cards.

Checkpoints & Recovery

Save training state periodically with @checkpoint. Resume from the last good checkpoint after failures. Pair with @retry for automatic recovery.

Built on proven AWS services

Six isolated CloudFormation stacks. Data separated from compute for safety. Designed so the EKS cluster can be rebuilt without losing a single artifact.

Your Python Code
  Metaflow SDK: @step, @resources, @card
    ↓ submits to
Orchestration
  Metaflow Service: Metadata API
  Argo Workflows: DAG Scheduler
  Argo Events: Triggers + NATS
    ↓ runs on
Compute
  EKS Auto Mode: Kubernetes + Karpenter + Kyverno
  MetaStream: Corp-Connected Ubuntu Instances
  Greenland: GPU Scheduling
    ↓ persists to
Data & Networking
  Aurora PostgreSQL: Metadata Store
  S3: Artifact Storage
  Cognito + Federate: SSO Auth
  Envoy Gateway: TLS + Routing

Three steps to production

Go from your laptop to a fully deployed GPU workflow platform in your own AWS account.