Try the Sandbox
Connect to the shared sandbox environment and run your flows on GPU-enabled Kubernetes. No infrastructure to deploy — just authenticate and go.
The Sandbox Environment
- A fully deployed RDE Workflow Engine platform running on Amazon EKS
- GPU and CPU node pools provisioned on-demand
- S3 artifact storage and Aurora PostgreSQL metadata tracking
- Browser-based Metaflow UI for monitoring runs and inspecting artifacts
- Open to all Amazonians with Midway — with usage limits on shared resources
Before You Start
- Metaflow installed: completed in Getting Started
- Active Midway session: run mwinit to refresh
- AWS CLI v2: installed and on your PATH
- ada: Amazon credential helper (pre-installed on Amazon laptops)
python --version   # 3.10+
aws --version      # 2.x
mwinit             # refresh Midway session
If ada returns an authentication error, run mwinit before retrying.
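The checks above can also be scripted. The following is a minimal, hypothetical preflight script (check_prereqs.py is not part of the sandbox tooling) that verifies the Python version, that metaflow is importable, that the AWS CLI reports major version 2, and that mwinit and ada are on your PATH. It assumes aws --version prints a token of the form aws-cli/X.Y.Z, as current AWS CLI releases do.

```python
# check_prereqs.py: hypothetical helper, not part of the sandbox tooling
import importlib.util
import shutil
import subprocess
import sys
from typing import List, Optional


def python_ok(version_info=sys.version_info) -> bool:
    """True when the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def aws_cli_major(version_output: str) -> Optional[int]:
    """Extract the major version from output like 'aws-cli/2.15.0 Python/3.11.6 ...'."""
    for token in version_output.split():
        if token.startswith("aws-cli/"):
            return int(token.split("/", 1)[1].split(".", 1)[0])
    return None


def find_problems() -> List[str]:
    """Collect human-readable descriptions of missing prerequisites."""
    problems = []
    if not python_ok():
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if importlib.util.find_spec("metaflow") is None:
        problems.append("metaflow is not importable (see Getting Started)")
    if shutil.which("aws") is None:
        problems.append("aws is not on PATH")
    else:
        proc = subprocess.run(["aws", "--version"], capture_output=True, text=True)
        major = aws_cli_major(proc.stdout + proc.stderr)
        if major != 2:
            problems.append(f"AWS CLI v2 required, found major version {major}")
    for tool in ("mwinit", "ada"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = find_problems()
    for issue in issues:
        print("MISSING:", issue)
    print("All prerequisites found." if not issues else f"{len(issues)} problem(s) found.")
```

Run it with python check_prereqs.py. It only reports problems; it does not install or refresh anything.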
Connect to the Sandbox
The login CLI authenticates you via SSO, discovers the sandbox configuration, and writes your local Metaflow config.
Refresh Midway credentials
mwinit
Run the login CLI
ar-metaflow login --account 228022008044 --region us-east-1
What happens behind the scenes:
- Calls ada to obtain temporary AWS credentials for the sandbox account
- Opens your browser for SSO authentication (Cognito → Federate → Midway)
- Discovers the cluster configuration from CloudFormation stack outputs
- Writes ~/.metaflowconfig/config.json and ~/.metaflowconfig/env.sh
- Configures kubectl for the sandbox EKS cluster
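To see what the login CLI actually wrote, you can read config.json directly. This is a sketch under an assumption: it looks for standard Metaflow setting names such as METAFLOW_SERVICE_URL and METAFLOW_DATASTORE_SYSROOT_S3, but the exact keys written by ar-metaflow login may differ, so treat KEYS_OF_INTEREST as a hypothetical starting list.

```python
# show_config.py: hypothetical helper; assumes standard Metaflow config keys
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".metaflowconfig" / "config.json"

# Standard Metaflow setting names; the file written by the login CLI may
# contain others, and may not contain all of these.
KEYS_OF_INTEREST = (
    "METAFLOW_DEFAULT_METADATA",
    "METAFLOW_SERVICE_URL",
    "METAFLOW_DEFAULT_DATASTORE",
    "METAFLOW_DATASTORE_SYSROOT_S3",
    "METAFLOW_KUBERNETES_NAMESPACE",
)


def select_settings(config: dict) -> dict:
    """Keep only the keys of interest that are actually present, in list order."""
    return {key: config[key] for key in KEYS_OF_INTEREST if key in config}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Parse the JSON config file written by the login step."""
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        for key, value in select_settings(load_config()).items():
            print(f"{key} = {value}")
    else:
        print(f"{CONFIG_PATH} not found; run ar-metaflow login first")
```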
Source the environment
source ~/.metaflowconfig/env.sh
Add to your shell profile (one-time)
# Add to ~/.zshrc or ~/.bashrc
[ -f ~/.metaflowconfig/env.sh ] && source ~/.metaflowconfig/env.sh
After you source env.sh, all metaflow and python commands automatically target the sandbox: your AWS profile, service URL, and artifact paths are all set.
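Because env.sh works by exporting environment variables, one way to confirm that a shell has sourced it is to compare the names it exports with the current environment. This sketch assumes env.sh uses plain export NAME=value lines; the helpers exported_names and missing_names are hypothetical.

```python
# check_env.py: hypothetical helper; assumes env.sh uses "export NAME=value" lines
import os
import re
from pathlib import Path

EXPORT_LINE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=")
ENV_SH = Path.home() / ".metaflowconfig" / "env.sh"


def exported_names(script_text):
    """Names of variables set by export NAME=value lines, in file order."""
    names = []
    for line in script_text.splitlines():
        match = EXPORT_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_names(names, environ=os.environ):
    """Names that are not set in the given environment."""
    return [name for name in names if name not in environ]


if __name__ == "__main__":
    if not ENV_SH.exists():
        print(f"{ENV_SH} not found; run ar-metaflow login first")
    else:
        missing = missing_names(exported_names(ENV_SH.read_text()))
        if missing:
            print("Not set in this shell (source env.sh?):", ", ".join(missing))
        else:
            print("Every variable exported by env.sh is set in this shell.")
```

Run it from the same shell you plan to launch flows from. A child process inherits that shell's exported variables, so anything reported as missing was not exported there.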
Your First Remote Run
Use the same hello_flow.py from Getting Started; no changes are needed.
Run locally first — metadata now goes to the sandbox's metadata service, not your laptop:
python hello_flow.py run
Now push the steps to Kubernetes — each step runs as a pod on EKS:
python hello_flow.py --with kubernetes run
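If you switch between the two commands often, a small wrapper keeps them in order and stops before reaching the cluster when the local run fails. This is a hypothetical convenience script, not part of Metaflow; it only builds the same command lines shown above. It uses sys.executable so the flow runs under the interpreter where Metaflow is installed.

```python
# run_both.py: hypothetical wrapper around the local and Kubernetes commands
import subprocess
import sys


def flow_command(flow_file, on_kubernetes=False):
    """Build 'python <flow_file> [--with kubernetes] run' as an argument list."""
    cmd = [sys.executable, flow_file]
    if on_kubernetes:
        # --with kubernetes attaches the kubernetes decorator to every step
        cmd += ["--with", "kubernetes"]
    return cmd + ["run"]


def local_then_remote(flow_file):
    """Run locally, then on Kubernetes; check=True raises after a failed run."""
    for on_kubernetes in (False, True):
        subprocess.run(flow_command(flow_file, on_kubernetes), check=True)
```

For example, call local_then_remote("hello_flow.py") from a script in the same directory as the flow.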
Monitor Your Runs
Open the sandbox UI in your browser:
https://sandbox.metaflow.simulation.amazon.dev
- Authentication is automatic — same SSO chain (Envoy → Cognito → Federate → Midway)
- View all runs, filter by flow name or username
- Click any run to see the DAG, step logs, and artifact values
- Track in-progress runs with real-time updates
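The username filter in the UI has a counterpart in the Metaflow client API: every run carries a user:<name> tag, and namespace() restricts the client to runs with a given tag. A sketch, assuming env.sh has been sourced so the client talks to the sandbox metadata service; list_runs is a hypothetical helper, and HelloFlow and alice below are placeholder names.

```python
# list_my_runs.py: hypothetical helper built on the Metaflow client API
from itertools import islice


def user_namespace(username: str) -> str:
    """Metaflow tags each run with a 'user:<name>' system tag."""
    return f"user:{username}"


def list_runs(flow_name: str, username: str, limit: int = 5):
    """Return (run id, finished, successful) for up to `limit` runs by `username`."""
    # Imported here so the module loads even where Metaflow is not installed
    from metaflow import Flow, namespace

    # Restrict the client to runs carrying this user's tag, like the UI filter
    namespace(user_namespace(username))
    return [
        (run.id, run.finished, run.successful)
        for run in islice(Flow(flow_name).runs(), limit)
    ]
```

For example, list_runs("HelloFlow", "alice") returns up to five tuples. Calling namespace(None) instead switches to the global namespace, which includes other users' runs.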
Request GPU Compute
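Add the @resources decorator to request a GPU for any step. The decorator takes effect when the step runs on Kubernetes (for example with --with kubernetes); local runs ignore it. Karpenter then provisions a matching instance type automatically.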
# gpu_flow.py
from metaflow import FlowSpec, step, resources


class GpuTestFlow(FlowSpec):
    """Verify GPU access on the sandbox cluster."""

    @resources(gpu=1, memory=8192)  # one GPU and 8192 MB of memory
    @step
    def start(self):
        import subprocess

        try:
            # nvidia-smi is typically only available when a GPU is attached
            result = subprocess.run(
                ["nvidia-smi"], capture_output=True, text=True
            )
            print(result.stdout)
            self.gpu_available = result.returncode == 0
        except FileNotFoundError:
            # Binary missing from the container: treat as "no GPU" instead of failing
            self.gpu_available = False
        self.next(self.end)

    @step
    def end(self):
        status = "GPU detected" if self.gpu_available else "No GPU"
        print(f"Result: {status}")


if __name__ == "__main__":
    GpuTestFlow()
python gpu_flow.py --with kubernetes run
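nvidia-smi can also report GPU details in machine-readable form, which is easier to store as an artifact than raw text. A sketch: parse_gpu_csv and query_gpus are hypothetical helpers that use nvidia-smi's --query-gpu=name,memory.total --format=csv,noheader mode, whose rows look like "Tesla T4, 15360 MiB".

```python
# gpu_info.py: hypothetical helpers around nvidia-smi's CSV query mode
import subprocess


def parse_gpu_csv(text):
    """Turn 'name, memory.total' CSV rows into (name, MiB) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if "," not in line:
            continue  # skip non-data lines such as error messages
        name, memory = line.rsplit(",", 1)
        # memory looks like "15360 MiB"; keep only the number
        gpus.append((name.strip(), int(memory.split()[0])))
    return gpus


def query_gpus():
    """Return [] when nvidia-smi is missing or fails, else the parsed GPUs."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return []
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []
```

Inside the start step you could set self.gpus = query_gpus() instead of checking only the return code; an empty list then means no GPU was visible to the pod.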
Compute Tiers
| Tier | Decorator (memory in MB) | Use Case |
|---|---|---|
| CPU only | @resources(cpu=2, memory=4096) | Data preprocessing, light compute |
| Standard GPU | @resources(gpu=1, memory=8192) | Training, inference, CUDA workloads |
| Top-tier GPU | @resources(gpu=1, memory=32000) | Large model training, multi-GPU |
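
If you generate flows or want a single place to record tier choices, the table can be encoded as data. A hypothetical helper: pick_tier returns the first tier, in table order, whose GPU presence matches the step and whose memory covers the request, and the result can be unpacked into the decorator.

```python
# tiers.py: hypothetical encoding of the Compute Tiers table
# Values copied from the table; memory is in MB, as @resources expects.
TIERS = [
    ("CPU only", {"cpu": 2, "memory": 4096}),
    ("Standard GPU", {"gpu": 1, "memory": 8192}),
    ("Top-tier GPU", {"gpu": 1, "memory": 32000}),
]


def pick_tier(memory_mb, needs_gpu=False):
    """Return (tier name, @resources kwargs) for the first tier that fits."""
    for name, kwargs in TIERS:
        # Match GPU tiers to GPU steps and the CPU tier to CPU steps
        if ("gpu" in kwargs) != needs_gpu:
            continue
        if kwargs["memory"] >= memory_mb:
            return name, dict(kwargs)  # copy so callers cannot edit TIERS
    raise ValueError(f"no tier fits {memory_mb} MB (needs_gpu={needs_gpu})")
```

For example, name, kwargs = pick_tier(16000, needs_gpu=True) selects Top-tier GPU, and @resources(**kwargs) applies it to a step. For requirements beyond the table, set @resources values directly.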
Handling Session Expiry
SSO tokens expire after approximately 1 hour. If you see 401 Unauthorized or invalid_grant, re-authenticate:
mwinit && ar-metaflow login --account 228022008044 --region us-east-1
source ~/.metaflowconfig/env.sh
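When a script launches many flows, it can detect the expiry messages above and re-authenticate automatically. A sketch with hypothetical helpers (session_expired, run_with_reauth): it retries the command once after running mwinit and the login command. One caveat: a Python process cannot source env.sh into your shell, so this assumes the refreshed credentials are picked up without re-sourcing; if the retry still fails, run source ~/.metaflowconfig/env.sh in your shell.

```python
# reauth.py: hypothetical helper for scripts that launch many flows
import subprocess

# Error text this guide associates with an expired session
EXPIRY_MARKERS = ("401 Unauthorized", "invalid_grant")
LOGIN_CMD = ["ar-metaflow", "login", "--account", "228022008044", "--region", "us-east-1"]


def session_expired(output):
    """True if command output contains one of the expiry markers."""
    return any(marker in output for marker in EXPIRY_MARKERS)


def run_with_reauth(cmd):
    """Run cmd; on an expiry error, re-authenticate and retry exactly once."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0 and session_expired(result.stdout + result.stderr):
        # Not captured, so mwinit prompts and the SSO browser flow stay interactive
        subprocess.run(["mwinit"], check=True)
        subprocess.run(LOGIN_CMD, check=True)
        result = subprocess.run(cmd, capture_output=True, text=True)
    return result
```

The wrapped command's output is captured, so print result.stdout afterwards if you want to see the run logs.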
Deploy Your Own Environment
The sandbox is shared — ideal for learning and prototyping. For production workloads, your team should deploy an isolated environment in its own AWS account with separate data stores, dedicated GPU quotas, and custom access controls.
Next: Deploy Your Own →
Full operator guide for deploying an isolated environment with CDK in your own AWS account.
For the full decorator reference: Running Flows · Platform architecture: README