Documentation

Everything you need to go from your first flow to a production ML platform.

Getting Started Journey

Try Locally

Install Metaflow and run your first pipeline on your laptop. No cloud account needed — just Python.

Try the Sandbox

Connect to the shared sandbox and run your flows on GPU-enabled Kubernetes. No infrastructure to deploy.

Deploy Your Own

Full operator guide — deploy an isolated environment in your AWS account with CDK.

Reference

README

Platform overview, key concepts, compute tiers, architecture, and what this gives you over raw infrastructure.

Running Flows

Write flows, request GPU compute, parallelize with foreach, monitor in real time, and schedule production runs.

Quick Reference

@resources(gpu=1)
Request GPU compute for a step. Karpenter provisions the right instance automatically.
@retry(times=3)
Automatically retry failed steps, with a configurable delay between attempts.
@checkpoint
Periodically save training state. Resume from the last good checkpoint on failure.
@card(type="blank")
Attach a live-updating visualization card to any step for real-time monitoring.
@schedule(cron='...')
Schedule flows to run on a cron expression. Managed by Argo Events.
self.next(self.train, foreach='...')
Fan out to one parallel task per item in the named list attribute. Well suited to hyperparameter sweeps.