Getting Started
Write and run your first ML pipeline in under 5 minutes. No cloud account, no Docker, no Kubernetes — just Python.
Install Metaflow
Metaflow is a pure Python package. One command to install:
```shell
pip install metaflow
```
Verify the installation:
```shell
metaflow version   # 2.12.x
```
By default, Metaflow stores run metadata in ~/.metaflow/ and data artifacts in a local directory. You can add cloud compute later without changing your code.
Write a Simple Pipeline
Create a file called hello_flow.py. A Metaflow flow is a Python class where each method
decorated with @step becomes a node in a directed acyclic graph (DAG).
```python
# hello_flow.py
from metaflow import FlowSpec, step


class HelloFlow(FlowSpec):
    """A minimal two-step pipeline."""

    @step
    def start(self):
        self.message = "Hello from RDE Workflow Engine!"
        print(self.message)
        self.next(self.end)

    @step
    def end(self):
        print(f"Flow complete: {self.message}")


if __name__ == "__main__":
    HelloFlow()
```
Run it:
```shell
python hello_flow.py run
```
Metaflow executes the DAG — start then end — and prints both messages.
Notice that self.message set in start is automatically available in
end. That's a data artifact — Metaflow persists it between steps for you.
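Under the hood, artifacts are serialized with pickle, so anything you assign to self must be picklable. A minimal illustration of the round-trip (Metaflow's datastore internals are more involved than this sketch):

```python
import pickle

# Any picklable value can be a Metaflow artifact ...
message = "Hello from RDE Workflow Engine!"
restored = pickle.loads(pickle.dumps(message))
assert restored == message

# ... but non-picklable objects (lambdas, open files, sockets) cannot.
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print(f"not a valid artifact: {exc}")
```

In practice this means strings, numbers, lists, dicts, NumPy arrays, and most model objects work as artifacts, while live resources like file handles do not.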
Try a Parallel Sweep
Metaflow's foreach lets you fan out to parallel branches. Here's a hyperparameter sweep that runs
3 experiments simultaneously:
```python
# train_flow.py
from metaflow import FlowSpec, step


class TrainFlow(FlowSpec):
    """Parallel hyperparameter sweep on your laptop."""

    @step
    def start(self):
        self.learning_rates = [0.001, 0.01, 0.1]
        print(f"Sweeping {len(self.learning_rates)} learning rates")
        self.next(self.train, foreach="learning_rates")

    @step
    def train(self):
        import random
        self.lr = self.input
        self.accuracy = min(0.99, 0.5 + self.lr * random.uniform(5, 15))
        print(f"  lr={self.lr} → accuracy={self.accuracy:.4f}")
        self.next(self.join)

    @step
    def join(self, inputs):
        best = max(inputs, key=lambda x: x.accuracy)
        self.best_lr = best.lr
        self.best_accuracy = best.accuracy
        print(f"Best: lr={self.best_lr} (accuracy={self.best_accuracy:.4f})")
        self.next(self.end)

    @step
    def end(self):
        print(f"Pipeline done. Ship lr={self.best_lr}")


if __name__ == "__main__":
    TrainFlow()
```
```shell
python train_flow.py run
```
Locally, foreach runs branches as parallel processes. On the cloud, each branch becomes a
separate Kubernetes pod. Same code — no changes needed.
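As a mental model (not Metaflow's actual scheduler), the foreach fan-out and join behave like mapping a function over the list in parallel and then reducing the results. A sketch with a deterministic stand-in for the training step:

```python
from concurrent.futures import ThreadPoolExecutor

def train(lr):
    # deterministic stand-in for the random accuracy in TrainFlow
    return {"lr": lr, "accuracy": min(0.99, 0.5 + lr * 10)}

learning_rates = [0.001, 0.01, 0.1]

# fan out: one "branch" per learning rate, run in parallel
with ThreadPoolExecutor() as pool:
    results = list(pool.map(train, learning_rates))

# join: aggregate the branch results
best = max(results, key=lambda r: r["accuracy"])
print(f"Best: lr={best['lr']} (accuracy={best['accuracy']:.4f})")
# → Best: lr=0.1 (accuracy=0.9900)
```

The difference is that Metaflow runs each branch in an isolated process (or pod) with its own persisted artifacts, so branches survive crashes and can be retried independently.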
Understanding the Execution
When you ran hello_flow.py, Metaflow:
- Executed your two-step DAG: start → end
- Ran each step as a separate task in its own process
- Serialized the data artifact self.message after start and deserialized it before end
- Recorded a run with a unique ID — you can inspect it later
- Stored everything locally in ~/.metaflow/
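A toy sketch of that bookkeeping: one directory per run, keyed by a unique ID, holding pickled artifacts per step. The layout below is invented for illustration — the real structure of ~/.metaflow/ is an internal detail, and real Metaflow run IDs look different:

```python
import pickle
import tempfile
import uuid
from pathlib import Path

root = Path(tempfile.mkdtemp())    # stand-in for ~/.metaflow/
run_id = uuid.uuid4().hex[:8]      # stand-in for a Metaflow run ID
step_dir = root / "HelloFlow" / run_id / "start"
step_dir.mkdir(parents=True)

# persist the artifact produced by the start step
(step_dir / "message.pkl").write_bytes(
    pickle.dumps("Hello from RDE Workflow Engine!"))

# any later process can look the run up by ID and read its artifacts
restored = pickle.loads((step_dir / "message.pkl").read_bytes())
print(run_id, "->", restored)
```

This is why a later step — or a notebook opened days later — can read an artifact by flow name and run ID without the original process still running.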
Key Concepts
- Flow: A Python class inheriting from FlowSpec — your pipeline definition.
- Step: A method decorated with @step — one node in the DAG.
- Artifact: Any self.x attribute — automatically persisted between steps.
- Run: A single execution of a flow — tracked with a unique run ID.
Ready for Real Compute?
Your flows run on your laptop — great for development and testing. When you need GPUs, more memory, or
hundreds of parallel pods, connect to the shared sandbox. Same Python code, zero changes — just
authenticate and add --with kubernetes.
Next: Try the Sandbox →
Connect to a GPU-enabled Kubernetes cluster and run your flows on real compute. Open to all Amazonians with Midway.
For the full decorator reference (@resources, @retry, @card,
@schedule), see Running Flows.