Documentation

Everything you need to go from your first flow to a production ML platform.

Quick Reference

@resources(gpu=1)

Request GPU compute for a step. Karpenter provisions the right instance automatically.

@retry(times=3)

Automatically retry failed steps with configurable backoff.

@checkpoint

Periodically save training state. Resume from last good checkpoint on failure.

@card(type="blank")

Attach a live-updating visualization card to any step for real-time monitoring.

@schedule(cron='...')

Schedule flows to run on a cron expression. Managed by Argo Events.

self.next(self.train, foreach='...')

Fan out to parallel steps. Perfect for hyperparameter sweeps.