Zero Operators · v1.0.2 · by Samyakh (Sam) Tukra with Callum Adamson

An autonomous AI research team.
You stay the director.

Most of building an AI model isn't the model. Three weeks understanding the data, two weeks cleaning, one on infra, and by the time you're modelling, the deadline is breathing down your neck. Zero Operators runs that pipeline as a full production-grade AI research team. You stay where you add value: reading results, choosing the next experiment, deciding when to pivot.

Building or curious? Join the Discord →

Agents: 20 · Phases: 6 · Oracle: Tiered · Memory: Persistent
A quiet workshop at dusk, with the plan on the desk, agents glowing on the monitors, the city going amber through the glass.
Your research team, working through the night.
02 Problem

Most of building an AI model isn't the model.

Three weeks understanding the data. Two weeks cleaning it. One on scaffolding and training infra. Then, finally, three on model experiments, four on iteration, one on packaging. By the time you're modelling, the deadline is breathing down your neck.

System level · the canonical diagram
Configuration · Data Collection · Data Verification · Machine Resource Mgmt · Feature Extraction · ML Code · Analysis Tools · Serving Infrastructure · Process Management Tools · Monitoring
↑  ~5%  of code is the model. The rest is plumbing. After Sculley et al. · NIPS 2015 paper
Project level · 10 weeks of a real ML delivery
First line of model code: week 4.
Type: plumbing · 3 weeks → model · 4 weeks (squeezed in the middle) → packaging · 3 weeks
Phases: data · clean · scaffold · model exp. · iteration · package · test · deploy
Individual level · what you actually do all day
before you touch the model
load data · validate schema · reconcile timestamps · handle missing values · check class balance · dedupe records · outlier detection · stratify train/val/test · build DataLoader · caching layer · write augmentation · version data · DVC · S3 bucket policy · build training loop · configure logger · wandb integration · checkpoint logic · LR scheduling · gradient clipping · mixed precision · DDP setup · config management · hydra · eval scripts · metric tracking · confusion matrix · error analysis · slice metrics · calibration · inference scripts · ONNX export · model serving · FastAPI endpoint · Docker image · deployment config · load testing · monitoring · drift detection · alerting · rollback plan · audit logging · compliance review · handoff doc
Backed by the literature

~5% of a mature ML system's code is the model itself. The rest is glue.

Sculley et al. · NIPS 2015 paper ↗

3 of 5 data scientists spend most of their day cleaning data, not modelling.

CrowdFlower 2016 · Forbes coverage

47% of ML teams take 4–6 months to ship a single model. 68% abandon experiments.

Comet 2021 · ML Practitioner Survey
03 The unlock

Weeks of work. Now in days.

Data engineering, cleaning, scaffolding, eval scripts, and packaging: done in days, fully automated. You act as research director: defining the model, reading results, choosing the next experiment. ZO executes everything else.

5 working days · compressed from 10 weeks
ZO runs the cycle: data · clean · scaffold → model exp. (↻ ZO iterates till benchmark) → package · test · deploy
Continuous direction: ZO tracks every run, recommends the next step, and retrains till the oracle clears.
You direct:
  • approve plan · ~30 min · day 1
  • gate 2 · features · ~30 min · day 2
  • oracle plateau · ~15 min · if needed
  • gate 4 · analysis · ~45 min · day 5
Timeline axis: d1 · d2 · d3 · d4 · d5 · d6 · d7 (weekend marked)
Weeks of work, reduced to days. Fully automated. You direct, ZO executes. 2 hours of human attention · 5 days of work · oracle-gated · reproducible · full audit trail
04 The idea

Write one file. Walk away.
Come back to verified work.

You write a plan.md with your data, your metrics, your tiered success criteria. A lead orchestrator decomposes it. Specialist agents (data engineer, model builder, oracle, domain evaluator, code reviewer) run the work in parallel and check each other's output.

Your part is the research direction, not the typing. You approve the plan, review the gates, read the final report. Out the other end come a checkpoint, a recipe, and an audit log: everything a domain expert needs to sign off on the work.

05 The handoff

You bring two things. You get four.

The contract is small on your end and large on ours. Everything between is autonomous.

You provide 01
  • Path to your data: a directory, a bucket, a database, wherever it lives
  • plan.md: objective and tiered oracle (must · should · could). Draft it yourself, or write it jointly with ZO.
ZeroOperators
research → train → evaluate → iterate
You receive 02
  • Trained model: checkpoint, tokenizer / preprocessor, eval scorecard
  • Reproducible recipe: configs, seeds, commands to re-run from scratch
  • Audit log: every decision, every failure, signed by the oracle
  • Clean delivery repo: zero infra artifacts. Your team can ship it.
06 A digital research team

A team, not a single agent.

One generalist agent judges its own work. Zero Operators is a team: five specialised roles working in parallel against a shared task list. Different cognition checks the work. The lead assigns. Peers communicate directly. Nobody marks their own homework.

The lead spawns a shared task list. Five specialised agents claim and complete tasks in parallel, and talk directly to each other to coordinate. Cross-checking is built into the topology, not asked of one model.
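To make the topology concrete, here is a minimal sketch of a shared task list with atomic claiming and a no-self-review rule. It is an illustration under assumptions, not ZO's implementation: the `TaskList`, `Task`, `claim`, `review`, and `work` names are hypothetical, and threads stand in for agents.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    author: Optional[str] = None    # agent that claimed the task
    reviewer: Optional[str] = None  # a different agent that checked it


class TaskList:
    """Hypothetical shared board: one lock guards every claim and review."""

    def __init__(self, names: List[str]) -> None:
        self._lock = threading.Lock()
        self.tasks = [Task(n) for n in names]

    def claim(self, agent: str) -> Optional[Task]:
        """Atomically hand the next unclaimed task to `agent`."""
        with self._lock:
            for task in self.tasks:
                if task.author is None:
                    task.author = agent
                    return task
        return None

    def review(self, task: Task, agent: str) -> bool:
        """Record a review; an agent may never review its own task."""
        with self._lock:
            if task.author is None or task.author == agent or task.reviewer is not None:
                return False
            task.reviewer = agent
            return True


def work(board: TaskList, agent: str) -> None:
    # Keep claiming until the board is empty; real work would happen per task.
    while board.claim(agent) is not None:
        pass


agents = ["data-engineer", "model-builder", "oracle", "domain-evaluator", "code-reviewer"]
board = TaskList([f"task-{i}" for i in range(12)])

# Agents claim tasks in parallel; the executor waits for all of them on exit.
with ThreadPoolExecutor(max_workers=len(agents)) as pool:
    for agent in agents:
        pool.submit(work, board, agent)

# Cross-check: each task goes to the first agent that is not its author.
for task in board.tasks:
    for agent in agents:
        if board.review(task, agent):
            break
```

The point of the sketch is the invariant, not the scheduling: every task ends up with an author and a reviewer, and the two are never the same agent.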

Live session: the team in motion.

Run zo build and this is what happens: a real tmux session, the lead on the left, a spawned research-scout on the right.

[zo] 0:bash-1.1* 1:zo-build-lead 2:research-scoutZ 3:data-engineer 4:feature-synth 5:trainer 6:oracle 7:logs "zo-build" sam@zo 19:45 26-Apr-26
lead-orchestrator phase 0 · spawning
"RL experiment orchestration and training"
- Evaluation protocol: 100 episodes, deterministic actions, report mean ± std
5. Environment Setup: spawn data-engineer to:
- Add dependencies to pyproject.toml (gymnasium, stable-baselines3)
- Write a minimal env verification script
- Set up configs for both algorithms
Should I proceed? I'll create a todo list and spawn the agents in parallel:
research-scout for the survey, data-engineer for environment setup.
> Select 1 or 2/3
Yes please
● Starting Phase 0. Spawning both agents in parallel.
● 2 agents launched (ctrl+o to expand)
└ research-scout
1. Prior art survey and baselines
└ data-engineer
1. Environment setup and dependencies
● Both agents running in parallel:
- research-scout: surveying prior art for A2C/PPO on Pendulum-v1
- data-engineer: adding dependencies, writing configs, verifying env
Waiting on both before advancing the phase gate.
<system-reminder> data-engineer done: env and deps set up
● data-engineer is done (env verified, passes baseline at ±26).
Waiting on research-scout to finish before advancing.
▪ hook · tmuxmate running
> How would you like to proceed?
⌃B + R for Research · ⇧↹ to switch · 2 background agents (ctrl+o to show)
research-scout running · 4m 26s · 7.2k tokens
● Fetch(stable-baselines3.readthedocs.io/.../rl_tips)
└ Received 1,768 (200 OK)
● Fetch(gymnasium.farama.org/.../pendulum)
└ Received 40,768 (200 OK)
● Writing prior_art_survey.md
Determining…
data-engineer done · env verified ✓
● Added 4 deps to pyproject.toml
● Wrote configs/a2c.yaml, configs/ppo.yaml
● Verified env: Pendulum-v1 obs/action spaces ✓
● Baseline policy hits ±26 (within plan target)
→ Task #3 → ok-engineer (peer)
oracle idle · awaiting phase 1
─ contract loaded: must · should · could
─ tier-0 gate armed
[zo] 0:bash-1.1 1:lead-2.1* 2:scout-3.1 3:engineer-4.1 "RL experiment orchestration" 19:45 26-Apr-26

A real tmux session: the lead on the left with your chat input, and the spawned agents stacked on the right, each running its own task. The orchestrator decomposes your plan.md, peers report back, and gates only advance when the oracle says so.

07 How it works

Six phases. Gated at every transition.

Each phase has a contract: inputs, outputs, success criteria, budget. Human gates sit at feature selection and analysis. Everything else runs until the oracle says pass.

  1. 01 · Auto

    Data

    Source, validate, version. Nothing leaves this phase unlabelled, unseeded, or un-hashed.

    agents · data-engineer · scout
  2. 02 · Human gate

    Features

    Proposals from the feature bench. You approve the set. This is one of two gates where the agents wait on you; the other is analysis.

    agents · feature-synth · statistician
  3. 03 · Auto

    Model

    Architecture drafted, hyperparameters defined, baselines proposed. Contract spawned for training.

    agents · model-builder · architect
  4. 04 · Auto ↻ iterates

    Training

    Run, evaluate, learn, retry. The oracle decides when a model is done, not the model.

    agents · trainer · oracle · analyst
  5. 05 · Human gate

    Analysis

    Results, failures, confusion. You read the report. You approve the narrative. You decide if it ships.

    agents · analyst · writer
  6. 06 · Auto

    Packaging

    Clean delivery repo. Zero infrastructure artifacts. A bundle your team can deploy without reading our docs.

    agents · packager · release-eng
Phase 04 iterates. The model builder drafts child hypotheses, trains, evaluates against the oracle, and either hits the target, plateaus, or gives up cleanly. Karpathy-style experiment loop, fully automated.
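The three exits of that loop (hit the target, plateau, give up cleanly) can be sketched in a few lines. This is a hedged illustration, not ZO's training code: `iterate`, `Outcome`, `patience`, and `min_gain` are hypothetical names, the score is assumed to be higher-is-better, and `run_experiment` stands in for a full train-and-evaluate cycle.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Outcome(Enum):
    HIT_TARGET = "hit target"
    PLATEAU = "plateau"
    GAVE_UP = "budget exhausted"


def iterate(
    run_experiment: Callable[[int], float],
    target: float,
    max_runs: int = 10,
    patience: int = 3,
    min_gain: float = 1e-3,
) -> Tuple[Outcome, float, List[float]]:
    """Run experiments until the target is met, progress stalls, or budget ends."""
    best = float("-inf")
    stale = 0  # consecutive runs without a meaningful improvement
    history: List[float] = []
    for run in range(max_runs):
        score = run_experiment(run)
        history.append(score)
        if score > best + min_gain:
            best, stale = score, 0
        else:
            stale += 1
        if best >= target:
            return Outcome.HIT_TARGET, best, history
        if stale >= patience:
            return Outcome.PLATEAU, best, history
    return Outcome.GAVE_UP, best, history


# Hypothetical scores from successive experiments, for illustration only.
scores = [0.70, 0.78, 0.81, 0.81, 0.81, 0.81]
outcome, best, history = iterate(lambda i: scores[min(i, len(scores) - 1)], target=0.90)
print(outcome.value, best, len(history))
```

With these made-up scores the loop stops on a plateau after six runs; in the workflow described above, that is the point where the "oracle plateau" check would ask for your direction.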
08 Oracle & memory

Every claim verified. Every failure remembered.

No agent marks its own homework. And no mistake gets to happen twice.

Oracle Verifying…

The source of truth.

Every phase ends with a verdict. Must-pass gates block delivery. Should-pass gates flag concerns. Could-pass gates surface warnings. Your targets, not ours.

  • Must-pass · meets target metric · pending
  • Must-pass · reproducible from seed · pending
  • Should-pass · coverage threshold · pending
  • Could-pass · statistical significance · pending

Tiered. Deterministic. Your targets become the oracle's contract. No agent marks its own homework.
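A tiered, deterministic verdict of this kind can be sketched as pure functions over a metrics dictionary. This is an assumption-laden illustration rather than ZO's oracle: `Tier`, `Check`, `Verdict`, `evaluate`, the thresholds, and the metric names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Tier(Enum):
    MUST = "must"      # a failure blocks delivery
    SHOULD = "should"  # a failure flags a concern
    COULD = "could"    # a failure surfaces a warning


@dataclass(frozen=True)
class Check:
    tier: Tier
    name: str
    passed: Callable[[Dict[str, float]], bool]  # pure function of the metrics


@dataclass
class Verdict:
    deliverable: bool
    concerns: List[str]
    warnings: List[str]


def evaluate(checks: List[Check], metrics: Dict[str, float]) -> Verdict:
    """Same checks and same metrics always produce the same verdict."""
    failed = [c for c in checks if not c.passed(metrics)]
    return Verdict(
        deliverable=not any(c.tier is Tier.MUST for c in failed),
        concerns=[c.name for c in failed if c.tier is Tier.SHOULD],
        warnings=[c.name for c in failed if c.tier is Tier.COULD],
    )


# Hypothetical contract and metrics, for illustration only.
contract = [
    Check(Tier.MUST, "meets target metric", lambda m: m["accuracy"] >= 0.90),
    Check(Tier.SHOULD, "coverage threshold", lambda m: m["coverage"] >= 0.80),
    Check(Tier.COULD, "statistical significance", lambda m: m["p_value"] < 0.05),
]
verdict = evaluate(contract, {"accuracy": 0.93, "coverage": 0.75, "p_value": 0.01})
print(verdict)
```

Here the must-pass check clears, so delivery is not blocked, while the missed coverage threshold is reported as a concern instead of a failure.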

Memory portable · persistent

It never forgets.

The entire project state lives in a portable .zo/ directory. Pause here, resume anywhere. Laptop to cloud, GPU rig to CI runner. The context moves with the work.

14:02 · DECISION · Architecture: hybrid orchestration model
14:47 · GATE · Phase 01 complete · context snapshot saved
15:12 · ERROR · Doc-codebase drift · 10 files stale
15:14 · PRIOR · PR-005 · aspirational rules without enforcement are dead letter
09:30 · RESUME · session-020 · resumed on gpu-server-03 · full context restored

STATE · DECISION_LOG · PRIORS · semantic recall. Same mistake? Literally cannot happen twice.
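One way to picture portable, persistent memory is an append-only log inside the project directory that any machine can read back. The sketch below assumes a JSON-lines file; the `decision_log.jsonl` file name and the `append_event`, `load_events`, and `priors` helpers are hypothetical and do not describe the actual layout of `.zo/`.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

LOG_NAME = "decision_log.jsonl"  # hypothetical file name, not ZO's real layout


def append_event(state_dir: Path, kind: str, message: str) -> None:
    """Append one event as a JSON line; earlier entries are never rewritten."""
    state_dir.mkdir(parents=True, exist_ok=True)
    with (state_dir / LOG_NAME).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"kind": kind, "message": message}) + "\n")


def load_events(state_dir: Path) -> List[Dict[str, str]]:
    """Read the full history back, for example after resuming elsewhere."""
    log = state_dir / LOG_NAME
    if not log.exists():
        return []
    with log.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def priors(state_dir: Path) -> List[str]:
    """Lessons recorded from earlier failures, to consult before acting."""
    return [e["message"] for e in load_events(state_dir) if e["kind"] == "PRIOR"]


# A temporary directory stands in for a project checkout.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / ".zo"
    append_event(state, "DECISION", "Architecture: hybrid orchestration model")
    append_event(state, "ERROR", "Doc-codebase drift")
    append_event(state, "PRIOR", "Aspirational rules without enforcement are dead letter")
    restored = load_events(state)  # what a resumed session would see
    lessons = priors(state)
```

Because the log lives in a plain directory, copying that directory to another machine carries the whole history with it, which is the property the timeline above illustrates.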

09 What makes it different

Three tools. Three units of work.
Only one ships a trained model.

Each category operates at a different level of abstraction. Coding assistants work a line at a time. Agent frameworks work a task at a time. Zero Operators ships a verified, audit-ready model, on your data, against your oracle.

Coding assistants

Cursor · Copilot · Oh My Claude

  • Unit of work: a line or a function
  • Human is: the pair programmer
  • Verification: "looks right to me"
  • Memory: current session only
  • Delivery: code in your editor
Agent frameworks

CrewAI · AutoGen · LangGraph

  • Unit of work: a task in a DAG
  • Human is: the prompt engineer
  • Verification: optional checks
  • Memory: basic scratchpad state
  • Delivery: output files
Zero Operators

ZeroOperators

  • Unit of work: a trained model
  • Human is: the research director
  • Verification: oracle-mandated, tiered, cross-checked
  • Memory: persistent · self-evolving
  • Delivery: checkpoint, recipe, audit log
10 Origin

Where this came from.

Creator note

I had an eight-week production ML project. Eight weeks to do all of it (data understanding, cleaning, scaffolding, training, iteration, packaging) and deliver a model to production. The math didn't work.

So I asked the obvious question: what if I had a digital research team at my fingertips, doing all the manual work, while I acted as the research director: validating, checking results, choosing what to try next?

That's what ZO is. A full production-grade AI research team. Tied to a fixed, repeatable, reproducible workflow. ZO tokens cost a fraction of one percent of the project budget. Best ROI I've ever had on a tool.

Samyakh (Sam) Tukra · creator · samtukra.com ↗
11 Quick start

Four commands. Then walk away.

zsh · ~/zero-operators
# Clone and set up.
 git clone https://github.com/SamPlvs/zero-operators.git
 cd zero-operators && ./setup.sh

# Initialize a project.
 zo init my-project
 zo draft --project my-project

# Build. Walk away.
 zo build plans/my-project.md
# ⏵ phase 01 · data       pass
# ⏵ phase 02 · features   awaiting gate
# ⏵ phase 03 · model      pass
# ⏵ phase 04 · training   pass · oracle ✓
# ⏵ phase 05 · analysis   awaiting gate
# ⏵ phase 06 · packaging  pass
# → delivered to ~/deliveries/my-project · all gates PASS