Zero Operators · An autonomous AI research team. You stay the director.

02 Problem

Most of building an AI model isn't the model.

Three weeks understanding the data. Two weeks cleaning it. One on scaffolding and training infra. Then, finally, three on model experiments, four on iteration, one on packaging. By the time you're modelling, the deadline is breathing down your neck.

System level · the canonical diagram

Configuration · Data Collection · Data Verification · Machine Resource Mgmt · Feature Extraction · ML Code · Analysis Tools · Serving Infrastructure · Process Management Tools · Monitoring

↑ ~5% of code is the model. The rest is plumbing. After Sculley et al. · NIPS 2015 paper

Project level · 10 weeks of a real ML delivery

first line of model code · week 4

Type → plumbing · 3 weeks | model · 4 weeks (squeezed in the middle) | packaging · 3 weeks

Phases → data · clean · scaffold · model exp. · iteration · package · test · deploy

Weeks → w1 · w2 · w3 · w4 · w5 · w6 · w7 · w8 · w9 · w10

Individual level · what you actually do all day

before you touch the model

load data · validate schema · reconcile timestamps · handle missing values · check class balance · dedupe records · outlier detection · stratify train/val/test · build DataLoader · caching layer · write augmentation · version data · DVC · S3 bucket policy · build training loop · configure logger · wandb integration · checkpoint logic · LR scheduling · gradient clipping · mixed precision · DDP setup · config management · hydra · eval scripts · metric tracking · confusion matrix · error analysis · slice metrics · calibration · inference scripts · ONNX export · model serving · FastAPI endpoint · Docker image · deployment config · load testing · monitoring · drift detection · alerting · rollback plan · audit logging · compliance review · handoff doc…

Backed by the literature

~5% of a mature ML system's code is the model itself. The rest is glue.

Sculley et al. · NIPS 2015 paper ↗

3 of 5 data scientists spend most of their day cleaning data, not modelling.

CrowdFlower 2016 · Forbes coverage

47% of ML teams take 4–6 months to ship a single model. 68% abandon experiments.

Comet 2021 · ML Practitioner Survey

03 The unlock

Weeks of work. Now in days.

Data engineering, cleaning, scaffolding, eval scripts, and packaging: done in days, fully automated. You act as research director: defining the model, reading results, choosing the next experiment. ZO executes everything else.

5 working days · compressed from 10 weeks

ZO runs the cycle

data · clean · scaffold

model exp. ↻ ZO iterates till benchmark

package · test · deploy

weekend

continuous direction · ZO tracks every run, recommends the next step, retrains till the oracle clears

You direct

approve plan · ~30 min · day 1

gate 2 · features · ~30 min · day 2

oracle plateau · ~15 min · if needed

gate 4 · analysis · ~45 min · day 5

Days → d1 · d2 · d3 · d4 · d5 · d6 · d7

Weeks of work, reduced to days. Fully automated. You direct, ZO executes. 2 hours of human attention · 5 days of work · oracle-gated · reproducible · full audit trail

04 The idea

Write one file. Walk away.
Come back to verified work.

You write a plan.md with your data, your metrics, your tiered success criteria. A lead orchestrator decomposes it. Specialist agents (data engineer, model builder, oracle, domain evaluator, code reviewer) run the work in parallel and check each other's output.

Your part is the research direction, not the typing. You approve the plan, review the gates, read the final report. Out the other end come a checkpoint, a recipe, and an audit log: everything a domain expert needs to sign off on the work.

05 The handoff

You bring two things. You get four.

The contract is small on your end and large on ours. Everything between is autonomous.

You provide 01

Path to your data · a directory, a bucket, a database, wherever it lives
plan.md · objective & tiered oracle: must · should · could. Draft it yourself, or write it jointly with ZO.

ZeroOperators

research → train → evaluate → iterate

You receive 02

Trained model · checkpoint, tokenizer / preprocessor, eval scorecard
Reproducible recipe · configs, seeds, commands to re-run from scratch
Audit log · every decision, every failure, signed by the oracle
Clean delivery repo · zero infra artifacts. Your team can ship it.

06 A digital research team

A team, not a single agent.

One generalist agent judges its own work. Zero Operators is a team: five specialised roles working in parallel against a shared task list. Different cognition checks the work. The lead assigns. Peers communicate directly. Nobody marks their own homework.

Lead spawns a shared task list. Five specialised agents claim and complete tasks in parallel, and talk directly to each other to coordinate. Cross-checking is built into the topology, not asked of one model.
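As a rough illustration of "nobody marks their own homework", here is a minimal Python sketch of a shared task list in which agents claim tasks and a review from the task's own author is rejected. The `Task` and `TaskList` classes and their methods are hypothetical names invented for this example. They are not ZO's actual internals, only one way the topology could be enforced.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    author: Optional[str] = None    # agent that claimed and did the work
    reviewer: Optional[str] = None  # a different agent that checked it
    done: bool = False


@dataclass
class TaskList:
    """Hypothetical shared task list; not ZO's real data structure."""
    tasks: list = field(default_factory=list)

    def claim(self, agent: str) -> Optional[Task]:
        """Hand the next unclaimed task to `agent`, or None if none remain."""
        for task in self.tasks:
            if task.author is None:
                task.author = agent
                return task
        return None

    def review(self, task: Task, reviewer: str) -> None:
        """Mark a task done, refusing any attempt at self-review."""
        if task.author is None:
            raise ValueError("task has not been claimed yet")
        if reviewer == task.author:
            raise PermissionError(f"{reviewer} cannot review its own work")
        task.reviewer = reviewer
        task.done = True


board = TaskList([Task("env setup"), Task("prior art survey")])
t1 = board.claim("data-engineer")
t2 = board.claim("research-scout")
board.review(t1, reviewer="research-scout")  # a peer signs off
```

The point of the sketch is that cross-checking lives in the structure itself: a self-review raises an error rather than relying on an agent to behave.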

Live session: the team in motion.

Run zo build and this is what happens: a real tmux session, the lead on the left, a spawned research-scout on the right.

[zo] 0:bash-1.1* 1:zo-build-lead 2:research-scoutZ 3:data-engineer 4:feature-synth 5:trainer 6:oracle 7:logs "zo-build" sam@zo 19:45 26-Apr-26

lead-orchestrator phase 0 · spawning

"RL experiment orchestration and training"

- Evaluation protocol: 100 episodes, deterministic actions, report mean ± std

5. Environment Setup: spawn data-engineer to:

- Add dependencies to pyproject.toml (gymnasium, stable-baselines3)

- Write a minimal env verification script

- Set up configs for both algorithms

Should I proceed? I'll create a todo list and spawn the agents in parallel:

research-scout for the survey, data-engineer for environment setup.

> Select 1 or 2/3

Yes please

● Starting Phase 0. Spawning both agents in parallel.

● 2 agents launched (ctrl+o to expand)

└ research-scout

1. Prior art survey and baselines

└ data-engineer

1. Environment setup and dependencies

● Both agents running in parallel:

- research-scout: surveying prior art for A2C/PPO on Pendulum-v1

- data-engineer: adding dependencies, writing configs, verifying env

Waiting on both before advancing the phase gate.

<system-reminder> data-engineer done: env and deps set up

● data-engineer is done (env verified, passes baseline at ±26).

Waiting on research-scout to finish before advancing.

▪ hook · tmuxmate running

> How would you like to proceed?

⌃B + R for Research · ⇧↹ to switch · 2 background agents (ctrl+o to show)

research-scout running · 4m 26s · 7.2k tokens

● Fetch(stable-baselines3.readthedocs.io/.../rl_tips)

└ Received 1,768 (200 OK)

● Fetch(gymnasium.farama.org/.../pendulum)

└ Received 40,768 (200 OK)

● Writing prior_art_survey.md

Determining…

data-engineer done · env verified ✓

● Added 4 deps to pyproject.toml

● Wrote configs/a2c.yaml, configs/ppo.yaml

● Verified env: Pendulum-v1 obs/action spaces ✓

● Baseline policy hits ±26 (within plan target)

→ Task #3 → ok-engineer (peer)

oracle idle · awaiting phase 1

─ contract loaded: must · should · could

─ tier-0 gate armed

[zo] 0:bash-1.1 1:lead-2.1* 2:scout-3.1 3:engineer-4.1 "RL experiment orchestration" 19:45 26-Apr-26

A real tmux session: the lead on the left with your chat input, and the spawned agents stacked on the right, each running its own task. The orchestrator decomposes your plan.md, peers report back, and gates only advance when the oracle says so.

07 How it works

Six phases. Gated at every transition.

Each phase has a contract: inputs, outputs, success criteria, budget. Human gates sit at feature selection and analysis. Everything else runs until the oracle says pass.
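To make the idea of a phase contract concrete, here is a minimal Python sketch, assuming a contract is just a record of inputs, outputs, success criteria, and budget, plus a flag for human gates. The `PhaseContract` class, the `next_phase` helper, the `budget_hours` unit, and every field value are illustrative assumptions. Only the six phase names and the two human gates come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseContract:
    """Hypothetical shape of a phase contract; field values are placeholders."""
    name: str
    inputs: tuple
    outputs: tuple
    success_criteria: tuple
    budget_hours: float  # assumed unit, for illustration only
    human_gate: bool = False


PHASES = (
    PhaseContract("data", ("data path",), ("versioned dataset",), ("schema valid",), 4.0),
    PhaseContract("features", ("versioned dataset",), ("feature set",), ("set approved",), 4.0, human_gate=True),
    PhaseContract("model", ("feature set",), ("architecture",), ("baseline defined",), 4.0),
    PhaseContract("training", ("architecture",), ("checkpoint",), ("must-pass tier met",), 48.0),
    PhaseContract("analysis", ("checkpoint",), ("report",), ("report approved",), 4.0, human_gate=True),
    PhaseContract("packaging", ("report",), ("delivery repo",), ("clean repo",), 4.0),
)


def next_phase(current: int, oracle_pass: bool, human_approved: bool = False) -> int:
    """Return the index of the phase to run next.

    The phase only advances when the oracle passes and, at a human gate,
    when a person has also approved. Otherwise the index stays put.
    """
    phase = PHASES[current]
    if not oracle_pass:
        return current
    if phase.human_gate and not human_approved:
        return current
    return current + 1
```

For example, `next_phase(1, oracle_pass=True)` stays on the features phase until it is called again with `human_approved=True`.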

01 · Auto

Data

Source, validate, version. Nothing leaves this phase unlabelled, unseeded, or un-hashed.
agents · data-engineer · scout

02 · Human gate

Features

Proposals from the feature bench. You approve the set. This is one of only two places the agents wait on you.
agents · feature-synth · statistician

03 · Auto

Model

Architecture drafted, hyperparameters defined, baselines proposed. Contract spawned for training.
agents · model-builder · architect

04 · Auto ↻ iterates

Training

Run, evaluate, learn, retry. The oracle decides when a model is done, not the model.
agents · trainer · oracle · analyst

05 · Human gate

Analysis

Results, failures, confusion. You read the report. You approve the narrative. You decide if it ships.
agents · analyst · writer

06 · Auto

Packaging

Clean delivery repo. Zero infrastructure artifacts. A bundle your team can deploy without reading our docs.
agents · packager · release-eng

↻ Phase 04 iterates. The model builder drafts child hypotheses, trains, evaluates against the oracle, and either hits the target, plateaus, or gives up cleanly. Karpathy-style experiment loop, fully automated.
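The three exits of that loop (hit the target, plateau, give up cleanly) can be sketched in a few lines of Python. This is an assumption-laden illustration, not ZO's training loop: the `iterate` function, the `patience` and `min_delta` parameters, and the "higher score is better" convention are all invented for the example. Each score stands in for one train-and-evaluate run.

```python
from typing import Iterable, Tuple


def iterate(
    scores: Iterable[float],
    target: float,
    patience: int = 3,
    min_delta: float = 1e-3,
) -> Tuple[str, float]:
    """Consume one score per experiment and report how the loop ended.

    Returns ("hit_target", score) as soon as a run reaches the target,
    ("plateau", best) after `patience` consecutive runs without a gain
    of more than `min_delta`, or ("budget_exhausted", best) when the
    experiments run out first.
    """
    best = float("-inf")
    stale = 0
    for score in scores:
        if score >= target:
            return "hit_target", score
        if score > best + min_delta:
            best, stale = score, 0
        else:
            stale += 1
        if stale >= patience:
            return "plateau", best
    return "budget_exhausted", best


outcome = iterate([0.50, 0.70, 0.92], target=0.90)
```

In this sketch a plateau is a distinct outcome rather than a failure, which matches the "oracle plateau" check-in described earlier: it is the point where a human may be asked what to try next.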

08 Oracle & memory

Every claim verified. Every failure remembered.

No agent marks its own homework. And no mistake gets to happen twice.

Oracle Verifying…

The source of truth.

Every phase ends with a verdict. Must-pass gates block delivery. Should-pass flag concerns. Could-pass surface warnings. Your targets, not ours.

Must-pass · meets target metric · pending
Must-pass · reproducible from seed · pending
Should-pass · coverage threshold · pending
Could-pass · statistical significance · pending

Tiered. Deterministic. Your targets become the oracle's contract. No agent marks its own homework.
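The tier semantics described above (must blocks delivery, should flags a concern, could surfaces a warning) can be written as a small deterministic function. The Python below is a sketch under that reading. `Tier`, `Check`, and `verdict` are hypothetical names, and the pass/fail values in the example are made up. The four check names are the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MUST = "must"
    SHOULD = "should"
    COULD = "could"


@dataclass(frozen=True)
class Check:
    tier: Tier
    name: str
    passed: bool


def verdict(checks):
    """Fold individual check results into a single tiered verdict.

    A failed must-pass check blocks delivery, a failed should-pass check
    is recorded as a concern, and a failed could-pass check as a warning.
    """
    failed = [c for c in checks if not c.passed]
    return {
        "deliverable": not any(c.tier is Tier.MUST for c in failed),
        "concerns": [c.name for c in failed if c.tier is Tier.SHOULD],
        "warnings": [c.name for c in failed if c.tier is Tier.COULD],
    }


result = verdict([
    Check(Tier.MUST, "meets target metric", True),
    Check(Tier.MUST, "reproducible from seed", True),
    Check(Tier.SHOULD, "coverage threshold", False),      # illustrative outcome
    Check(Tier.COULD, "statistical significance", False),  # illustrative outcome
])
```

Because the function is pure, the same check results always give the same verdict, which is the sense of "deterministic" assumed here.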

Memory portable · persistent

It never forgets.

The entire project state lives in a portable .zo/ directory. Pause here, resume anywhere. Laptop to cloud, GPU rig to CI runner. The context moves with the work.

14:02 · DECISION · Architecture: hybrid orchestration model

14:47 · GATE · Phase 01 complete · context snapshot saved

15:12 · ERROR · Doc-codebase drift · 10 files stale

15:14 · PRIOR · PR-005 · aspirational rules without enforcement are dead letter

09:30 · RESUME · session-020 · resumed on gpu-server-03 · full context restored

STATE · DECISION_LOG · PRIORS · semantic recall. Same mistake? Literally cannot happen twice.
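One plausible way to picture a portable, persistent log is an append-only file inside the project's state directory, which can be copied to another machine and read back. The Python sketch below assumes exactly that. The file name `decision_log.jsonl`, the record fields, and both helper functions are hypothetical and say nothing about the real layout of `.zo/`.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

LOG_NAME = "decision_log.jsonl"  # assumed file name, for illustration only


def append_entry(state_dir: Path, kind: str, message: str) -> None:
    """Append one record to the log; existing records are never rewritten."""
    state_dir.mkdir(parents=True, exist_ok=True)
    with (state_dir / LOG_NAME).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"kind": kind, "message": message}) + "\n")


def load_entries(state_dir: Path, kind: Optional[str] = None) -> list:
    """Read the log back, optionally keeping only one kind of record."""
    log = state_dir / LOG_NAME
    if not log.exists():
        return []
    lines = log.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line]
    return [e for e in entries if kind is None or e["kind"] == kind]


# A temporary directory stands in for a project checkout.
with tempfile.TemporaryDirectory() as tmp:
    zo_dir = Path(tmp) / ".zo"
    append_entry(zo_dir, "DECISION", "Architecture: hybrid orchestration model")
    append_entry(zo_dir, "PRIOR", "aspirational rules without enforcement are dead letter")
    priors = load_entries(zo_dir, kind="PRIOR")
```

Since the state is plain files in one directory, resuming elsewhere amounts to moving that directory and calling `load_entries` again. Semantic recall over the entries is outside the scope of this sketch.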

09 What makes it different

Three tools. Three units of work.
Only one ships a trained model.

Each category operates at a different level of abstraction. Coding assistants work a line at a time. Agent frameworks work a task at a time. Zero Operators ships a verified, audit-ready model, on your data, against your oracle.

Coding assistants

Cursor · Copilot · Oh My Claude

Unit of work · a line or a function
Human is · the pair programmer
Verification · "looks right to me"
Memory · current session only
Delivery · code in your editor

Agent frameworks

CrewAI · AutoGen · LangGraph

Unit of work · a task in a DAG
Human is · the prompt engineer
Verification · optional checks
Memory · basic scratchpad state
Delivery · output files

Zero Operators

Unit of work · a trained model
Human is · the research director
Verification · oracle-mandated, tiered, cross-checked
Memory · persistent · self-evolving
Delivery · checkpoint, recipe, audit log

10 Origin

Where this came from.

Creator note

I had an eight-week production ML project: eight weeks to do all of it (data understanding, cleaning, scaffolding, training, iteration, packaging) and deliver a model to production. The math didn't work.

So I asked the obvious question: what if I had a digital research team at my fingertips, doing all the manual work, while I acted as the research director: validating, checking results, choosing what to try next?

That's what ZO is. A full production-grade AI research team. Tied to a fixed, repeatable, reproducible workflow. ZO tokens cost a fraction of one percent of the project budget. Best ROI I've ever made on a tool.

Samyakh (Sam) Tukra · creator · samtukra.com ↗

11 Quick start

Four commands. Then walk away.

zsh · ~/zero-operators

# Clone and set up.
❯ git clone https://github.com/SamPlvs/zero-operators.git
❯ cd zero-operators && ./setup.sh

# Initialize a project.
❯ zo init my-project
❯ zo draft --project my-project

# Build. Walk away.
❯ zo build plans/my-project.md
# ⏵ phase 01 · data       pass
# ⏵ phase 02 · features   awaiting gate
# ⏵ phase 03 · model      pass
# ⏵ phase 04 · training   pass · oracle ✓
# ⏵ phase 05 · analysis   awaiting gate
# ⏵ phase 06 · packaging  pass
# → delivered to ~/deliveries/my-project · all gates PASS
