Private beta / NVIDIA GPUs

PTX Kernel Factory

GPU kernels that improve with every run.

A fully autonomous agent swarm explores implementations, tests them on target NVIDIA hardware, and carries what it learns into the next generation.

The factory does not reset.

Each run expands the evidence available to the swarm. Over time, the search becomes more informed, the strategies become more effective, and the resulting kernels become stronger.

  1. Explore in parallel

    Specialized agents search different implementation paths at the same time.

  2. Measure what survives

    Candidates are compiled, checked, and benchmarked on the target hardware.

  3. Carry the evidence forward

    Useful results become experience for the next search, not a discarded trace.

See the factory think.

SwarmOS exposes every level of a run, from the global control plane down to a single agent's reasoning, plan, and workspace.

[Screenshot: SwarmOS control panel showing a live PTX Kernel Factory session with eight active agents, session metrics, message queues, and automated reviews]

Published work, not a promise.

The first four factory artifacts cover RMSNorm and Kimi Delta Attention across Hopper and Blackwell. Source, tests, benchmark methods, and reports are open for inspection.

01 Kimi Delta Attention
1.24x to 1.59x

NVIDIA GH200

584 upstream tests and 235 package tests passed.

02 Kimi Delta Attention
1.42x / 1.52x

NVIDIA B200

All 580 upstream tests passed.

03 RMSNorm
8.17% faster

NVIDIA GH200

48 package tests and 65 subtests passed.

04 RMSNorm
126 / 126 cases faster

NVIDIA B200

Faster in every comparable case; all 66 package tests passed.

Operator-level results do not imply the same full-model speedup. The launch article includes test coverage, limitations, and measurement context.

Choose how your team enters the factory.

Starter and Pro include subscription access plus discounted spot GPU access. Tokens and cloud compute are billed as used.

Starter

$200 / month

  • 25% off spot GPU access
  • Pay as you use for tokens and cloud compute
  • Data may be used to improve our product

Enterprise

Contact us

  • Talk with the INT21 team about enterprise access.

Bring us one hard GPU workload.

Start with an operation that is too slow, a new architecture without a mature kernel, or an important workload that has not justified weeks of specialist time.

Request beta access