Private beta / NVIDIA GPUs

PTX Kernel Factory

GPU kernels that improve with every run.

A fully autonomous agent swarm explores implementations, tests them on target NVIDIA hardware, and carries what it learns into the next generation.

Autonomous optimization loop: evidence compounds

01 Explore: parallel agent swarm
02 Build: compile candidates
03 Prove: target GPU evidence
04 Remember: advance the generation

Generation memory: useful evidence never returns to zero.

01 Compounding system

The factory does not reset.

Each run expands the evidence available to the swarm. Over time, the search becomes more informed, the strategies become more effective, and the resulting kernels become stronger.

01 Explore in parallel

Specialized agents search different implementation paths at the same time.

02 Measure what survives

Candidates are compiled, checked, and benchmarked on the target hardware.

03 Carry the evidence forward

Useful results become experience for the next search, not a discarded trace.
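
The three steps above can be pictured as a small generational search loop. The sketch below is illustrative only and is not the factory's implementation: `Evidence`, `GenerationMemory`, `propose`, `measure`, and the `tile` parameter are hypothetical names, the cost model is a toy stand-in for compiling and benchmarking on a real GPU, and the agents run one after another here rather than in parallel.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical record of one measured candidate."""
    params: dict
    runtime_ms: float


@dataclass
class GenerationMemory:
    """Hypothetical store that persists across generations (never reset)."""
    records: list = field(default_factory=list)

    def best(self):
        # Fastest candidate seen so far, or None before the first run.
        if not self.records:
            return None
        return min(self.records, key=lambda e: e.runtime_ms)


TILES = [16, 32, 64, 128, 256]  # assumed search space, for illustration


def propose(memory, rng):
    """Explore: step from the best known config, or sample at random."""
    best = memory.best()
    if best is None or rng.random() < 0.2:
        return {"tile": rng.choice(TILES)}
    i = TILES.index(best.params["tile"])
    j = min(max(i + rng.choice([-1, 0, 1]), 0), len(TILES) - 1)
    return {"tile": TILES[j]}


def measure(params):
    """Stand-in for compile, correctness check, and benchmark.
    Toy cost model, not a real measurement."""
    return 1.0 + abs(params["tile"] - 64) / 64


def run_generation(memory, rng, agents=8):
    """One generation: explore, measure, then carry the evidence forward."""
    candidates = [propose(memory, rng) for _ in range(agents)]
    for params in candidates:
        memory.records.append(Evidence(params, measure(params)))
    return memory.best()


rng = random.Random(0)
memory = GenerationMemory()
history = []
for _ in range(5):
    history.append(run_generation(memory, rng).runtime_ms)
```

Because the memory keeps every measured record, the best runtime in `history` can only stay the same or improve from one generation to the next, which is the compounding property described above.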

02 Inside SwarmOS

See the factory think.

SwarmOS exposes every level of a run, from the global control plane to one agent's reasoning, plan, and workspace.

Figure: SwarmOS control panel showing a live PTX Kernel Factory session with eight active agents, session metrics, message queues, and automated reviews.

03 Open-source proof

Published work, not a promise.

The first four factory artifacts cover RMSNorm and Kimi Delta Attention across Hopper and Blackwell. Source, tests, benchmark methods, and reports are open for inspection.

01 Kimi Delta Attention

1.24x to 1.59x

NVIDIA GH200

584 upstream tests and 235 package tests passed.

02 Kimi Delta Attention

1.42x / 1.52x

NVIDIA B200

All 580 upstream tests passed.

03 RMSNorm

8.17% faster

NVIDIA GH200

48 package tests and 65 subtests passed.

04 RMSNorm

126 / 126

NVIDIA B200

Faster in every comparable case; all 66 package tests passed.

Operator-level results do not imply the same full-model speedup. The launch article includes test coverage, limitations, and measurement context.
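
To see why an operator-level gain shrinks at the model level, a rough estimate with Amdahl's law can help. This is a standard approximation rather than the method used in the launch article, and the 10% runtime share below is a hypothetical figure chosen only for illustration.

```python
def end_to_end_speedup(operator_fraction, operator_speedup):
    """Amdahl's law: overall speedup when only part of the runtime gets faster.

    operator_fraction: share of total runtime spent in the operator (0 to 1).
    operator_speedup:  speedup of that operator alone (e.g. 1.5 for 1.5x).
    """
    if not 0.0 <= operator_fraction <= 1.0:
        raise ValueError("operator_fraction must be between 0 and 1")
    if operator_speedup <= 0:
        raise ValueError("operator_speedup must be positive")
    remaining = 1.0 - operator_fraction
    return 1.0 / (remaining + operator_fraction / operator_speedup)


# Hypothetical case: the operator is 10% of model runtime and becomes 1.5x faster.
estimate = end_to_end_speedup(0.10, 1.5)
print(f"{estimate:.3f}x")  # about 1.03x end to end
```

Under these assumed numbers, a 1.5x operator speedup yields roughly a 3% full-model gain; the real figure depends on how much of a given workload's runtime the operator accounts for.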

04 Beta access

Choose how your team enters the factory.

Starter and Pro include subscription access plus discounted spot GPU access. Tokens and cloud compute are billed as used.

Starter

$200 / month

25% off spot GPU access
Pay as you go for tokens and cloud compute
Data may be used to improve our product

Join beta

Pro

Recommended plan

$2,000 / month

50% off spot GPU access
Pay as you go for tokens and cloud compute
Data will not be used

Join beta

Enterprise

Talk with the INT21 team about enterprise access.

Bring us one hard GPU workload.

Start with an operation that is too slow, a new architecture without a mature kernel, or an important workload that has not justified weeks of specialist time.

Request beta access
