Use compute to improve compute.

INT21 builds self-improving AI systems for the software beneath modern AI. Our first product generates and optimizes low-level GPU software, then proves its work with tests and benchmarks.

PTX Kernel Factory First public release

Company launch

Introducing INT21 and PTX Kernel Factory

We are building self-improving AI systems for the software beneath modern AI. The first four implementations produced by PTX Kernel Factory are open source, and the product is entering beta.

01 / First proof

Generated by PTX Kernel Factory. Measured against the best baseline.

PTX Kernel Factory produced four GPU kernel artifacts across Hopper and Blackwell. We tested them on matching hardware and inputs against established expert implementations, with correctness verified before timing.
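To make "correctness verified before timing" concrete, here is a minimal sketch in plain Python. It is not INT21's harness: the function names, tolerances, and the pure-Python RMSNorm stand-ins are all assumptions chosen for illustration. The point is the ordering: a candidate that fails the correctness gate never receives a speedup number.

```python
import math
import time


def rmsnorm_reference(x, weight, eps=1e-6):
    """Plain RMSNorm: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def rmsnorm_candidate(x, weight, eps=1e-6):
    """Hypothetical variant standing in for a generated kernel."""
    inv_rms = (math.fsum(v * v for v in x) / len(x) + eps) ** -0.5
    return [v * inv_rms * w for v, w in zip(x, weight)]


def is_correct(candidate, reference, inputs, rel_tol=1e-9, abs_tol=1e-12):
    """Return True only if the candidate matches the reference on every input."""
    for x, w in inputs:
        got, want = candidate(x, w), reference(x, w)
        if len(got) != len(want):
            return False
        if not all(
            math.isclose(g, r, rel_tol=rel_tol, abs_tol=abs_tol)
            for g, r in zip(got, want)
        ):
            return False
    return True


def median_seconds(fn, inputs, repeats=5):
    """Median wall-clock time for one pass over all inputs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x, w in inputs:
            fn(x, w)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def evaluate(candidate, reference, inputs):
    """Gate timing on correctness: an incorrect candidate gets no speedup."""
    if not is_correct(candidate, reference, inputs):
        return {"correct": False, "speedup": None}
    baseline = median_seconds(reference, inputs)
    measured = median_seconds(candidate, inputs)
    # Guard against a zero reading from a very coarse clock.
    return {"correct": True, "speedup": baseline / max(measured, 1e-12)}
```

Calling `evaluate(rmsnorm_candidate, rmsnorm_reference, inputs)` with the same `inputs` list for both functions mirrors the "matching inputs" condition. A real GPU harness would also need device synchronization and warm-up runs, which this sketch leaves out.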

Factory artifacts

Kimi Delta Attention

Expert baseline: FlashKDA CUTLASS

GH200 / Hopper: 1.59x (peak measured performance)

B200 / Blackwell: 1.52x (optimized integration)

Factory artifacts

RMSNorm

Expert baseline: QuACK CuTe DSL

GH200 / Hopper: 8.17% faster on geometric mean

B200 / Blackwell: 126 / 126 (faster in every comparable case)

Operator-level benchmark results, not full-model speedup claims.
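For readers unfamiliar with the two summary styles used above (a geometric-mean figure and a per-case win count), the sketch below shows one common way to compute them from per-case timings. The page does not state the exact definitions behind its numbers, so this is an assumption: speedup is taken as baseline time divided by candidate time, and `summarize` is a hypothetical helper name.

```python
import math


def summarize(baseline_ms, candidate_ms):
    """Summarize per-case timings as a geometric-mean speedup and a win count."""
    if len(baseline_ms) != len(candidate_ms) or not baseline_ms:
        raise ValueError("need matching, non-empty timing lists")
    # Per-case speedup: values above 1.0 mean the candidate was faster.
    ratios = [b / c for b, c in zip(baseline_ms, candidate_ms)]
    # Geometric mean via logs, so one large ratio cannot dominate the summary.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    wins = sum(1 for r in ratios if r > 1.0)
    return {
        "geomean_speedup": geomean,
        "percent_faster": (geomean - 1.0) * 100.0,
        "wins": wins,
        "cases": len(ratios),
    }
```

With made-up timings `summarize([2.0, 8.0], [1.0, 16.0])`, one case is twice as fast and the other twice as slow, so the geometric mean comes out at 1.0 with 1 win in 2 cases. That is why the two figures are complementary: the geometric mean describes the typical case, and the win count shows whether any case regressed.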

02 / First product

PTX Kernel Factory

Every generation starts ahead of the last.

A fully autonomous agent swarm searches for GPU kernel implementations, proves them on real hardware, and carries every benchmark, failure, and successful optimization into the next generation. More runs create more evidence, better search strategies, and stronger kernels over time.

Figure: PTX Kernel Factory autonomous improvement loop, with evidence retained across generations. An autonomous agent swarm (plan, write, review, tune) generates GPU kernels, the target GPU evaluates them (compile, correctness, benchmark), and the retained evidence from the best valid candidate and every other result starts the next generation.
  1. Fully autonomous swarm

    Specialized agents plan, implement, review, and optimize each kernel end to end.

  2. Grounded in real hardware

    Every candidate is compiled, verified, and benchmarked on the target GPU.

  3. Improvement compounds

    Reusable evidence improves both the search process and the kernels it produces.
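The loop described above can be sketched as a toy search in Python. Everything here is hypothetical: the "tile size" search space, the compile and correctness rules, and the synthetic cost model are invented stand-ins, not how PTX Kernel Factory works internally. What the sketch does show is the shape of the idea: three gates in order (compile, correctness, benchmark), failures recorded alongside successes, and a later run starting from retained evidence instead of from scratch.

```python
def compile_ok(tile):
    # Hypothetical rule: only power-of-two tiles up to 256 "compile".
    return 1 <= tile <= 256 and tile & (tile - 1) == 0


def is_correct(tile):
    # Hypothetical rule: tiles below 4 produce wrong results.
    return tile >= 4


def benchmark_ms(tile):
    # Synthetic cost model with its minimum at tile == 64.
    return 1.0 + abs(tile - 64) / 64.0


def evaluate(tile):
    """Run the gates in order; later gates only run if earlier ones pass."""
    if not compile_ok(tile):
        return {"status": "compile_error", "ms": None}
    if not is_correct(tile):
        return {"status": "incorrect", "ms": None}
    return {"status": "ok", "ms": benchmark_ms(tile)}


def best_valid(evidence):
    """Fastest candidate that passed every gate, or None if there is none."""
    valid = {t: r["ms"] for t, r in evidence.items() if r["status"] == "ok"}
    return min(valid, key=valid.get) if valid else None


def propose(evidence, seed):
    """Propose neighbours of the best known tile, skipping anything already tried."""
    center = best_valid(evidence) or seed
    neighbours = [center // 2, center, center * 2, center + 1]
    return [t for t in neighbours if t not in evidence]


def run_generations(seed, generations, evidence=None):
    """Run the loop; passing in earlier evidence resumes where a prior run stopped."""
    evidence = {} if evidence is None else evidence
    for _ in range(generations):
        for tile in propose(evidence, seed):
            evidence[tile] = evaluate(tile)  # failures are retained too
    return evidence
```

Starting from `seed=8`, one generation reaches a best valid tile of 16 and three generations reach 64, the optimum of this toy cost model. Passing the returned `evidence` dict into a later call means no candidate is ever evaluated twice, which is the sense in which each generation "starts ahead of the last" in this sketch.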

03 / Platform

Built on SwarmOS.

Thousands of agents. One evolving system.

SwarmOS is a cloud-native platform for running specialized agents at elastic scale toward the same measurable goal. Agents explore in parallel, coordinate through shared evidence, and continuously converge on stronger solutions.

Figure: Cloud-native swarm control plane (live system model).

  1. Elastic scale: Thousands of agents

    Cloud-native scheduling expands the swarm around available compute.

  2. Shared direction: One measurable goal

    Every agent works against the same constraints and acceptance criteria.

  3. Generational memory: Experience carries forward

    Results, failures, and strategies become the next generation's starting point.
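As a rough illustration of agents working in parallel against one acceptance criterion and one shared record, here is a small Python sketch using threads. It says nothing about how SwarmOS is actually built: `EvidenceBoard`, `run_swarm`, the synthetic `cost_ms` function, and the `max_ms` threshold are all hypothetical names and values. Coordination here is limited to every agent writing into the same lock-protected board.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class EvidenceBoard:
    """Thread-safe record of every result plus the best accepted one."""

    def __init__(self, max_ms):
        self.max_ms = max_ms  # shared acceptance criterion for all agents
        self._lock = threading.Lock()
        self.results = {}
        self.best = None  # (ms, config), or None if nothing accepted yet

    def report(self, config, ms):
        with self._lock:
            self.results[config] = ms  # keep rejected results as evidence too
            accepted = ms <= self.max_ms
            if accepted and (self.best is None or (ms, config) < self.best):
                self.best = (ms, config)


def cost_ms(config):
    # Synthetic stand-in for a real measurement; minimum at config == 37.
    return 1.0 + abs(config - 37) * 0.1


def agent(board, configs):
    """One agent evaluates its share of the search space and reports back."""
    for config in configs:
        board.report(config, cost_ms(config))


def run_swarm(num_agents, search_space, max_ms):
    board = EvidenceBoard(max_ms)
    # Give each agent a disjoint, interleaved slice of the search space.
    slices = [search_space[i::num_agents] for i in range(num_agents)]
    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        futures = [pool.submit(agent, board, s) for s in slices]
        for future in futures:
            future.result()  # re-raise any exception from an agent
    return board
```

Because the board compares `(ms, config)` tuples under a lock, the final `board.best` does not depend on which agent finishes first. Changing `num_agents` changes only how the work is divided, which is the property an elastic scheduler would rely on.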

04 / Company

Founded by experts across the full AI stack.

INT21 brings together deep experience in agent systems, machine learning models, GPU software, distributed infrastructure, and cloud computing.

  1. Agents
  2. Models
  3. GPU
  4. Infrastructure
  5. Cloud

AI-native operating model

Engineering capacity scales with compute, not headcount.

Our experts set direction, constraints, and acceptance criteria. Autonomous agent swarms execute, evaluate, and retain the work, so adding compute expands how much engineering INT21 can perform.

Founders / Cross-stack operators

Research, systems, and infrastructure experience carried into one company.

Bing Xu

Founder & CEO

  • Agents
  • Models
  • GPU

Bing co-authored the original Generative Adversarial Nets paper, created XGBoost's Python package, and co-created MXNet and AITemplate. Before founding INT21, he was a Distinguished Engineer at NVIDIA following its acquisition of HippoML, the GPU inference company he founded.

Qingye Jiang

Founding Partner

  • Infrastructure
  • Cloud

Qingye has spent more than a decade building and tuning high-performance computing and distributed systems at AWS. His work spans workload analysis, performance engineering, cloud infrastructure, and real-time systems.

PTX Kernel Factory / Beta

Bring us a hard GPU workload.

Start with an operation that is too slow, a new architecture without a mature kernel, or an important workload that has not justified weeks of specialist time.

Request beta access