Kimi Delta Attention
Expert baseline FlashKDA CUTLASS
Peak measured performance.
Optimized integration.
INT21 builds self-improving AI systems for the software beneath modern AI. Our first product, PTX Kernel Factory, generates and optimizes low-level GPU software, then proves its work with tests and benchmarks. Its first four implementations are open source, and the product is entering beta.
01 / First proof
PTX Kernel Factory produced four GPU kernel artifacts across Hopper and Blackwell. We tested them on matching hardware and inputs against established expert implementations, with correctness verified before timing.
Expert baseline QuACK CuTe DSL
Faster on geometric mean.
Faster in every comparable case.
Operator-level benchmark results, not full-model speedup claims.
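For readers who want the summary metrics spelled out, here is a minimal sketch of how a geometric-mean speedup and a "faster in every case" check could be computed once correctness has been verified. The function name, the dictionary keys, and the timings are all hypothetical and chosen for illustration; this is not INT21's benchmark harness, and the numbers are not measured results.

```python
import math


def summarize_speedups(cases):
    """Return (geometric-mean speedup, faster-in-every-case flag).

    Each case is a dict with hypothetical keys: name, correct,
    baseline_ms, candidate_ms. A case that failed verification is
    rejected instead of being folded into the summary.
    """
    ratios = []
    for case in cases:
        if not case["correct"]:
            raise ValueError(f"{case['name']} failed verification")
        # Speedup > 1.0 means the candidate is faster than the baseline.
        ratios.append(case["baseline_ms"] / case["candidate_ms"])
    # Geometric mean = exp(mean(log(ratio))).
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return geomean, all(r > 1.0 for r in ratios)


# Made-up timings for illustration only; not INT21 results.
example_cases = [
    {"name": "shape_a", "correct": True, "baseline_ms": 2.0, "candidate_ms": 1.0},
    {"name": "shape_b", "correct": True, "baseline_ms": 4.0, "candidate_ms": 8.0},
]
geomean, always_faster = summarize_speedups(example_cases)
```

The two metrics answer different questions: a 2x win and a 2x loss average out to a geometric mean of 1.0, while the per-case check reports that the candidate is not faster everywhere.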
02 / First product
A fully autonomous agent swarm searches for GPU kernel implementations, proves them on real hardware, and carries every benchmark, failure, and successful optimization into the next generation. More runs create more evidence, better search strategies, and stronger kernels over time.
Specialized agents plan, implement, review, and optimize each kernel end to end.
Every candidate is compiled, verified, and benchmarked on the target GPU.
Reusable evidence improves both the search process and the kernels it produces.
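The loop described above (verify, then benchmark, then keep the evidence) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the PTX Kernel Factory implementation: candidates are plain Python functions rather than GPU kernels, timing uses the CPU wall clock, and the names `evaluate`, `run_generation`, and `evidence` are hypothetical.

```python
import time


def evaluate(candidate, reference, inputs, repeats=5):
    """Verify a candidate against the reference, then time it if it passes."""
    try:
        for x in inputs:
            if candidate(x) != reference(x):
                return {"status": "wrong_result", "seconds": None}
    except Exception as exc:
        return {"status": f"error: {type(exc).__name__}", "seconds": None}
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            candidate(x)
        best = min(best, time.perf_counter() - start)
    return {"status": "ok", "seconds": best}


def run_generation(candidates, reference, inputs, evidence):
    """Evaluate every candidate, record failures and successes, return the best."""
    for name, fn in candidates.items():
        record = evaluate(fn, reference, inputs)
        record["name"] = name
        evidence.append(record)  # failures are kept as evidence too
    passing = [r for r in evidence if r["status"] == "ok"]
    return min(passing, key=lambda r: r["seconds"]) if passing else None


def reference(n):
    """Sum of squares of 0..n-1, computed the slow, obviously correct way."""
    return sum(i * i for i in range(n))


candidates = {
    "closed_form": lambda n: (n - 1) * n * (2 * n - 1) // 6,
    "off_by_one": lambda n: sum(i * i for i in range(n + 1)),  # deliberately wrong
}
evidence = []
best = run_generation(candidates, reference, [10, 100, 1000], evidence)
```

Here the incorrect candidate is rejected before any timing happens, and its failure record stays in `evidence`, where a later generation could use it to avoid repeating the same mistake.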
03 / Platform
SwarmOS is a cloud-native platform for running specialized agents at elastic scale toward the same measurable goal. Agents explore in parallel, coordinate through shared evidence, and continuously converge on stronger solutions.
Cloud-native scheduling expands the swarm around available compute.
Every agent works against the same constraints and acceptance criteria.
Results, failures, and strategies become the next generation's starting point.
04 / Company
INT21 brings together deep experience in agent systems, machine learning models, GPU software, distributed infrastructure, and cloud computing.
Founders / Cross-stack operators
Research, systems, and infrastructure experience carried into one company.
Founder & CEO
Bing co-authored the original Generative Adversarial Nets paper, created XGBoost's Python package, and co-created MXNet and AITemplate. Before founding INT21, he was a Distinguished Engineer at NVIDIA following its acquisition of HippoML, the GPU inference company he founded.
Founding Partner
Qingye has spent more than a decade building and tuning high-performance computing and distributed systems at AWS. His work spans workload analysis, performance engineering, cloud infrastructure, and real-time systems.
PTX Kernel Factory / Beta
Start with an operation that is too slow, a new architecture without a mature kernel, or an important workload that has not justified weeks of specialist time.
Request beta access