NVIDIA and OpenAI Launch Fastest Open Reasoning Models

    By Hanan Zuhry

    NVIDIA and OpenAI have released fast open-weight AI reasoning models built on NVFP4 and CUDA, making advanced reasoning more accessible.


    Quick Take

    Summary is AI generated, newsroom reviewed.

    • NVIDIA and OpenAI released two open-weight reasoning models: gpt-oss-120b and gpt-oss-20b

    • The 120b model processes 1.5M tokens/sec using NVIDIA’s GB200 NVL72 system

    • NVFP4 precision format enables faster, energy-efficient inference without accuracy loss

    • Models run on CUDA-compatible hardware, from cloud servers to RTX desktops

    • Open-weight release helps startups, researchers, and developers build custom solutions

    NVIDIA and OpenAI have just released two new open-weight reasoning models, gpt-oss-120b and gpt-oss-20b. The 120b model can process 1.5 million tokens per second on a single NVIDIA GB200 NVL72 system. That speed comes from a mix of NVIDIA’s Blackwell architecture and a new 4-bit precision format called NVFP4, which strikes a balance between accuracy and efficiency.

    What Powers the Models

    What helps these models run so efficiently is a mix of new hardware and smart software. They were trained on NVIDIA’s powerful H100 GPUs and are designed to work smoothly across a wide range of devices, from big cloud systems to regular desktop PCs with NVIDIA RTX cards. If you already use CUDA, you can probably run these models without much extra work.

    Both models are also packaged as what NVIDIA calls “Inference Microservices,” which makes them faster to deploy and easier to integrate. You don’t need to build everything from scratch. And if you’re already using popular AI tools like Hugging Face or llama.cpp, these models will plug right in.

    NVIDIA’s newer Blackwell hardware plays a big role here, too. It supports a format called NVFP4, which helps the models run faster and more efficiently by using lower-precision numbers without losing accuracy. That might sound technical, but the result is simple: faster AI that uses less power and memory. For businesses, that can mean lower costs.
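    The core idea behind low-precision formats like NVFP4 can be illustrated with a toy blocked 4-bit quantizer. This is only a sketch in plain NumPy: the function names and block size are illustrative, and it does not reproduce NVIDIA’s actual NVFP4 encoding (which stores FP4 values with hardware-level block scaling). It simply shows how values can share a per-block scale and be stored in 4 bits each while staying close to the originals.

```python
import numpy as np

def quantize_4bit_blocked(x, block_size=16):
    """Toy blocked 4-bit quantization (illustrative only).

    Each block of `block_size` values shares one float scale;
    values are stored as small integers in [-7, 7] (4 bits)."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size          # pad so length divides evenly
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0             # avoid division by zero
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_4bit_blocked(q, scales, n):
    """Recover approximate floats from 4-bit codes and block scales."""
    return (q.astype(np.float32) * scales).reshape(-1)[:n]

np.random.seed(0)
weights = np.random.randn(64).astype(np.float32)   # pretend model weights
q, s = quantize_4bit_blocked(weights)
approx = dequantize_4bit_blocked(q, s, len(weights))
err = np.abs(weights - approx).max()               # small per-value error
```

    The memory story is what matters: each value needs only 4 bits plus a shared scale per block, instead of 16 or 32 bits, which is where the power and cost savings in the article come from.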

    There’s also a long-running relationship between NVIDIA and OpenAI that helped make this possible, going back to when Jensen Huang personally delivered the first DGX-1. The gpt-oss series feels like the next logical step in that collaboration. Production deployments, however, will require orders of magnitude more computing power, polish, and operational readiness. What stands out is that hardware, software, and services are all working together, which is rare at this level.

    Open for Everyone to Build

    One of the most important things about this release is that the models are open. Anyone, from startups to universities, can work with them: build on them, customize them, and use them in their own systems. OpenAI now has over 4 million lifetime developers building on its platform. NVIDIA, on its side, has more than 6.5 million developers using its software tools. They’ve been working together for nearly a decade, and the reach is massive. There are hundreds of millions of GPUs worldwide that run on the NVIDIA CUDA platform. When technology like this gets released into an ecosystem that large and experienced, adoption tends to move quickly. And that’s where this starts to feel less like a launch and more like a turning point.
