NVIDIA and OpenAI Launch Fastest Open Reasoning Models

    By Hanan Zuhry

    NVIDIA and OpenAI have released fast open-weight AI reasoning models built on NVFP4 and CUDA, making advanced reasoning more accessible.


    Quick Take

    Summary is AI generated, newsroom reviewed.

    • NVIDIA and OpenAI released two open-weight reasoning models: gpt-oss-120b and gpt-oss-20b

    • The 120b model processes 1.5M tokens/sec using NVIDIA’s GB200 NVL72 system

    • NVFP4 precision format enables faster, energy-efficient inference without accuracy loss

    • Models run on CUDA-compatible hardware, from cloud servers to RTX desktops

    • Open-weight release helps startups, researchers, and developers build custom solutions

    NVIDIA and OpenAI have just released two new open-weight reasoning models, gpt-oss-120b and gpt-oss-20b. The 120b model can process 1.5 million tokens per second on a single NVIDIA GB200 NVL72 system. That speed comes from a mix of NVIDIA’s Blackwell architecture and a new 4-bit precision format called NVFP4, which strikes a balance between accuracy and efficiency.

    What Powers the Models

    What helps these models run so efficiently is a mix of new hardware and smart software. They were trained on NVIDIA’s powerful H100 GPUs and are designed to work smoothly across a wide range of devices, from big cloud systems to regular desktop PCs with NVIDIA RTX cards. If you already use CUDA, you can probably run these models without much extra work.

    Both models are also packaged as what NVIDIA calls “Inference Microservices,” which makes them faster to deploy and easier to integrate. You don’t need to build everything from scratch. And if you’re already using popular AI tools like Hugging Face or llama.cpp, these models will plug right in.

    NVIDIA’s newer Blackwell hardware plays a big role here, too. It supports a format called NVFP4, which helps the models run faster and more efficiently by using lower-precision numbers without losing accuracy. That might sound technical, but the result is simple: faster AI that uses less power and memory. For businesses, that can mean lower costs.
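    The core idea behind low-precision formats like NVFP4 can be illustrated with a toy blocked 4-bit quantizer. This is only a sketch in plain NumPy: the function names and block size are illustrative, and it does not reproduce NVIDIA’s actual NVFP4 encoding (which stores FP4 values with hardware-level block scaling). It simply shows how values can share a per-block scale and be stored in 4 bits each while staying close to the originals.

```python
import numpy as np

def quantize_4bit_blocked(x, block_size=16):
    """Toy blocked 4-bit quantization (illustrative only).

    Each block of `block_size` values shares one float scale;
    values are stored as small integers in [-7, 7] (4 bits)."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size          # pad so length divides evenly
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0             # avoid division by zero
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_4bit_blocked(q, scales, n):
    """Recover approximate floats from 4-bit codes and block scales."""
    return (q.astype(np.float32) * scales).reshape(-1)[:n]

np.random.seed(0)
weights = np.random.randn(64).astype(np.float32)   # pretend model weights
q, s = quantize_4bit_blocked(weights)
approx = dequantize_4bit_blocked(q, s, len(weights))
err = np.abs(weights - approx).max()               # small per-value error
```

    The memory story is what matters: each value needs only 4 bits plus a shared scale per block, instead of 16 or 32 bits, which is where the power and cost savings in the article come from.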

    There’s also a long-running relationship between NVIDIA and OpenAI that helped make this possible, going back to when Jensen Huang personally delivered the first DGX-1. The gpt-oss series feels like the next logical step in that collaboration. Production deployments, however, will require orders of magnitude more computing power, polish, and operational readiness. What stands out is that hardware, software, and services are all working together, which is rare at this level.

    Open for Everyone to Build

    One of the most important things about this release is that the models are open. Anyone, from startups to universities, can work with them: build on them, customize them, and use them in their own systems. OpenAI now has over 4 million lifetime developers building on its platform. NVIDIA, on its side, has more than 6.5 million developers using its software tools. They’ve been working together for nearly a decade, and the reach is massive. There are hundreds of millions of GPUs worldwide that run on the NVIDIA CUDA platform. When technology like this gets released into an ecosystem that large and experienced, adoption tends to move quickly. And that’s where this starts to feel less like a launch and more like a turning point.
