Nvidia has just announced the release of an open-source large language model (LLM) that is said to perform on par with leading proprietary models from OpenAI, Anthropic, Meta, and Google.
The new model, called NVLM-D-72B, has 72 billion parameters and is part of the NVLM 1.0 family that Nvidia recently released. NVLM 1.0 is a family of frontier-class multimodal large language models that achieve state-of-the-art results on vision-language tasks, competing with leading proprietary models (e.g., GPT-4o) as well as open-access models.
This new family of models reportedly has “industrial-grade multimodal capabilities,” with strong performance across a wide range of vision and language tasks, while also significantly improving text-only performance. “To achieve this, we create and integrate a high-quality text-only dataset into the multimodal training process, along with a large amount of multimodal mathematical and reasoning data, resulting in improved mathematical and coding capabilities across multiple modalities,” Nvidia researchers explained in a statement.
The result is a high-performance LLM that can handle tasks ranging from explaining why a meme is funny to solving complex mathematical equations step by step. Nvidia also says its multimodal training raised the model's text-only accuracy by an average of 4.3 points on industry benchmarks.

Nvidia appears to be serious about ensuring that the model meets the Open Source Initiative's latest definition of "open source": it has not only made the model weights publicly available for community review, but has also promised to release the model's source code in the near future. This is a significant departure from competitors like OpenAI and Google, which have kept their LLMs' weights and source code private. In doing so, Nvidia has positioned NVLM not necessarily as a direct competitor to GPT-4o and Gemini 1.5 Pro, but as a platform on which third-party developers can build their own chatbots and AI applications.