Nvidia ($NVDA) releases new open-source model to rival GPT-4 and other LLMs
Nvidia has unveiled a powerful open-source AI model designed to compete with proprietary systems from industry leaders like OpenAI and Google.
The new NVLM 1.0 family of multimodal large language models, led by the 72-billion-parameter NVLM-D-72B, performs strongly on both vision and language tasks and also improves on text-only tasks.
"We present NVLM 1.0, a family of advanced multimodal large language models that set new standards in vision-language tasks, matching the performance of top proprietary models (e.g., GPT-4) and open-access alternatives," Nvidia’s researchers explain in their paper.
In a move that contrasts with the industry's trend of keeping cutting-edge AI systems closed, Nvidia has made the model's weights publicly available and plans to release the training code, giving researchers and developers a level of access that is uncommon for models in this class.
Benchmark comparisons show Nvidia's NVLM-D model performing competitively against leading models such as GPT-4, Claude 3.5, and Llama 3-V across a range of vision and language tasks.
The NVLM-D-72B model handles both complex visual and textual inputs. Examples from the researchers show it interpreting memes, analyzing images, and solving math problems step by step.
One standout feature is NVLM-D-72B's improved performance on text-only tasks after multimodal training. While many models lose accuracy on text-only tasks after such training, NVLM-D-72B gained an average of 4.3 accuracy points across key text benchmarks.
"Our NVLM-D-1.0-72B model achieves substantial improvements on text-only math and coding benchmarks, enhancing its text-processing capabilities," the researchers highlight, showcasing the advantage of their multimodal approach.