Meta releases its first open AI model that can process images

Just two months after releasing its last big AI model, Meta is back with a major update: its first open-source model capable of processing both images and text.

The new model, Llama 3.2, could allow developers to create more advanced AI applications, like augmented reality apps that provide real-time understanding of video, visual search engines that sort images based on content, or document analysis that summarizes long chunks of text for you.

Meta says it’s going to be easy for developers to get the new model up and running. Developers will have to do little except add this “new multimodality and be able to show Llama images and have it communicate,” Ahmad Al-Dahle, vice president of generative AI at Meta, told The Verge.

Other AI developers, including OpenAI and Google, already launched multimodal models last year, so Meta is playing catch-up here. The addition of vision support will also play a key role as Meta continues to build out AI capabilities on hardware like its Ray-Ban Meta glasses.

Llama 3.2 includes two vision models (with 11 billion parameters and 90 billion parameters) and two lightweight text-only models (with 1 billion parameters and 3 billion parameters). The smaller models are designed to work on Qualcomm, MediaTek, and other Arm hardware, with Meta clearly hoping to see them put to use on mobile.
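For a sense of what "showing Llama images" might look like in practice, here is a minimal sketch of prompting the 11-billion-parameter vision model through Hugging Face's transformers library. The model ID, class names, file name, and prompt format below are assumptions based on how comparable multimodal releases are typically packaged, not details confirmed in Meta's announcement.

    # Minimal sketch: ask a Llama 3.2 vision model to describe an image.
    # Assumptions: the Hugging Face model ID, the Mllama* class, and the
    # chat-template format are illustrative, not taken from the article.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model ID
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # Pair a local image (hypothetical file) with a text instruction.
    image = Image.open("receipt.png")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Summarize what this document says."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(image, prompt, return_tensors="pt").to(model.device)

    # Generate and print the model's answer about the image.
    output = model.generate(**inputs, max_new_tokens=128)
    print(processor.decode(output[0], skip_special_tokens=True))

The same chat-style request, with an image slotted in next to the text, is roughly what Al-Dahle's quote about adding "new multimodality" suggests developers would write.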

There’s still a place for the (slightly) older Llama 3.1, though: that model, released in July, included a version with 405 billion parameters, which will theoretically be more capable when it comes to generating text.

Alex Heath contributed reporting.


