Meta Releases Llama 3.2—and Gives Its AI a Voice

Powering Meta AI’s new capabilities is an upgraded version of Llama, Meta’s premier large language model. The free model announced today may also have a broad impact, given how widely the Llama family has been adopted by developers and startups already.

In contrast to OpenAI’s models, Llama can be downloaded and run locally without charge—although there are some restrictions on large-scale commercial use. Llama can also more easily be fine-tuned, or modified with additional training, for specific tasks.

Patrick Wendell, cofounder and VP of engineering at Databricks, a company that hosts AI models including Llama, says many companies are drawn to open models because such models let them better protect their own data.

Large language models are increasingly becoming “multimodal,” meaning they are trained to handle audio and images as input as well as text. This extends a model’s abilities and allows developers to build new kinds of AI applications on top of it, including so-called AI agents capable of carrying out useful tasks on computers on a user’s behalf. Llama 3.2 should make it easier for developers to build AI agents that can, say, browse the web, perhaps hunting for deals on a particular type of product when given a short description.

“Multimodal models are a big deal because the data people and businesses use is not just text, it can come in many different formats, including images and audio or more specialized formats like protein sequences or financial ledgers,” says Phillip Isola, a professor at MIT. “In the last few years we’ve gone from strong language models to now having models that also work well on images and voices. Each year we are seeing more data modalities become accessible to these systems.”

“With Llama 3.1, Meta showed that open models could finally close the gap with their proprietary counterparts,” says Nathan Benaich, founder and general partner of Air Street Capital, and the author of an influential yearly report on AI. Benaich adds that multimodal models tend to outperform larger text-only ones. “I’m excited to see how 3.2 shapes up,” he says.

Earlier today, the Allen Institute for AI (Ai2), a research institute in Seattle, released an advanced open source multimodal model called Molmo. Molmo was released under a less restrictive license than Llama, and Ai2 is also releasing details of its training data, which can help researchers and developers experiment with and modify the model.

Meta said today that it would release Llama 3.2 in several sizes with correspondingly different capabilities. Besides two more powerful versions with 11 billion and 90 billion parameters—a measure of a model’s complexity as well as its size—Meta is releasing less capable 1-billion- and 3-billion-parameter versions designed to run well on portable devices. Meta says these versions have been optimized for ARM-based mobile chips from Qualcomm and MediaTek.
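A rough sense of why parameter count matters for on-device deployment comes from simple arithmetic: the weights alone of a model take roughly (parameter count × bytes per parameter) of memory. The sketch below applies that back-of-the-envelope math to the four sizes mentioned above; the precision figures (16-bit and 4-bit) are common industry choices, not numbers Meta has stated for these releases, and the estimates ignore activations and other runtime overhead.

```python
# Back-of-the-envelope memory footprint for the Llama 3.2 sizes named above.
# Counts weight storage only; real runtime use is higher.

def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for billions in (1, 3, 11, 90):
    params = billions * 1_000_000_000
    fp16 = weight_memory_gb(params, 2.0)   # 16-bit floating point
    int4 = weight_memory_gb(params, 0.5)   # 4-bit quantized
    print(f"{billions}B params: ~{fp16:.1f} GB (fp16), ~{int4:.1f} GB (4-bit)")
```

By this estimate a 1-billion-parameter model fits in about 2 GB at 16-bit precision (and well under 1 GB when quantized), which is why the smallest versions are plausible on phones, while the 90-billion-parameter version needs server-class hardware.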

Meta’s AI overhaul comes at a heady time, with tech giants racing to offer the most advanced AI. The company’s decision to release its most prized models for free may give it an edge in providing the foundation for many AI tools and services—especially as companies begin to explore the potential of AI agents.