IBM doubles down on open source AI with new Granite 3.0 models

ibmlogo-gettyimages-503746912 — Ethan Miller/Getty Images

Open source and AI have an uneasy relationship. AI can’t exist without open source, but few companies want to open source their AI programs or large language models (LLM). Except, notably, for IBM, which previously open-sourced its Granite models. Now, Big Blue is doubling down on its open-source AI with the release of its latest Granite AI 3.0 models under the Apache 2.0 license.

IBM has done this using pretraining data from publicly available datasets, such as GitHub Code Clean, Starcoder data, public code repositories, and GitHub issues. And IBM has gone to great lengths to avoid potential copyright or legal problems.

Also: Can AI even be open-source? It’s complicated

Why have other major AI companies not done this? One big reason is that their datasets are filled with copyrighted or other intellectual property-protected data. If they open their data, they also open themselves to lawsuits. For example, News Corp publications such as the Wall Street Journal and the New York Post are suing Perplexity for stealing their content.

The Granite models, by contrast, are LLMs specifically designed for business use cases, with a strong emphasis on programming and software development. IBM claims these new models were trained on three times as much data as the ones released earlier this year. They also come with greater modeling flexibility and support for external variables and rolling forecasts.

In particular, the new Granite 3.0 8B and 2B language models are designed as “workhorse” models for enterprise AI, delivering robust performance for tasks such as Retrieval Augmented Generation (RAG), classification, summarization, entity extraction, and tool use.

These models also come in Instruct and Guardian variants. The first, as the name promises, helps people learn a particular language. Guardian is designed to detect risks in user prompts and AI responses. This is vital because, as security expert Bruce Schindler noted at the Secure Open-Source Software (SOSS) Fusion conference, “prompt injection [attacks] work because I am sending the AI data that it is interpreting as commands” — which can lead to disastrous answers.

Also: Red Hat reveals major enhancements to Red Hat Enterprise Linux AI

The Granite code models range from 3 billion to 34 billion parameters and have been trained on 116 programming languages and 3 to 4 terabytes of tokens, combining extensive code data and natural language datasets. These models are accessible through several platforms, including Hugging Face, GitHub, IBM’s own Watsonx.ai, and Red Hat Enterprise Linux (RHEL) AI. A curated set of the Granite 3.0 models is also available on Ollama and Replicate.

In addition, IBM has released a new version of its Watsonx Code Assistant for application development. There, Granite provides general-purpose coding assistance across languages like C, C++, Go, Java, and Python, with advanced application modernization capabilities for Enterprise Java Applications. Granite’s code capabilities are now accessible through a Visual Studio Code extension, IBM Granite.Code.

Also: How to use ChatGPT to write code: What it does well and what it doesn’t

The Apache 2.0 license allows for both research and commercial use, which is a significant advantage compared to other major LLMs, which may claim to be open source but bind their LLMs with commercial restrictions. The most notable example of this is Meta’s Llama.

By making these models freely available, IBM is lowering barriers to entry for AI development and use. IBM also believes, with reason, that because they’re truly open source, developers and researchers can quickly build upon and improve the models.

IBM also claims these models can deliver performance comparable to much larger and much more expensive models.

Put it all together, and I, for one, am impressed. True, Granite won’t help kids with their homework or write the great AI American novel, but it will help you develop useful programs and AI-based expert systems.

Source link