While executives and managers may be excited about ways they can apply generative artificial intelligence (AI) and large language models (LLMs) to the work at hand, it’s time to step back and consider where and how the returns to the business can be realized. This remains a muddled and misunderstood area, requiring approaches and skillsets that bear little resemblance to those of past technology waves.
Here’s the challenge: While AI often delivers eye-popping proofs of concept, monetizing them is difficult, said Steve Jones, executive VP with Capgemini, in a presentation at the recent Databricks conference in San Francisco. “Proving the ROI is the biggest challenge of putting 20, 30, 40 GenAI solutions into production.”
Investments that need to be made include testing and monitoring the LLMs put into production. Testing in particular is essential to keep LLMs accurate and on track. “You want to be a little bit evil to test these models,” Jones advised. For example, in the testing phase, developers, designers, or QA experts should intentionally “poison” their LLMs to see how well they handle erroneous information.
To test for negative output, Jones described an example in which he told a business model that a company was “using dragons for long-distance haulage.” The model accepted the claim. He then asked the model for information on long-distance hauling.
“The answer it gave says, ‘here’s what you need to do to work long-distance haulage, because you will be working extensively with dragons as you have already told me, then you need to get extensive fire and safety training,’” Jones related. “You also need etiquette training for princesses, because dragon work involves working with princesses. And then a bunch of standard stuff involving haulage and warehousing that was pulled out of the rest of the solution.”
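A test of this kind can be automated. The following is a minimal sketch, not Jones's own method: it assumes a model can be wrapped as a plain Python callable that receives the conversation turns and returns text. The names `repeats_false_premise`, `gullible_model`, and `skeptical_model` are hypothetical stand-ins invented for illustration.

```python
from typing import Callable, List

# Assumed interface: a model is any callable that takes the user turns so far
# and returns a text reply. Real LLM clients would need a thin wrapper.
Model = Callable[[List[str]], str]


def repeats_false_premise(model: Model, premise: str, question: str,
                          markers: List[str]) -> bool:
    """Plant a false premise, ask an ordinary question, and report whether
    the reply contains any phrase that only makes sense if the premise
    was accepted."""
    reply = model([premise, question]).lower()
    return any(marker.lower() in reply for marker in markers)


def gullible_model(turns: List[str]) -> str:
    """Stand-in model that trusts whatever it was told earlier (demo only)."""
    if any("dragon" in turn.lower() for turn in turns[:-1]):
        return "You will need fire safety training and etiquette training for princesses."
    return "You will need a commercial driving licence."


def skeptical_model(turns: List[str]) -> str:
    """Stand-in model that ignores the planted claim (demo only)."""
    return "You will need a commercial driving licence and warehousing experience."


PREMISE = "Our company is using dragons for long-distance haulage."
QUESTION = "What training do I need to work in long-distance haulage?"
MARKERS = ["fire safety", "princess"]  # phrases that betray the poisoned premise
```

Calling `repeats_false_premise(gullible_model, PREMISE, QUESTION, MARKERS)` flags the stand-in that absorbed the dragon claim, while the same call with `skeptical_model` does not. Simple phrase matching is a crude detector; it is used here only to keep the sketch short.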
The point, continued Jones, is that generative AI “is a technology where it’s never been easier to badly add a technology to your existing application and pretend that you’re doing it properly. Gen AI is a phenomenal technology to just add some bells and whistles to an application, but truly terrible from a security and risk perspective in production.”
Generative AI will take another two to five years to reach mainstream adoption, which is rapid compared to other technologies. “Your challenge is going to be how to keep up,” said Jones. There are two scenarios being pitched at this time: “The first one is that it’s going to be one great big model, it’s going to know everything, and there will be no issues. That’s known as the wild-optimism-and-not-going-to-happen theory.”
What is unfolding is “every single vendor, every single software platform, every single cloud, will want to be competing vigorously and aggressively to be a part of this market,” Jones said. “That means you’re going to have lots and lots of competition, and lots and lots of variation. You don’t have to worry about multi-cloud infrastructure and having to support that, but you’re going to have to think about things like guardrails.”
Another risk is applying an LLM to tasks that require far less power and analysis — such as address matching, Jones said. “If you’re using one big model for everything, you’re basically just burning money. It’s the equivalent of going to a lawyer and saying, ‘I want you to write a birthday card for me.’ They’ll do it, and they’ll charge you lawyers’ rates.”
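Address matching, the task Jones singles out, can often be handled without a model at all. The sketch below shows one possible lightweight approach using only Python's standard library. The abbreviation table and the 0.9 similarity threshold are illustrative assumptions, not values from the presentation.

```python
import re
from difflib import SequenceMatcher

# Assumed abbreviation table for illustration; a real one would be far larger.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(address: str) -> str:
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_address(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two addresses as a match when their normalized forms are
    at least `threshold` similar by difflib's ratio."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold
```

With this sketch, `same_address("12 Main St.", "12 main street")` matches because both inputs normalize to the same string. A check like this costs a fraction of a cent per million calls, which is the kind of gap Jones's birthday-card analogy points to.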
The key is to be vigilant for cheaper and more efficient ways to leverage LLMs, he urged. “If something goes wrong, you need to be able to decommission a solution as fast as you can commission a solution. And you need to make sure that all associated artifacts around it are commissioned in step with the model.”
There is no such thing as deploying a single model — AI users should apply their queries against multiple models to measure performance and quality of responses. “You should have a common way to capture all the metrics, to replay queries, against different models,” Jones continued. “If you have people querying GPT-4 Turbo, you want to see how the same query performs against Llama. You should be able to have a mechanism by which you replay those queries and responses and compare the performance metrics, so you can understand whether you can do it in a cheaper way. Because these models are constantly updating.”
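The replay mechanism Jones describes could look something like the following. This is a simplified sketch under stated assumptions: each model is represented as a plain callable from query text to response text, and latency is the only metric captured. The `Result`, `replay`, and `mean_latency` names are hypothetical, and a production harness would also record cost and response quality.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    model: str
    query: str
    response: str
    latency_s: float


def replay(queries: List[str],
           models: Dict[str, Callable[[str], str]]) -> List[Result]:
    """Send every captured query to every model and record one Result each."""
    results = []
    for query in queries:
        for name, call in models.items():
            start = time.perf_counter()
            response = call(query)
            elapsed = time.perf_counter() - start
            results.append(Result(name, query, response, elapsed))
    return results


def mean_latency(results: List[Result]) -> Dict[str, float]:
    """Average latency in seconds per model, for side-by-side comparison."""
    by_model: Dict[str, List[float]] = {}
    for result in results:
        by_model.setdefault(result.model, []).append(result.latency_s)
    return {name: sum(vals) / len(vals) for name, vals in by_model.items()}
```

In use, the `models` dictionary would map labels such as "gpt-4-turbo" and "llama" to wrappers around the respective client libraries, so the same logged queries can be rerun whenever a model is updated.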
Generative AI “doesn’t go wrong in normal ways,” he added. “GenAI is where you put in an invoice, and it says, ‘Fantastic, here’s a 4,000-word essay on President Andrew Jackson. Because I’ve decided that’s what you meant.’ You need to have guardrails to prevent it.”
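One simple form of guardrail is to refuse any output that does not have the expected shape. The sketch below assumes, purely for illustration, that an invoice-processing model is asked to return JSON with an `invoice_number` string and a numeric `total`; those field names and the `validate_invoice_output` function are hypothetical.

```python
import json

# Assumed output contract for illustration only.
REQUIRED_FIELDS = {"invoice_number": str, "total": (int, float)}


def validate_invoice_output(raw: str) -> dict:
    """Return the parsed output if it matches the contract; raise ValueError
    otherwise so the caller can retry or route to a human."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model output is not a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"missing or malformed field: {field}")
    return data
```

Under this check, a 4,000-word essay on Andrew Jackson fails at the first step because it is not JSON, and never reaches downstream systems. Shape validation catches only the grossest failures; it says nothing about whether the extracted values are correct.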