Last year StackOverflow became one of the first websites to announce it would charge AI giants for access to content used to train chatbots. Now the popular Q&A service for coders has signed up its first customerâGoogleâin what CEO Prashanth Chandrasekar says is the start of a âmeaningfulâ new stream of revenue.
The deal is significant, because it remains unclear how broadly Google and other AI developers will pay for content needed for AI projects. Millions of books and websites have fueled the development of AI systems, but most publishers have not been compensated, and some are suing over what they allege is misuse. Many publishers, including StackOverflow, appear threatened by ChatGPT and other generative AI products, which can answer queries that would have previously sent coders their way.
The deal will see Googleâs cloud division use questions and answers from StackOverflow about Google Cloud services to provide coding assistance and technical support through a version of Googleâs Gemini chatbot. Googleâs cloud computing customers will also be able to ask questions through Google Cloudâs command-line interface. âTheir AI may not have all the answers, and so we have a huge ability to help complete that loop,â Chandrasekar says. âWe are the biggest place where community knowledge is curated and validated.â
Gemini will summarize answers drawn from StackOverflow in its own words but include the companyâs logo, a link back to the original material, and the username of the site contributor who supplied it. The companies plan to demonstrate the system at Google Cloud Next, the search companyâs annual cloud conference in April, and launch it soon after.
Chandrasekar says there are no significant restrictions on how Google Cloud can use StackOverflow data, meaning it can be used to train large language models and other AI systems. âWhere we want to stand firm on isânonnegotiable things for usâ trust, accuracy, quality, and attribution back to the sources of these AI outputs,â he says.
He declined to say how much StackOverflow is being paid by Google for the data. âThis will be a meaningful commercial offering for us in the near term, medium term, and long term,â Chandrasekar says.
Covert Scraping
Google and other AI developers have previously gathered data from StackOverflow and other websites without much notice. As demand for generative AI technologies has surgedâand the valuations of the companies developing them has rocketedâthe websites supplying the foundational text have begun demanding what they view as their fair share. Fortunately for StackOverflow, prospective customers have heeded the message, Chandrasekar says. âWe’re not having to chase people,â he says.
StackOverflow data is particularly beneficial to AI systems that generate computer code, which have proven to be popular with software engineers and a significant source of revenue for Microsoft and OpenAI.
The new StackOverflow deal comes just a week after Google reached a licensing agreement to hoover up data from Reddit, the discussion forums operator, whose content has helped chatbotsâ ability to converse. Reddit had unveiled plans to start charging for data access just before StackOverflow had last year.
+ There are no comments
Add yours