The artificial-intelligence industry — much of which is concentrated in San Francisco and the Bay Area — was hit by a figurative earthquake last month.
But just how damaging the event will prove to be in the long term is still unclear.
The earthquake came in the form of the second of a pair of AI models released over the course of a month, along with an app that quickly jumped to the top of the charts. What made the models so jolting to the industry was that despite being technologically competitive with models developed by industry leaders such as San Francisco-based OpenAI and Anthropic, they were created by a relatively obscure Chinese company called DeepSeek, reportedly for a fraction of the cost and in spite of export restrictions that limited its access to the latest high-powered AI chips.
And unlike OpenAI and Anthropic’s models, which are proprietary and closed, DeepSeek released its models on an open-source and open-weight basis, meaning anyone can download, use and customize them for free.
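For a concrete sense of what that openness means in practice, here is a minimal sketch of downloading and running one of DeepSeek’s smaller distilled checkpoints with the Hugging Face transformers library. It assumes that library is installed; the full V3 and R1 models are far too large to run this way on ordinary hardware.

```python
# Minimal sketch: downloading and running an open-weight model locally.
# Assumes the Hugging Face `transformers` library is installed and uses one of
# DeepSeek's smaller distilled checkpoints; the full V3/R1 models require
# data-center-class hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is 17 * 24?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```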
DeepSeek’s move was “a shot across the bow” of companies such as OpenAI and Anthropic, said Chris Nicholson, a partner with San Francisco’s Page One Ventures, which invests in AI model and application developers.
The aftershock of the release of DeepSeek’s models hit the stock market Monday. Shares of Nvidia, which has seen its sales skyrocket as AI developers have scrambled to buy as many of its chips as they could get their hands on, plunged 17% that day.
Shares of Constellation Energy — which Microsoft convinced to restart one of the nuclear reactors at Three Mile Island to help meet the energy demand it’s seeing in its data centers from AI — fell nearly 21% Monday. Digital Realty, which owns and leases out data-center facilities, which have seen hot demand with the rise of AI, saw its shares drop 9% that day.
Microsoft has invested billions of dollars in OpenAI, while Google has invested heavily in Anthropic and in developing its own AI models — and both also saw their stock prices drop Monday. Many of the companies’ shares bounced back later in the week, but most only modestly.
The stock market has been on a bull run since the fall of 2022, basically around the same time OpenAI released ChatGPT, noted Steve Sosnick, chief strategist at Interactive Brokers, an online stock brokerage. Much of the market gains companies have posted since then have been driven by the excitement about AI, he said.
But that excitement and the run-up in share prices was built on the assumption that AI developers were going to need more and more computing power running in larger and larger numbers of data centers and sucking up more and more power, Sosnick said.
DeepSeek’s release of its models “forced us to rethink that narrative,” he said.
Public-market investors who have acquiesced to the huge amounts of money the Big Tech companies are investing in data centers and chips for AI are starting to question that spending, Sosnick said. DeepSeek’s move also raises concerns about whether those companies or even the leading AI-model developers to date will actually be the ones who dominate the industry in the future, he said.
DeepSeek’s apparent ability to develop a competitive AI model at a fraction of the cost of those of companies like OpenAI and Anthropic could open the door for lots more companies with far less funding than the giants to compete, he said.
“The winners might not be who we think it is,” Sosnick said.
DeepSeek’s earlier model releases had already put it on the radar of AI experts in the U.S., who recognized it as a leading developer of the technology in China.
But it started to become more widely known in late December, when it released its new model, V3, which offered comparable performance to OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet. That’s what helped set off the earthquake that rocked the public markets Monday.
On Jan. 20, the company released R1, a so-called reasoning model. Built on top of V3, it operates similarly to OpenAI’s o1 model, taking time and using “chain of thought” techniques to break a problem into steps and reach more accurate answers than standard models.
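To illustrate what that looks like in practice, here is a small, purely illustrative sketch of how client code might separate a reasoning model’s chain of thought from its final answer. The tag format shown matches DeepSeek’s published R1 outputs, but the helper function itself is hypothetical.

```python
# Illustrative sketch only: separating a reasoning model's chain of thought
# from its final answer. DeepSeek-R1 wraps its intermediate reasoning in
# <think>...</think> tags before the answer; other models use other formats.

def split_reasoning(response_text: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from a raw R1-style response."""
    open_tag, close_tag = "<think>", "</think>"
    if open_tag in response_text and close_tag in response_text:
        start = response_text.index(open_tag) + len(open_tag)
        end = response_text.index(close_tag)
        reasoning = response_text[start:end].strip()
        answer = response_text[end + len(close_tag):].strip()
        return reasoning, answer
    return "", response_text.strip()

raw = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>The answer is 408."
chain, answer = split_reasoning(raw)
print(chain)   # the step-by-step work the model produced first
print(answer)  # the shorter final reply shown to the user
```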
By Monday, DeepSeek’s app, which offers a chatbot similar to ChatGPT that’s powered by the company’s V3 model, had become the most popular free program in Apple’s App Store. That’s the day Nvidia’s share price plunged.
DeepSeek reported spending roughly $5.6 million on the final training run of V3, a sliver of the sums leading U.S. developers are believed to have poured into their flagship models. And the expectation has been that those price tags will only go up with the next generation of models, likely into the billions of dollars. Each training run of the model OpenAI has in development can cost as much as $500 million, The Wall Street Journal reported in December.
But the other shock for many investors and analysts was that DeepSeek was able to develop its models with at best limited access to the latest Nvidia chips. To try to hinder Chinese AI development, the United States under former President Joe Biden put in place export restrictions on Nvidia’s most advanced AI chips.
The restrictions on chip imports forced DeepSeek to take steps to be more efficient in training and running its models, in effect to do more with less, according to AI experts.
Although DeepSeek did invent some new techniques, many of the optimization steps it took were already known to AI researchers in the United States, said James Landay, a professor of computer science at Stanford and co-director of its Institute for Human-Centered Artificial Intelligence.
“What impressed more people was how many [efficiency techniques] they put together in one big system,” he said.
Among AI experts and venture investors who focus on the space, the reaction to DeepSeek’s models has been a little more subdued than that of Wall Street investors.
Many argued that DeepSeek likely spent significantly more than $5.6 million to develop its technology. That figure accounts only for the final training run of V3, said Ben Thompson, a technology analyst who writes the Stratechery blog. It doesn’t account for previous training runs, the cost of developing prior models or the cost of the hardware.
“It’s pretty apparent to me that this model didn’t cost $6 million to build, all in,” said Kevin Novak, founder of Rackhouse Venture Capital, which invests in AI startups.
So the idea that AI developers can now build state-of-the-art models for a small fraction of the billions of dollars that OpenAI, Anthropic or xAI are spending may not hold up, experts such as Novak said.
And it’s likely that even with DeepSeek’s achievement, costs are going to go up anyway. To date, AI experts said, there has been a direct correlation between the size of models and the computing power that goes into creating them on the one hand, and their intelligence, meaning their ability to answer questions accurately, on the other.
The more intelligent a model, the more tasks it will be able to take on, Nicholson said.
“I believe the world’s appetite for intelligence far outstrips the supply,” he said.
Although DeepSeek is offering its model for free and charges developers far less than OpenAI or Anthropic to tap into the version of its model that it hosts, many companies outside China might be reluctant to work with it. DeepSeek’s servers are in China and are thus subject to laws that could force it to turn over personal or corporate data to the Chinese government.
But there’s also concern that there could be code hidden within DeepSeek’s model that could compromise data even when the model is hosted locally in the U.S. While the model is open weight, understanding how exactly AI models work and what they’ll do in particular circumstances is all but impossible, Nicholson said.
“I like to call those a gray box … here’s the code, you tell me what’s going on,” he said.
And, Novak said, the costs of running models, and even of training them, at least measured against the intelligence that training produces, had already been coming down. So it’s unclear just how much DeepSeek’s model will change what was already happening in the market.
Still, Novak and others said they think DeepSeek’s moves will affect the industry. Especially because it released its models openly and discussed its research, its efficiency techniques will likely be widely adopted by other developers, they said.
And the fact that it was able to build a competitive model with fewer resources will add to questions that were already being raised about whether the strategy of building ever-bigger models with ever more computing power is the right way to go, Landay said. It has become increasingly clear that the gains developers get from training on more data have decreased over time, he said.
Many of the recent advances in models have come from incorporating techniques such as chain of thought or so-called “mixture of experts,” in which queries are farmed out to submodels with specific areas of expertise, Landay said.
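As a rough, purely conceptual sketch of the mixture-of-experts idea, the toy router below scores a query against a few labeled “experts” and farms it out to the top-scoring one. In real models this gating happens per token inside the network with learned weights; the expert names and scores here are invented for illustration.

```python
# Toy illustration of mixture-of-experts routing (conceptual only).
# Real MoE layers route individual tokens through learned gating networks;
# here, "experts" are just labeled functions and the gate scores are made up.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

EXPERTS = {
    "math":    lambda q: f"[math expert handles: {q}]",
    "code":    lambda q: f"[code expert handles: {q}]",
    "writing": lambda q: f"[writing expert handles: {q}]",
}

def route(query: str, gate_scores: dict[str, float], top_k: int = 1):
    """Pick the top_k experts by gate score and farm the query out to them."""
    names = list(gate_scores)
    weights = softmax([gate_scores[n] for n in names])
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return [(name, weight, EXPERTS[name](query)) for name, weight in ranked[:top_k]]

# A query that a (hypothetical) gating network scores as mostly "math":
print(route("Integrate x**2 from 0 to 3", {"math": 2.0, "code": 0.5, "writing": -1.0}))
```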
DeepSeek’s ability to build a model efficiently offers hope to other startups, said Sean Foote, a venture capitalist and a professional faculty member at UC Berkeley’s Haas School of Business. Even with its other development costs taken into account, it still likely spent significantly less developing V3 and R1 than its American rivals, AI experts say.
To date, Foote hasn’t invested in any companies developing cutting-edge AI models because the capital those companies require has been way beyond what he can provide, he said. But DeepSeek’s move offers the possibility that other startups might be able to build their own models for far less money.
“Maybe all the assumptions about [large-language] models are wrong, and it doesn’t require that” kind of capital, he said.
If you have a tip about tech, startups or the venture industry, contact Troy Wolverton at twolverton@sfexaminer.com or via text or Signal at 415.515.5594.