1/27/2025: DeepSeek released R1 (paper) last week and the DeepSeek app has quickly become the most downloaded free app in the U.S., surpassing ChatGPT – this may have contributed to a significant decline in shares of tech giants like Nvidia and Microsoft today.
About two weeks ago, DeepSeek released a new version of its LLM model, DeepSeek v3. It became one of the most cost-efficient model for LLMs in the market and is now at the top of the list of open-source LLM models:
Image source and the related article
Since then, I have used VSCode + Cline + DeepSeek v3 to finish a few coding tasks and found this to be an impressive open-source alternative to Windsurf/Cursor (note that you can have Cline together with Windsurf/Cursor). While Windsurf remains my primary AI coding IDE due to its better overall UX, the ability to deploy DeepSeek v3 locally makes it an excellent choice for organizations prioritizing security and privacy (note that based on this article, the estimated hardware requirements for local deployment of DeepSeek v3 would be 32 A100 GPUs - not something small teams can easily afford).
I also found that most people are unaware that DeepSeek’s founder and primary backer is High-Flyer (Chinese: 幻方), one of China’s largest quantitative hedge funds, managing $7 billion in assets and standing out as one of the few AI startups equipped with over 10,000 A100 GPUs.
I then came across an interview (in Chinese) with its highly low-profile founder, Wenfeng Liang, who shared valuable insights on topics such as original AI innovation in China, the cultivation of AI talent, and fostering a culture of innovation and organizational excellence within companies - I started to pay more attention to this company after reading that.
This interview is a great read but I could not find a English version to share with my non-Chinese friends, so I asked my students to help translate it into English.
The original Chinese interview by Waves (暗涌) can be found here.
The English translation is provided below.
In China’s growing AI scene, DeepSeek is the quietest powerhouse among seven leading AI model startups. Despite its low-key presence, it consistently finds ways to make an unforgettable mark.
A year ago, it drew attention as the only non-big-tech company in China with a massive reserve of A100 GPUs, thanks to its parent company High-Flyer Quantitative, a leading quantitative hedge fund. A year later, it has shaken the industry again, becoming the catalyst for China’s AI price war.
Amid the relentless AI frenzy of May, DeepSeek skyrocketed to fame. It released DeepSeek V2, an open-source model offering unprecedented cost-effectiveness: inference costs as low as 1 RMB (about $0.15) per million tokens — about one-seventh of Llama3 70B’s costs and one-seventieth of GPT-4 Turbo’s.
This earned DeepSeek the nickname “the PinDuoDuo of AI” (Temu’s parent company in China), prompting tech giants like ByteDance, Tencent, Baidu, and Alibaba to cut prices in response, igniting a full-blown price war.
Behind this competition lies an overlooked fact: unlike many others burning money on subsidies, DeepSeek is actually profitable.
This profitability stems from groundbreaking architectural innovations. DeepSeek developed MLA (Multi-Head Latent Attention), a new mechanism that reduced memory usage to just 5–13% of the conventional MHA (Multi-Head Attention) architecture. Its proprietary DeepSeekMoESparse structure further slashed computational costs, paving the foundation for its affordability.
In Silicon Valley, it has been described as a “mysterious force from the East.” SemiAnalysis’s lead analyst called the DeepSeek V2 paper “probably the best one this year in terms of information and details shared.” Former OpenAI employee Andrew Carr praised it as “full of pretty amazing nuggets of wisdom”, even applying its training setups to his own models. Jack Clark, co-founder of Anthropic and former policy director at OpenAI, commended DeepSeek’s team as “capable of non-trivial AI development and invention”, predicting that Chinese-made AI models would rival the global significance of drones and electric vehicles.
This recognition is rare for a company outside Silicon Valley, especially in a field dominated by U.S. tech narratives. According to industry insiders, the enthusiasm is fueled by DeepSeek’s innovative architectures — a rarity even among global open-source AI companies.
An AI researcher noted that since the introduction of the Attention architecture, almost no one has successfully modified it, let alone validated changes at scale. “This is the kind of idea that gets shot down before it even reaches the decision-making table because most people lack the confidence to pursue it,” the researcher explained. Meanwhile, Chinese AI startups have rarely ventured into architectural innovation, partly due to a deeply ingrained perception: the U.S. excels at “0-to-1” breakthroughs, while China is better at “1-to-10” application-driven innovation.
Moreover, such endeavors are often seen as economically inefficient—new models inevitably emerge every few months, and following the established path while focusing on applications seems far more practical. Innovating in model structures means forging a path with no guideposts, accepting repeated failures, and shouldering significant time and financial costs.
DeepSeek, however, stands as a bold contrarian. In an industry filled with voices proclaiming that large-model technologies will inevitably converge, where following the status quo is seen as the smarter shortcut, DeepSeek values the lessons and insights gained from “detours.” The company believes that Chinese AI entrepreneurs can contribute not only to application innovation but also to the global wave of technological breakthroughs.
Many of DeepSeek’s choices reflect this contrarian ethos. Among China’s seven leading AI startups, DeepSeek is the only one to abandon the “both-and” strategy. To date, it has remained laser-focused on research and technology without developing consumer-facing applications. It is also the only one to steadfastly pursue an open-source path, forgo commercial considerations, and operate without external funding. This often leaves DeepSeek overlooked at the table of high-stakes competition. Yet, on the flip side, it frequently enjoys organic, grassroots-level promotion within user communities.
How Was DeepSeek Forged? To answer that, we spoke with DeepSeek’s elusive founder, Liang Wenfeng. A behind-the-scenes tech visionary from the High-Flyer Quantitative era, this ‘80s-born entrepreneur has carried his understated style into DeepSeek, working alongside researchers daily to “read papers, write code, and participate in group discussions.”
Unlike many quant fund founders with stints at overseas hedge funds and backgrounds in physics or mathematics, Liang’s roots are entirely local. He graduated from Zhejiang University’s Department of Electronic Engineering with a focus on artificial intelligence.
Industry insiders and DeepSeek researchers describe Liang as a rare talent in China’s AI landscape—someone who combines exceptional infrastructure engineering expertise with cutting-edge model research capabilities, while also excelling at resource coordination. They say he has the ability to “make precise high-level judgments while outperforming frontline researchers on technical details.” Known for his “formidable capacity to learn,” Liang defies conventional leadership stereotypes, coming across not as a boss but as a true geek.
This was a rare and illuminating interview. Liang, a self-professed tech idealist, shared a voice that’s increasingly scarce in China’s tech scene today. He prioritizes “what’s right” over “what’s profitable” and urges us to challenge the inertia of our times by making “original innovation” a core focus.
A year ago, when DeepSeek first entered the stage, we interviewed Liang for the article, “The Mad Genius Behind High-Flyer: The Hidden AI Giant’s Journey into Large Models” (in Chinese). Back then, his rallying cry—“We must embrace ambition madly, but also pursue sincerity madly”—felt like an inspiring motto. A year later, it’s clear that this philosophy has evolved into tangible action.
Waves: The release of DeepSeek V2 ignited a fierce price war in the AI industry. Some have called you the “catfish” of the sector.
Liang: We didn’t intend to be a catfish—it just happened.
Waves: Were you surprised by the reaction?
Liang: Very surprised. We didn’t expect such sensitivity around pricing. We simply followed our pace, calculated the costs, and set a price. Our principle is neither to operate at a loss nor to seek excessive profits. The price reflects a modest margin above costs.
Waves: Five days later, Zhipu AI followed suit, and then ByteDance, Alibaba, Baidu, and Tencent joined in.
Liang: Zhipu AI lowered the price for an entry-level product. Models comparable to ours remain expensive. ByteDance was the first to truly match our pricing for flagship models, triggering a domino effect. Large companies have much higher model costs than we do, so we didn’t expect anyone to incur losses just to compete. It turned into a subsidy war reminiscent of the internet era.
Waves: To outsiders, the price cuts look like a user acquisition strategy. Isn’t this typical of price wars?
Liang: Acquiring users wasn’t our main goal. One reason for the price cut is that our research into next-generation model structures reduced costs. The other reason is that we believe APIs and AI should be affordable and accessible to everyone.
Waves: Most Chinese companies copy Llama’s structure to build applications. Why did you start with model architecture?
Liang: If the goal is applications, reusing Llama’s structure for quick deployment makes sense. But our goal is AGI. That requires foundational research into new model architectures to achieve greater capabilities with limited resources. Scaling up to larger models demands groundwork. Beyond architecture, we’ve invested heavily in data structuring and making models more human-like, all of which are reflected in our releases. Additionally, Llama’s structure is about two generations behind leading global standards in training efficiency and inference costs.
Waves: What’s driving that generational gap?
Liang: First, training efficiency. Chinese models are likely one iteration behind the best global standards in terms of architecture and training dynamics, meaning we need twice the computational power to achieve the same results. Second, data efficiency is another gap—we need twice the training data and compute resources to match global benchmarks. Combined, this results in a 4x computational cost gap, which we’re working to close.
Waves: Most Chinese companies balance between model development and applications. Why is DeepSeek focused solely on research?
Liang: We believe the most critical thing now is to participate in the global wave of innovation. For years, Chinese companies have relied on foreign technical innovations to build applications, but this doesn’t have to be the norm. In this wave, our starting point isn’t to make quick profits but to advance the ecosystem from the forefront of technology.
Waves: Many assume the U.S. excels at technical innovation, while China thrives in application development.
Liang: As China’s economy matures, we must transition from being passengers to contributors. Over the past 30 years, we’ve largely stayed out of fundamental innovation in IT, relying on Moore’s Law to bring better hardware and software every 18 months. But these advancements stem from relentless efforts by Western technical communities, a process we’ve been detached from and often overlook.
Waves: Why did DeepSeek V2 surprise so many in Silicon Valley?
Liang: In the U.S., where innovation happens daily, this is just another model. What surprised them was that a Chinese company joined the game as a contributor rather than a follower.
Waves: In China, though, this approach feels extravagant. Developing large models demands heavy investment, and most prioritize commercialization.
Liang: Innovation is undoubtedly costly, and our “copy-and-adapt” habits are rooted in China’s historical context. But today, both China’s economic scale and the profits of giants like ByteDance and Tencent are globally significant. What we lack isn’t capital but confidence and the ability to organize high-density talent for effective innovation.
Waves: Why do even cash-rich Chinese companies prioritize quick commercialization?
Liang: Over the past 30 years, we’ve emphasized profit and neglected innovation. True innovation isn’t solely profit-driven—it requires curiosity and creativity. We’ve been constrained by past inertia, but this is a transitional phase.
Waves: At the end of the day, DeepSeek is a commercial organization, not a non-profit research institute. You’ve chosen to innovate and share those innovations through open source. How do you build a moat in this scenario? Won’t breakthroughs like your MLA architecture from May quickly be copied by others?
Liang: In disruptive technology, closed-source advantages are fleeting. Even OpenAI, with its closed-source approach, can’t prevent being surpassed. Our value lies in our team’s growth, accumulated know-how, and the culture of innovation we’ve built. Open-sourcing papers and code doesn’t diminish that—it enhances it. Being followed is a source of pride for engineers. Open source is more of a cultural act than a commercial one, and it adds cultural appeal to the company.
Waves: What do you think of market-driven approaches like those advocated by Xiaohu Zhu?
Liang: Zhu’s strategies work for companies aiming for quick profits. But look at America’s most profitable companies—they thrive on deep, long-term investment in technology.
Waves: In the realm of LLMs, pure technical leadership rarely guarantees an absolute edge. What’s the bigger bet you’re making?
Liang: Chinese AI can’t remain in a follower’s role forever. We often say China trails the U.S. by one or two years, but the true gap lies between originality and imitation. Without bridging this, China will remain a perpetual follower. There are certain explorations that simply cannot be avoided. Take NVIDIA’s leadership as an example—it’s not the product of one company’s effort alone but the collective result of an ecosystem built by the Western tech community and industry. They’re able to anticipate the next wave of technological trends because they have a roadmap in hand. For Chinese AI to progress, we need the same kind of ecosystem. A big reason many domestic chip projects fail to gain traction is the absence of a supportive technical community. Instead, they rely on second-hand information. For China to truly lead, someone has to step up and take a position at the cutting edge.
Waves: DeepSeek has an idealistic, open-source ethos reminiscent of OpenAI’s early days. Will you follow the same path to closed source, as both OpenAI and Mistral have done?
Liang: We won’t go closed source. Building a strong technical ecosystem takes precedence.
Waves: Are there plans to raise funding? There’s speculation about High-Flyer spinning off DeepSeek for an IPO. In Silicon Valley, AI startups often end up aligning with big tech players.
Liang: We have no short-term plans for fundraising. Our challenge isn’t funding but restrictions on high-end chip imports.
Waves: Many believe that building AGI and running a quant fund are entirely different endeavors. Quantitative trading thrives in stealth, but AGI seems to demand bold moves and strategic alliances to scale your investments.
Liang: More funding doesn’t necessarily equate to more innovation. Otherwise, big companies would monopolize all progress.
Waves: Why not develop applications? Is it a lack of operational capability?
Liang: We see this as a period for technical innovation, not application breakthroughs. In the long run, we aim to build an ecosystem where the industry directly leverages our technology and outputs. We focus solely on foundational models and cutting-edge innovation, leaving other companies to build B2B and B2C solutions on top of what we create. If this end-to-end industry pipeline materializes, there would be no need for us to venture into applications ourselves.
Waves: Why choose DeepSeek’s APIs over those of a tech giant?
Liang: The future likely lies in specialized roles. Fundamental model development requires continuous innovation, which may fall outside the capability boundaries of large companies.
Waves: Can technology truly differentiate? You’ve said there are no absolute technical secrets.
Liang: While there are no secrets, catching up requires time and resources. NVIDIA’s GPUs, for instance, are replicable in theory, but organizing teams and advancing next-generation tech takes time, creating a wide moat in practice.
Waves: After you cut prices, ByteDance was the first to follow suit, signaling they felt a certain level of pressure. What’s your take on how startups like DeepSeek can compete with big tech giants in new ways?
Liang: Honestly, we don’t really care about that. Cutting prices was something we did along the way, not a primary objective. Providing cloud services isn’t our main focus; our goal remains achieving AGI. I haven’t seen any groundbreaking new approaches to competing with big tech, but they don’t have a decisive edge either. Sure, they have an established user base, but their cash flow business is also a burden, making them vulnerable to disruption at any moment.
Waves: What’s your outlook for the other six AI startups in China working on large models?
Liang: Maybe two or three will survive. Right now, they’re all burning cash. The ones with a clear identity and refined operational strategies have a better chance of making it. Others might undergo major transformations. Valuable innovations won’t just disappear—they’ll resurface in different forms.
Waves: High-Flyer Quantitative was known for its “we do things our own way” approach, rarely engaging in direct comparisons with competitors. What’s your core philosophy when thinking about competition?
Liang: My main consideration is whether something can improve societal efficiency and whether we can find a role in its value chain that aligns with our strengths. As long as the endgame enhances efficiency, it’s valid. A lot of what happens in between is just transitional. Over-focusing on the noise in the middle inevitably leads to confusion.
Waves: Jack Clark of Anthropic called your team a group of “AI wizards” Who are they?
Liang: They’re not enigmatic—just graduates from top universities, some PhD candidates, and young professionals a few years out of school.
Waves: Many AI companies are fixated on recruiting talent from overseas, and there’s a belief that the top 50 experts in this field aren’t working for Chinese companies. Where does your team come from?
Liang: None of the contributors to the V2 model came from overseas; they’re all homegrown talent. It’s possible the top 50 experts in this field aren’t in China, but who’s to say we can’t develop people of that caliber ourselves?
Waves: The MLA innovation has been a standout achievement. I heard the idea came from the personal curiosity of a young researcher. How did it take shape?
Liang: After analyzing the evolution of attention architectures, he had a spark of inspiration to design an alternative. But turning that idea into reality was a long journey. We assembled a team for the project, and it took several months to see it through.
Waves: This kind of breakthrough creativity seems closely tied to your organizational structure, which prioritizes innovation. During your High-Flyer Quantitative days, tasks were rarely assigned top-down. Has exploring a frontier as uncertain as AGI added more management overhead?
Liang: At DeepSeek, everything remains bottom-up. We typically avoid pre-assigning roles, allowing natural division of labor instead. Everyone has a unique background and their own ideas, so there’s no need to push them. When challenges arise, people naturally pull in others for collaboration. Of course, if an idea shows real promise, we will step in top-down to allocate resources and support its development.
Waves: It’s said that DeepSeek is extremely flexible when it comes to allocating resources, both in terms of GPUs and personnel.
Liang: Absolutely. There’s no upper limit on resource allocation here. If someone has an idea, they can access the training cluster without needing approval. Similarly, there’s no rigid hierarchy or cross-departmental barriers—anyone can collaborate with anyone else as long as there’s mutual interest.
Waves: Such a loosely managed system likely depends on hiring people with intrinsic motivation. I’ve heard that you’re particularly good at spotting talent through unconventional criteria, identifying exceptional people who might not shine on traditional metrics.
Liang: Our hiring philosophy is centered on passion and curiosity. Many of our team members have unique and interesting life experiences. Their drive to conduct research often far outweighs their interest in money.
Waves: Transformers were created at Google’s AI Lab, while ChatGPT emerged from OpenAI. What do you think are the differences in the value that big corporate AI labs and startups bring to innovation?
Liang: Labs like Google AI, OpenAI, and even those at Chinese tech giants all contribute tremendous value. That said, the fact that OpenAI ultimately brought ChatGPT to life also has an element of historical serendipity.
Waves: Is innovation largely a matter of serendipity? I noticed that in your office, the row of meeting rooms in the center has doors on both sides that can be freely opened. Your colleagues said this was designed to “leave space for chance.” There’s even a famous story from the creation of Transformers, where someone overheard a discussion by chance, joined in, and helped turn it into a universal framework.
Liang: I believe innovation starts with a mindset. Why does Silicon Valley have such a strong culture of innovation? It begins with daring to try. When ChatGPT came out, there was a widespread lack of confidence in pursuing cutting-edge innovation across China, from investors to big tech. Many thought the gap was too large, so they focused on applications instead. But innovation requires confidence first and foremost. This confidence is often more apparent in younger people.
Waves: You don’t participate in fundraising and rarely make public appearances. Compared to companies actively seeking investments, DeepSeek undoubtedly has less visibility. How do you ensure that DeepSeek becomes the first choice for those building large models?
Liang: Because we’re tackling the hardest problems. For top-tier talent, the most attractive thing is solving the world’s toughest challenges. In fact, top talent in China is often undervalued. The lack of hardcore innovation on a societal level means they don’t have opportunities to be recognized. By working on the hardest problems, we naturally attract them.
Waves: OpenAI’s recent announcements didn’t include GPT-5, leading many to believe the technological curve is visibly slowing down. Some have even started questioning the validity of Scaling Laws. What’s your take?
Liang: We remain optimistic. The industry as a whole seems to be progressing as expected. OpenAI isn’t a god—they can’t always be at the forefront.
Waves: How long do you think it will take to achieve AGI? Before releasing DeepSeek V2, you launched models for code generation and mathematics, and transitioned from dense models to MoE. What are the milestones on your AGI roadmap?
Liang: It could take two years, five years, or ten years. But it will definitely happen within our lifetime. As for the roadmap, even within our company, there’s no unified view. However, we’re betting on three directions: mathematics and code, multimodal systems, and natural language itself. Mathematics and code are natural testing grounds for AGI, similar to Go—they’re closed, verifiable systems that could achieve high levels of intelligence through self-learning. On the other hand, multimodal systems that learn from real-world human experiences may also be essential for AGI. We’re keeping an open mind about all possibilities.
Waves: What do you envision as the ultimate state of large models?
Liang: There will be specialized companies providing foundational models and services, forming a long chain of professional divisions of labor. On top of these, others will meet society’s diverse needs in various ways.
Waves: Over the past year, there have been notable shifts in China’s AI startup scene. For instance, Wang Huiwen, who was very active early last year, stepped back mid-way. Since then, the new entrants have started to show differentiation.
Liang: Wang Huiwen took on all the losses himself to ensure everyone else could walk away unscathed. He made a choice that was personally disadvantageous but beneficial for others. I deeply respect him for that—it shows his integrity.
Waves: Where are you focusing most of your energy now?
Liang: My primary focus is on researching the next generation of large models. There are still so many unresolved challenges.
Waves: Most other AI startups in China are taking a “both-and” approach, working on both research and application, since technological superiority alone isn’t permanent. Capitalizing on the time window to translate tech advantages into products is key. Is DeepSeek’s focus on research a sign that your model capabilities aren’t quite there yet?
Liang: All strategies are products of the previous generation, and they might not hold for the future. Using the business logic of the internet era to discuss future AI profitability is like discussing General Electric and Coca-Cola when Pony Ma was starting Tencent. It’s likely a case of carving a boat to find a sword—it just doesn’t fit anymore.
Waves: High-Flyer Quantitative has always been known for its strong technological and innovative DNA, and its growth seemed relatively smooth. Is that why you’re so optimistic about innovation?
Liang: High-Flyer did boost our confidence in tech-driven innovation to some extent, but it wasn’t all smooth sailing. We went through a long period of accumulation. What people see from High-Flyer’s success after 2015 is just the tip of the iceberg; in reality, it took us 16 years to get there.
Waves: Returning to the topic of original innovation, with the economy slowing down and capital entering a cooling cycle, does that create additional barriers to original, ground-up innovation?
Liang: I don’t think so. The adjustment in China’s industrial structure will increasingly depend on hardcore technological innovation. As people start to realize that the quick profits of the past were often the result of lucky timing, they’ll be more willing to roll up their sleeves and focus on true innovation.
Waves: So you’re optimistic about this shift?
Liang: Yes, I am. I grew up in the 1980s in a small fifth-tier city in Guangdong. My father was an elementary school teacher. Back in the 1990s, opportunities to make money in Guangdong were everywhere. Many parents would come to our house and say things like, “Studying is useless.” But looking back now, those attitudes have changed. Making money isn’t as easy anymore—people can’t even count on driving a taxi as a fallback. A single generation has seen a complete shift in mindset.
Hardcore innovation will only become more prevalent in the future. Right now, it’s hard for society to fully grasp this because the collective mindset still needs to be educated by real-world outcomes. When those driving hardcore innovation achieve widespread success and recognition, the societal narrative will shift. What we need is simply more evidence—and time—for that transformation to take place.