DeepSeek before V4 release: traits, organization, and Liang Wenfeng's unique goals

DeepSeek is at a turning point. Since the second half of 2025, members who have clearly left and found new placements include:

  • Wang Bingxuan, who was poached by Tencent’s Yao Shunyu toward the end of last year. He is a core author of DeepSeek LLM (DeepSeek’s first-generation large language model) and has since taken part in training successive generations of models.

  • Wei Haoran, who left around the time of the Spring Festival. He is a core author of the DeepSeek-OCR series and may be joining a major tech company.

  • Guo Daya, who formally resigned recently. A core author of DeepSeek-R1, Guo may be joining a major tech company.

  • And Ruan Chong, who left even earlier in 2025 and has since landed elsewhere: in January this year, he publicly announced that he had joined the autonomous-driving startup Yuanrong Qixing. A longtime member who joined back in the Fanfang era, Ruan Chong was a key contributor to DeepSeek’s multimodal results such as Janus-Pro.

DeepSeek has never raised outside funding, so it has no clear valuation. As other AI companies’ market caps and valuations surged, Liang Wenfeng had to find ways to answer a question from team members: how much is the company actually worth? The answer directly determines the value of the stock-option agreements employees signed.

Starting in the autumn of 2025, Liang Wenfeng also began focusing more on productization and commercialization. DeepSeek already has a product team of a few dozen people, but it has not yet moved into hot application directions like AI programming or general-purpose agents; on the consumer side, it still offers only a typical chatbot product.

Liang Wenfeng’s new focus also includes managing at a larger scale: DeepSeek’s headcount has surpassed Fanfang’s, making it the biggest organization he has ever run.

Hanging over all these changes is the fact that DeepSeek V4 has still not been officially released.

In fact, around January 2026, a small-parameter version of V4 was already provided to some open-source framework communities so they could begin adaptation work. Under earlier, relatively optimistic expectations, the large-parameter version of V4 was likely to be released and open-sourced around mid-February, near the Spring Festival. According to the information now available, DeepSeek V4 could be released in April.

Even as some people leave, more choose to stay. DeepSeek is adjusting, but many of its traits remain unchanged.

It is one of the few core AI labs worldwide that is “not in a frenzy.” While core AI developers at Chinese and U.S. companies like Google, OpenAI, xAI, and ByteDance work 70–80 hours a week, most DeepSeek employees leave the office around 6–7 p.m. on weekdays and do not clock in in the mornings.

Liang Wenfeng believes a person can rarely sustain more than 6–8 hours of high-quality output in a day.

DeepSeek has no formal performance reviews and no deadlines. This lean organization, with extremely high talent density, still runs on “natural division of labor”: researchers are free to form teams or independently explore new ideas.

“Besides the main line, there are also people at DeepSeek doing long-term research that may not show results for a full year,” people close to DeepSeek said. “DeepSeek is a place where people genuinely want to do research. In China, and even globally, it is among the best places you can find.”

DeepSeek has another distinctive trait: it is secretive. Especially since 2025, beyond its public technical reports, everyone from founder Liang Wenfeng to rank-and-file members has collectively “gone silent”; their voices are hard to find on social media or in the communities where AI practitioners gather.

In this report, we present DeepSeek’s traits, where it puts its effort, how its organization operates, and the changes happening within this organization of fewer than 200 people. All of it traces back to the unique goal Liang Wenfeng set for DeepSeek.

Liang Wenfeng: Do fewer things, and do them to the extreme

Liang Wenfeng’s AI ambitions predate DeepSeek’s founding in 2023.

In 2016, DeepMind founder Hassabis, an early champion of AGI, assembled a quantitative trading team, trying to give DeepMind a revenue stream independent of Google. It did not make money.

In the same year, Liang Wenfeng, who had earned both his bachelor’s and master’s degrees at Zhejiang University, had already been doing quantitative investing for eight years. He founded Fanfang in 2015. Starting in 2016, he ran deep-learning trading strategies on GPUs, and by the end of 2017 nearly all of Fanfang’s trading strategies used AI. In 2019, he began building Fanfang’s first computing cluster, “Firefly 1,” with 1,100 GPUs.

Also in 2019, Fanfang AI (Fanfang Artificial Intelligence Basic Research Co., Ltd.) was officially registered. Luo Fuli, who now leads AI work at Xiaomi, and Ruan Chong, who recently joined Yuanrong, both joined Fanfang after this and transferred to DeepSeek in 2023.

Financially independent before turning 30, Liang Wenfeng leads a simple, almost mysterious life.

To people around him, he wears the same outfit for days on end. In Hangzhou he long lived in hotels; in Beijing, where most of DeepSeek’s R&D staff are based, he rents an apartment. He is lean, exercises regularly, and among outdoor activities is known to enjoy hiking.

Jen-Hsun Huang invites NVIDIA employees to his home to drink a little wine, chat casually, and happily show off his sports cars. Liang Wenfeng, by contrast, skips quarterly team-building events and rarely shares meals with team members. Even at the big year-end gathering, he shows up only to give remarks rather than staying for the whole event.

In 2022, a Fanfang employee using the pseudonym “an ordinary little pig” personally donated 138 million yuan to a charitable organization. Many later speculated that this “little pig” was Liang Wenfeng. Fanfang staff responded: “Employee donations are anonymous; even within the company, we don’t know the pig’s real identity.”

At work, Liang Wenfeng does only a few things, and he skips much of what most early-stage startup CEOs do, such as fundraising.

In 2023, Liang Wenfeng met a limited number of investors. But according to what we understand, he made an unconventional request: similar to the investment agreements between OpenAI and Microsoft, he hoped investors would accept a cap on returns. After that round of meetings, no institution invested in DeepSeek.

Over the next two years, China’s large-model fundraising wave surged, with deals worth hundreds of millions of yuan appearing frequently. Yet Liang Wenfeng stopped meeting investors and stopped making new connections altogether. Most founders, even outside a fundraising window, would not refuse a chance to meet partners from top-tier institutions; Liang Wenfeng turned down most such requests.

Liang Wenfeng has devoted almost all his time to the few things he believes should be prioritized, doing them carefully and to the extreme.

One key to DeepSeek’s earlier success was “single-point leverage”—it clearly prioritized the language model above all else and did not pursue popular directions like multimodal generation.

On the chosen main line, Liang Wenfeng goes hands-on and digs into the details. He learns about algorithms, architecture, Infra, and data from team members with different backgrounds, and personally joins detailed discussions about model and product matters.

Many people who have met Liang Wenfeng mention that he has no CEO “aura,” nor the air of a so-called genius; he comes across more like a researcher. In conversation, what he discusses most are concrete technical issues.

Zhang Jinjian, founder of Oasis Capital, shared a small story in his article Those Who Came to Life. He asked MiniMax founder Yan Junjie: “Is there anyone more focused than you?” Yan Junjie recalled once arranging dinner with someone he had never met. Arriving early, he saw a guy in a T-shirt and took him for an assistant. The man did not introduce himself and asked Yan Junjie many technical questions. After half an hour, Yan Junjie asked, “When will Mr. Liang arrive?” The man replied, “I am Liang Wenfeng.”

DeepSeek’s organization: flat, cross-functional division of labor, no overtime

Matching Liang Wenfeng’s style, DeepSeek’s organization is extremely flat: functions collaborate across boundaries at every stage, headcount growth is cautious, and there is no overtime.

When Fanfang was founded, Liang Wenfeng had partners; DeepSeek has no second-in-command. In the research team especially, there are only two tiers: Liang Wenfeng and everyone else. Liang Wenfeng makes the major decisions and shoulders the most responsibility for results.

The research team now numbers somewhat more than 100 people and operates like a large laboratory. DeepSeek researchers, mostly born around 2000, habitually call Liang Wenfeng, born in 1985, “Boss Liang.” This boss is closer to a mentor: he organizes R&D, coordinates resources, does hands-on research himself, and appears as corresponding author on shared publications.

Liang Wenfeng himself is most involved with the base-model architecture team. After deep discussion with the team, he signs off on the architectural blueprint for each generation of base models. This team of a few dozen people is the main force behind pretraining.

Closely tied to base-model architecture are the Infra and data teams, each also a few dozen people. At some companies, the Infra team is more like an “internal subcontractor” that fulfills algorithm requirements; DeepSeek’s Infra team instead joins discussions and offers suggestions at the blueprint stage, before model training begins.

The close collaboration among these modules means DeepSeek’s team boundaries are not sharply drawn, forming a “cross-functional division of labor.” This mode of cooperation actually suits model training best: during experimentation and blueprint design, data selection and Infra implementation have to be weighed together.

Liang Wenfeng is the probe and the glue binding these modules together. He attends each team’s meetings to track overall progress and bottlenecks, and most of DeepSeek’s weekly meetings are open to people from other teams, allowing cross-group attendance.

Both this detail-level “frontline” style and the spontaneously formed tight collaboration are hard to achieve in large organizations, so DeepSeek is very cautious about expanding its core R&D team.

One point that is extremely unusual across global AI circles: DeepSeek does not work overtime. Employees do not clock in, there are no formal performance reviews, and most members leave the office around 6–7 p.m. on weekdays. DeepSeek also provides free after-work benefits, such as sports classes and reimbursement for sports facilities.

Liang Wenfeng believes a person can rarely sustain more than 6–8 hours of high-quality output a day; the misjudgments made under overtime fatigue end up wasting valuable compute, which is not worth it.

In its staffing, DeepSeek previously made almost no lateral hires, relying mainly on new graduates and retained interns. In early 2025, LatePost tallied the 172 researchers (including interns) who contributed to DeepSeek’s three generations of models (LLM, V2, V3 & R1) and found resumes for 84 of them: more than 70% held bachelor’s or master’s degrees, and more than 70% were under 30.

Before V3 and R1, DeepSeek reached the global first tier of large models with an extremely focused approach: roughly one-tenth the headcount of big companies and roughly half the average working hours, powered by very high concentration.

But as the number of directions that must be explored to reach top AI capability keeps growing, maintaining this organizational scale, communication style, and collaborative atmosphere has become increasingly difficult.

Over the past 15 months, DeepSeek has kept doing its own thing while the outside world changed dramatically

After V3 and R1 exploded in early 2025, DeepSeek did not ride the wave with big moves. It kept developing along the directions it was already focused on. Its published results fall roughly into three categories:

First is efficiency optimization: squeezing GPU compute to the extreme to increase the intelligence produced per unit of compute. This includes the full training-and-inference Infra stack open-sourced during DeepSeek’s open-source week in early 2025, covering inference kernels, communication libraries, matrix-multiplication libraries, and data-processing frameworks. (Note: a kernel is the code that executes the lowest-level computations on a GPU, used to implement core operations such as matrix multiplication; see the sketch below.)
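
To make the note about kernels concrete, here is a minimal vector-add kernel in Triton, the OpenAI-open-sourced GPU language mentioned later in this piece. It is a generic textbook sketch for illustration only, not code from DeepSeek’s released libraries:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements          # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn_like(x)
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)   # number of program instances to launch
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
```

Real inference kernels, such as those DeepSeek open-sourced, fuse far more work per launch; the point here is only what “code that executes the lowest-level computations on GPUs” looks like.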

Also in the efficiency vein is continued improvement of the “attention mechanism,” such as NSA (Native Sparse Attention) in early 2025 and the subsequent DSA (DeepSeek Sparse Attention). Together with MLA (Multi-Head Latent Attention), introduced earlier in V2, their common goal is to handle longer contexts without significantly increasing compute; the sketch below illustrates the general idea.
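
The following toy numpy sketch shows the generic idea behind block-sparse attention: each query attends to only a few key blocks rather than the full context, so cost scales with the selected blocks instead of the whole sequence. It is a simplified illustration under our own assumptions (no causal mask, mean-pooled block scoring), not the NSA or DSA algorithm itself:

```python
import numpy as np

def block_sparse_attention(q, k, v, block=64, topk=4):
    """Each query attends only to its top-k key blocks, chosen by scoring
    the query against one mean-pooled summary vector per block."""
    T, d = k.shape
    nb = T // block
    topk = min(topk, nb)
    # Coarse stage: cheap scores against nb block summaries, not T keys.
    k_blocks = k[: nb * block].reshape(nb, block, d).mean(axis=1)  # (nb, d)
    picked = np.argsort(-(q @ k_blocks.T), axis=-1)[:, :topk]      # (Tq, topk)

    out = np.zeros_like(q)
    for i in range(q.shape[0]):
        # Fine stage: exact softmax attention over the selected blocks only.
        idx = np.concatenate([np.arange(b * block, (b + 1) * block)
                              for b in picked[i]])
        s = q[i] @ k[idx].T / np.sqrt(d)
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ v[idx]
    return out

q = np.random.randn(8, 32)
k = np.random.randn(1024, 32)
v = np.random.randn(1024, 32)
print(block_sparse_attention(q, k, v).shape)  # (8, 32)
```

Here each query touches topk × block = 256 keys instead of all 1,024, which is the sense in which sparse attention extends context length without a matching rise in compute.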

From DeepSeek-V3.2, updated at the end of September 2025, you can also see DeepSeek going so far as to replace the mainstream CUDA and Triton languages in its underlying operator library with TileLang. CUDA is the low-level GPU language provided by NVIDIA; Triton was open-sourced by OpenAI; TileLang is an open-source project initiated by Zhi Yang’s team at Peking University.

Second are improvements to the model architecture, such as mHC (manifold-constrained hyper-connections), released in early 2026 and aimed at improving stability during large-scale training, and Engram, which builds long-term memory outside the model. The industry broadly expects mHC to be used in training V4.

Third are “non-mainstream” explorations, such as DeepSeek-OCR, which converts text into images before feeding it to the model. The idea is to let the model perceive paragraphs and hierarchy the way humans “see text,” improving its ability to understand complex documents. A toy version of that preprocessing step appears below.
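
As a minimal illustration of the “text as image” idea, the Pillow sketch below renders a passage onto a page image that a vision encoder could consume, so layout survives as pixels. This is our own hypothetical preprocessing for intuition, not DeepSeek-OCR’s actual pipeline:

```python
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_page(text: str, width: int = 896, height: int = 1152) -> Image.Image:
    """Draw text onto a white 'page' so a model sees structure, not tokens."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    y = 16
    for paragraph in text.split("\n"):
        for line in textwrap.wrap(paragraph, width=90) or [""]:
            draw.text((16, y), line, fill="black", font=font)
            y += 16
        y += 8  # paragraph spacing is itself part of the visual hierarchy
    return img

page = render_page("Paragraphs, headings, and indentation\nsurvive as visual structure.")
page.save("page.png")  # this image, not the raw text, goes to the model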

Inside DeepSeek, there are more ongoing attempts of this kind, including continual learning and autonomous learning.

Liang Wenfeng also recruited some advisors with backgrounds in neuroscience and brain science in 2025 to explore learning mechanisms closer to the human brain.

Meanwhile, the external AI environment has changed rapidly from 2025 to the present. The two most-watched competitive main lines are:

First are agentic models and applications built on coding capabilities. This is the most fiercely contested battlefield between Anthropic and OpenAI right now, pitting Opus 4.6 against GPT-5.4 and Claude Code against Codex. OpenClaw, the “little lobster” that has been wildly popular since the beginning of the year, is the newest form of agentic application.

Second is multimodal generation. This field has repeatedly broken into the mainstream on the strength of “magical” demo effects: OpenAI’s GPT-4o in spring 2025, Google’s NanoBanana in autumn, and ByteDance’s Seedance 2.0 before the Spring Festival of 2026. Video generation also relates to a more frontier direction: “world models.”

From the start, DeepSeek has not put much effort into multimodal generation, because Liang Wenfeng does not believe it is the main line toward intelligence.

In the agent direction, DeepSeek-V3.2 strengthened agent capabilities, but DeepSeek’s overall iteration pace has been slower than that of the other “little tigers,” who were themselves once anxious in the wake of R1.

From early 2025 to now, Zhipu, MiniMax, and Kimi have shipped five, four, and three model updates respectively, focused on agent and coding capabilities.

According to OpenRouter data for the past 30 days (Feb 24–Mar 26), among the top 10 models by token consumption from the OpenClaw application via OpenRouter, 6 came from China, while DeepSeek-V3.2 ranked 12th. (Note: OpenRouter mostly reflects usage by individual and small-to-mid-sized developers and serves only as a rough reference for overall token consumption.)

DeepSeek’s goal isn’t the most mainstream; some people leave, and some people stay

DeepSeek’s contrarian stance stems from the AGI goal Liang Wenfeng has set for it. Besides pursuing the intelligence ceiling of large models, he believes two other tasks are very important:

First is building large models on top of the domestic ecosystem.

DeepSeek will keep investing in adaptation to domestic GPUs to address the real-world constraint of limited high-performance GPU supply. For example, after updating V3.1 last August, the team noted that DeepSeek uses UE8M0 FP8, a compressed numeric format “designed for the next generation of domestic chips.” The earlier-mentioned replacement of Triton with the domestically initiated, open-source TileLang is similar: it gives DeepSeek more initiative at the foundational layer.
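
For intuition, UE8M0 is usually described as an 8-bit scale format in the FP8 family: unsigned, eight exponent bits, zero mantissa bits, so every code represents a power of two. The sketch below assumes the common bias-127 convention; DeepSeek has not published its exact usage in this much detail:

```python
import math

def ue8m0_encode(scale: float) -> int:
    """Encode a positive scale as UE8M0: the nearest power of two,
    stored as an unsigned 8-bit exponent (assumed bias of 127)."""
    assert scale > 0
    return max(0, min(255, round(math.log2(scale)) + 127))

def ue8m0_decode(code: int) -> float:
    return 2.0 ** (code - 127)

# A block of FP8 values would share one such scale:
code = ue8m0_encode(768.0)       # nearest power of two is 1024
print(code, ue8m0_decode(code))  # 137 1024.0
```

Because every scale is an exact power of two, rescaling reduces to exponent arithmetic with no rounding of the scale itself, which is what makes such a format friendly to simpler hardware.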

When talking with people in the AI industry, Liang Wenfeng also proposed a hypothesis: “Can we use part of the existing compute to achieve all the intelligence we have today?”

Second is “original innovation”: going into directions that big companies and other startups dare not or will not try.

For example, in the second half of 2024, DeepSeek began the Janus series, attempting to unify multimodal understanding and generation. It built the Prover series to explore formal proof, released OCR work in 2025, and internally keeps working on continual learning and brain-inspired mechanisms.

As a founder, Liang Wenfeng cares not only about the model results themselves, but also about the more fundamental, original discoveries along the path to chasing those results.

But this does not fully match some outside expectations of DeepSeek: some people hope that every DeepSeek move will be as earth-shattering as R1. That is a bit much to ask, and it also runs against the laws of technological development.

Liang Wenfeng may not care about external expectations, but he must deal with internal expectations.

For younger researchers, doing more frontier research also means shouldering more uncertainty. The safer path is to keep contributing to the industry’s strongest models, putting their names on high-profile technical reports and running experiments backed by abundant GPU resources.

Beyond honor and influence, another draw for DeepSeek members is the promise of wealth.

DeepSeek’s total compensation is not low, but outside offers are higher. Recruiters told us competitors have put out “numbers that are hard to refuse”: “doubling or tripling an offer is no big deal,” and “other companies offer eight-figure packages (counting stock or options).”

More recently, MiniMax and Zhipu have gone public and seen their share prices surge, while IPO plans for JieStep and Kimi are on the agenda. This has prompted some DeepSeek members to ask more questions about the options they hold, which carry no explicit price.

Faced with huge offers, more people still choose to stay. They identify with Liang Wenfeng’s way of pursuing AGI, are willing to do exploration not driven by competition, and are used to DeepSeek’s relatively relaxed and comfortable research atmosphere.

Some recent outside rumors are inaccurate: while the DeepSeek team has seen changes, there has been no mass departure.

“Those who stay still have ideals,” people close to DeepSeek said. Liang Wenfeng feels that beyond the main line of improving model efficiency and performance, it is necessary to pursue directions whose returns are unclear in the short term, because “overseas companies with more compute, like Google and OpenAI, are certainly trying all sorts of directions internally.”

To date, DeepSeek’s relatively small team and its founding culture of transparency and flatness still let members divide work naturally. Sometimes a new direction starts simply because three or five people think an idea is good and go do it together.

This aligns with how Liang Wenfeng described things in his 2024 interview with Dark Tide: “We usually don’t front-load task assignments. Each person has their own unique growth experience, and they come with their own ideas, so there’s no need to push them… But when an idea shows potential, we also allocate resources from the top down.”

“DeepSeek is a place where people sincerely want to do research. In China, and even in the world, it is one of the best options you can find,” people close to DeepSeek said.

Change the world—and be changed by the world

What makes DeepSeek valuable is its unique understanding and decomposition of the AGI goal; that is also why it faces internal tensions today. Liang Wenfeng values ecosystem building and original exploration, which overlaps with, but is not identical to, the industry’s usual first priority of staying the strongest.

Moreover, as large-model development has progressed to today, the standards for “strength” and “originality” have become increasingly blurred and subjective.

Benchmark scores can no longer fully measure a model’s level. Especially now that competition has shifted to agentic models, a product’s reach, and the long-tail use cases and diverse data it brings, matter even more. That is precisely the area DeepSeek, focused on model R&D, had not invested in before.

The upcoming V4 is highly likely to still be the strongest open-source model, but it will be hard for it to be “crushingly” the strongest, because different scenarios, developers, and users now hold increasingly diverse standards, and subjective impressions, of what “strong” means.

As for what constitutes original and valuable new exploration, there are always many different opinions. It depends on different researchers’ experience, judgment, and intuition—what people call “technical taste.”

The way to validate taste is through experiments, and the number and scale of experiments are constrained by GPU resources. Compared with peers, DeepSeek does not have as much compute.

Finally, whether it is the ecosystem foundations of large models or the exploration of directions other teams may never try in the pursuit of model performance, what Liang Wenfeng values offers very unclear returns.

Frontier research should bear this kind of uncertainty, but that sits uneasily with the reality of limited compute, and with outside expectations that DeepSeek will keep surprising, even “crushing,” everyone.

Liang Wenfeng has realized it is time to change. Recently, he has begun working out a company valuation and giving team members more certainty about their expectations.

DeepSeek will also invest more in product work. We reviewed all the recruitment posts a DeepSeek HR staffer published on social media from December 2024 to now. In the latest postings, from mid-March this year, DeepSeek for the first time named specific outside products. It is hiring a “Model Strategy Product Manager” for the agent direction:

Continuously track industry frontiers, be familiar with and have deeply used well-known agents such as Claude Code, OpenClaw, Manus…

Next, you will certainly see more moves by DeepSeek in agent products.

In early 2025, with a generous open-source spirit and the miracle of doing much with little, DeepSeek shook China and the world, and changed it too: it pushed a group of peers to invest more in model technology itself, inspired later models such as Kimi K2 and K2-Thinking, and directly spawned new teams, such as MiroMind, backed by Chen Tianqiao’s funding.

It was a miracle precisely because miracles are rare, low-probability events. In a Chinese environment that prizes competition and judges by results, the very existence of DeepSeek, daring to pursue its own goals, is itself a surprising low-probability event.

Someone who has interacted with Liang Wenfeng described him as: “He is especially resistant to noise.”

After R1 became wildly popular in 2025, Liang Wenfeng stayed calm in the face of pursuit and praise. Now he faces another kind of test: as external competition intensifies, he must separate signal from noise, insist on what deserves insistence, and change what needs changing.

“People who keep their heads down and do things may not be the ones who laugh last in the market’s turbulent tide, but only with more companies like DeepSeek will Chinese technology have a chance to move from copying and replicating to leading the race,” one practitioner said.

That is the work of Liang Wenfeng and DeepSeek. For the many people this company has stirred, what they can do is actually simple: set aside the hero-fantasy narrative and look at a company, and at technological innovation, with a more grounded mindset.
