Gate News message, April 24: DeepSeek has released the V4 series of open-source models under the MIT License, with weights now available on Hugging Face and ModelScope. The series includes two mixture-of-experts (MoE) models: V4-Pro, with 1.6 trillion total parameters and 49 billion activated per token, and V4-Flash, with 284 billion total parameters and 13 billion activated per token. Both support a 1-million-token context window.
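As a quick check on those figures, the share of parameters active for each token follows directly from the reported counts. The sketch below is illustrative arithmetic only; the function name `active_fraction` is ours and does not come from DeepSeek's release.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of a MoE model's parameters activated for each token."""
    return active_params / total_params


# Parameter counts as reported in the announcement: (total, activated per token).
models = {
    "V4-Pro": (1.6e12, 49e9),
    "V4-Flash": (284e9, 13e9),
}

for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active per token")
```

On the reported numbers this works out to roughly 3.1% for V4-Pro and 4.6% for V4-Flash.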

The architecture features three key upgrades. First, a hybrid attention mechanism combines compressed sparse attention (CSA) and heavily compressed attention (HCA) to significantly reduce long-context overhead: at a 1M-token context, V4-Pro's inference FLOPs are just 27% of V3.2's, and its KV cache (the GPU memory that holds previously computed attention keys and values during inference) is only 10% of V3.2's. Second, manifold-constrained hyperconnections (mHC) replace traditional residual connections to make cross-layer signal propagation more stable. Third, the Muon optimizer is used for faster training convergence. Pre-training used more than 32 trillion tokens of data.
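The two ratios are easier to compare as reduction factors. The snippet below is a minimal sketch that only inverts the percentages quoted above; `reduction_factor` is a hypothetical helper name, and no absolute FLOP or memory figures are implied.

```python
def reduction_factor(fraction_of_baseline: float) -> float:
    """How many times smaller a cost is, given its fraction of the baseline."""
    return 1.0 / fraction_of_baseline


# V4-Pro relative to V3.2 at a 1M-token context, as reported.
reported = {
    "inference FLOPs": 0.27,
    "KV cache": 0.10,
}

for metric, fraction in reported.items():
    print(f"{metric}: about {reduction_factor(fraction):.1f}x lower than V3.2")
```

That is, roughly 3.7x fewer inference FLOPs and a 10x smaller KV cache than V3.2 at the same context length.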

Post-training follows a two-stage approach: domain-specific experts are first trained via supervised fine-tuning (SFT) and GRPO reinforcement learning, then merged into a single model through online distillation. DeepSeek claims that V4-Pro-Max (the highest inference mode) is the strongest open-source model, with top-tier coding benchmark results and a significantly narrower gap to closed-source frontier models on reasoning and agent tasks. V4-Flash-Max reaches Pro-level reasoning performance when given a sufficient compute budget, but its smaller parameter scale limits it on pure knowledge and complex agent tasks. Weights are stored in mixed FP4+FP8 precision.
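The announcement does not say how the weights are split between FP4 and FP8, so the raw checkpoint size can only be bounded. The sketch below is a rough estimate under that assumption: it treats every weight as either 4-bit or 8-bit and ignores scaling factors, metadata, and any tensors kept at higher precision. The helper `weight_storage_tb` is hypothetical.

```python
def weight_storage_tb(total_params: float, bits_per_param: float) -> float:
    """Raw weight size in terabytes (1 TB = 1e12 bytes), excluding scales and metadata."""
    return total_params * bits_per_param / 8 / 1e12


# Total parameter counts as reported in the announcement.
totals = {"V4-Pro": 1.6e12, "V4-Flash": 284e9}

for name, total in totals.items():
    low = weight_storage_tb(total, 4)   # lower bound: every weight stored as FP4
    high = weight_storage_tb(total, 8)  # upper bound: every weight stored as FP8
    print(f"{name}: between {low:.2f} TB and {high:.2f} TB of raw weights")
```

Under these assumptions V4-Pro falls between about 0.80 TB and 1.60 TB, and V4-Flash between about 0.14 TB and 0.28 TB; actual download sizes may differ.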


