DeepSeek's surprisingly cost-effective AI model is challenging the industry's giants. DeepSeek V3, a powerful neural network initially touted as costing only $6 million to train, has become a major competitor, and its debut even triggered a significant drop in NVIDIA's share price. However, the true cost is far higher.
Image: ensigame.com
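DeepSeek's success stems from a combination of innovative technologies. Multi-token Prediction (MTP) trains the model to predict several upcoming tokens at once, improving accuracy and efficiency. Mixture of Experts (MoE) splits the network into 256 expert sub-networks and activates only a few of them for each token, which speeds up training. Multi-head Latent Attention (MLA) compresses the attention mechanism's key-value data, letting the model extract key information while using far less memory.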
Image: ensigame.com
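To make the Mixture of Experts idea more concrete, here is a minimal toy sketch in Python. It is not DeepSeek's implementation. It assumes tiny random linear "experts", a simple dot-product router, and top_k=8 experts chosen per token out of 256. These settings and names such as `ToyMoELayer` and `make_expert` are hypothetical, and real MoE layers add shared experts, load balancing and GPU-parallel execution, all of which are omitted here. What the sketch shows is that each token runs only through the few experts the router selects, so most of the network's computation is skipped.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed, dim):
    """Hypothetical toy expert: a fixed random dim x dim linear map."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


class ToyMoELayer:
    """Hypothetical MoE layer: a router picks top_k experts per token."""

    def __init__(self, dim, num_experts=256, top_k=8, seed=0):
        rng = random.Random(seed)
        self.experts = [make_expert(seed + i + 1, dim) for i in range(num_experts)]
        # Router: one score vector per expert (random here instead of learned).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        self.top_k = top_k

    def route(self, x):
        # Score every expert, keep the top_k, and normalise their scores into gates.
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.router]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in top])
        return top, gates

    def forward(self, x):
        # Only the selected experts are evaluated; the rest are skipped entirely.
        top, gates = self.route(x)
        out = [0.0] * len(x)
        for i, g in zip(top, gates):
            y = self.experts[i](x)
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, top


rng = random.Random(42)
layer = ToyMoELayer(dim=16)
token = [rng.gauss(0, 1) for _ in range(16)]
out, chosen = layer.forward(token)
print(len(chosen), "of", len(layer.experts), "experts used")  # prints: 8 of 256 experts used
```

In this toy setup, only 8 of the 256 experts run for each token, so roughly 3% of the expert computation is performed per token. This kind of saving is what can make MoE models cheaper to train than dense models of similar overall size.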
Contrary to the initial claims, research firm SemiAnalysis revealed that DeepSeek has substantial infrastructure. It estimated approximately 50,000 Nvidia GPUs, valued at around $1.6 billion, with operational costs reaching $944 million. This contrasts sharply with the publicized $6 million figure. That figure covers only pre-training and omits research, refinement, data processing and overall infrastructure expenses.
Image: ensigame.com
DeepSeek is a subsidiary of High-Flyer, a Chinese hedge fund, and this unusual structure allows for swift innovation and decision-making. Owning its data centers also gives the company full control over optimization. Its investment exceeds $500 million, and it pays high salaries to attract top Chinese talent (over $1.3 million annually for some researchers). Together, these contribute significantly to its competitive edge.
Image: ensigame.com
While DeepSeek's "budget-friendly" narrative is arguably overstated, its success highlights the potential of well-funded independent AI companies. There is still a stark contrast in reported training costs: $5 million for DeepSeek's R1 versus $100 million for OpenAI's GPT-4o. This underscores DeepSeek's relative cost-effectiveness, even given its substantial actual investment. Ultimately, though, the company's success is more accurately attributed to significant investment, technological advancements and a highly skilled workforce.