AI Suspicion: DeepSeek's Swift Ascent Under Scrutiny

OpenAI suspects that China's DeepSeek AI models, which are significantly cheaper than their Western counterparts, may have been trained using OpenAI data, a claim that has sparked controversy. DeepSeek's emergence also rattled markets: the stock prices of major AI companies fell sharply, with Nvidia suffering its largest-ever single-day loss in market value.

DeepSeek's R1 model, based on the open-source DeepSeek-V3, is said to have far lower training costs (estimated at $6 million) and computational requirements than Western models like ChatGPT. Although this claim is disputed, it has raised concerns about the massive investments American tech firms have made in AI. DeepSeek's app climbed the U.S. download charts, fueled by discussion of its cost-effectiveness.

OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service by employing "distillation," a technique in which a smaller model is trained on the outputs of a larger one. OpenAI confirmed that it is aware of attempts by Chinese and other companies to replicate leading U.S. AI models and said it is committed to protecting its intellectual property. David Sacks, President Trump's AI czar, echoed OpenAI's suspicion, suggesting that DeepSeek had distilled knowledge from OpenAI models.
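
For readers unfamiliar with the term, here is a minimal sketch of the textbook form of distillation, in which a "student" model is penalized for diverging from a "teacher" model's softened output probabilities. The function names, the temperature value, and the toy numbers are illustrative assumptions; the sketch does not describe what DeepSeek or any other company actually did. When only a model's text responses are available, the analogous approach would be to train on those responses instead, which this sketch does not cover.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper for illustration: the loss is zero when the student
    matches the teacher exactly and grows as the two distributions diverge.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: scores over three candidate next words (made-up numbers).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, close_student))  # small loss
print(distillation_loss(teacher, far_student))    # larger loss
```

Training the student to minimize this loss pushes its predictions toward the teacher's, which is why a smaller model trained this way can be far cheaper to build than the model it learns from.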

DeepSeek is accused of using distillation to train its competing model on OpenAI’s models. Image credit: Andrey Rudakov/Bloomberg via Getty Images.

The accusations carry a certain irony, given OpenAI's own past controversies over the use of copyrighted material in training ChatGPT. Critics have pointed to OpenAI's reliance on vast amounts of internet data, which raises questions about its own ethical practices.

OpenAI previously acknowledged that it would be impossible to train large language models without copyrighted material, citing the broad scope of copyright protection. That stance is now being tested in ongoing legal battles, including a lawsuit from the New York Times alleging unlawful use of its content and a separate lawsuit filed by 17 authors. OpenAI defends its actions as "fair use," but the lawsuits highlight the complex and evolving legal landscape surrounding the use of copyrighted material in AI training. The situation is further complicated by the U.S. Copyright Office's position, in a case dating to 2018, that art generated by AI without human authorship is not eligible for copyright protection.