Wed. Apr 16th, 2025
DeepSeek’s AI Model Controversies: Alleging Unauthorized Use of OpenAI’s Technology

In recent months, DeepSeek’s AI model, R1, has come under intense scrutiny over allegations that it may have been trained using OpenAI’s proprietary data—potentially without authorization. Several anomalies in DeepSeek’s responses, knowledge cutoffs, and internal reasoning processes suggest that the Chinese AI company might have leveraged OpenAI’s GPT-4 outputs in its development. If proven true, this could be one of the most significant cases of AI model replication in recent history, raising serious ethical and legal questions.


1️⃣ DeepSeek Initially Identified as GPT-4

During early testing, DeepSeek’s chatbot explicitly referred to itself as “a version of ChatGPT based on GPT-4.” This is highly unusual for an AI system that claims to be independent.

  • The phrasing suggests that DeepSeek’s architecture may closely resemble OpenAI’s GPT-4 model.
  • Users reported that DeepSeek mirrored GPT-4’s reasoning structure, response patterns, and knowledge limitations, which would be highly unlikely unless it had been trained using OpenAI’s outputs.
  • The chatbot also referenced OpenAI-specific tools like DALL·E, implying that it had knowledge of OpenAI’s ecosystem, further supporting the theory that it was trained on OpenAI-generated responses.

However, shortly after these findings surfaced, DeepSeek changed its behavior. It stopped identifying as GPT-4 and began branding itself as an independent model, DeepSeek LLM. This abrupt shift raises major red flags about whether the company originally relied on OpenAI’s model and then attempted to cover its tracks.


2️⃣ DeepSeek’s Knowledge Cutoff Inconsistencies (October 2023 & July 2024)

One of the most glaring discrepancies in DeepSeek’s claims is the inconsistency in its knowledge cutoff dates.

  • Some users found that DeepSeek’s chatbot cited October 2023 as its last training data point, which aligns suspiciously with OpenAI’s GPT-4, whose public knowledge cutoff fell around the same time.
  • In later tests, DeepSeek stated that its knowledge extended up to July 2024—which contradicts its previous claims.

The rapid shifts in cutoff dates suggest one of two things:
  • Either DeepSeek initially relied on GPT-4 data and later added more training to obscure its origins, or
  • it was never transparent about its real data sources in the first place.

This raises the question: Did DeepSeek obtain access to OpenAI’s API responses and use them to train its own model? If so, this could be a major violation of OpenAI’s terms of service.


3️⃣ Microsoft’s Internal Investigation into OpenAI Data Leaks

Adding to the controversy, reports indicate that Microsoft, OpenAI’s largest investor, began investigating unusual API activity in late 2024.

  • It was found that an unknown entity, possibly linked to DeepSeek, extracted large amounts of data from OpenAI’s API over several months.
  • If DeepSeek obtained OpenAI’s GPT-4 outputs in bulk and fine-tuned them into its own model, this would explain the striking similarities in reasoning and knowledge structure.
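Output-based distillation of the kind described above is technically simple: query the "teacher" model, record its responses, and fine-tune a "student" model on the resulting prompt/completion pairs. The sketch below is purely illustrative (the teacher here is a toy stand-in function, not any real API, and the JSONL layout is just one common fine-tuning format); it is not a claim about DeepSeek's actual pipeline.

```python
import json

def collect_distillation_data(teacher, prompts):
    """Query a 'teacher' model and record (prompt, response) pairs --
    the core step of output-based distillation."""
    examples = []
    for prompt in prompts:
        response = teacher(prompt)  # in a real pipeline: an API call to the teacher model
        examples.append({"prompt": prompt, "completion": response})
    return examples

def to_jsonl(examples):
    """Serialize examples into the JSONL layout commonly used for fine-tuning."""
    return "\n".join(json.dumps(e) for e in examples)

# Toy stand-in for a teacher model (hypothetical; a real pipeline would call a hosted API).
def toy_teacher(prompt):
    return f"Answer to: {prompt}"

data = collect_distillation_data(toy_teacher, ["What is AI?", "Define LLM."])
print(to_jsonl(data))
```

Done at scale over months, this kind of harvesting would produce exactly the training corpus that could explain a student model mirroring the teacher's reasoning style and knowledge boundaries.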

If OpenAI or Microsoft confirms this unauthorized data usage, DeepSeek could face legal consequences for violating OpenAI’s policies and potentially infringing on its intellectual property.


4️⃣ Why This Matters: AI Ethics and Model Security

If DeepSeek copied OpenAI’s GPT-4 model through distillation or unauthorized API access, it raises several major concerns for the AI industry:

  • Data Privacy & Security – How easy is it for AI companies to copy leading models without detection?
  • Ethical AI Development – Should AI companies be allowed to train models on competitors’ outputs?
  • Regulatory Action – If OpenAI proves unauthorized usage, what legal action could be taken?

For now, DeepSeek continues to insist that its model is independently developed, but the abrupt changes in its self-identification, knowledge cutoffs, and technical capabilities suggest otherwise.


5️⃣ Open Questions: Has OpenAI Been Copied?

As OpenAI and Microsoft continue their investigation, the AI community is left with pressing questions:

Did DeepSeek originally train on GPT-4 data but later try to hide it?
Did OpenAI detect unauthorized API access, forcing DeepSeek to change its responses?
Should OpenAI publicly disclose whether DeepSeek is using its technology without consent?

For now, the truth remains unclear, but if OpenAI confirms unauthorized usage, this could set a major precedent for how AI companies protect their intellectual property in an increasingly competitive industry.


What’s Next?

This case could be a turning point in AI ethics, regulation, and transparency. If OpenAI or Microsoft exposes DeepSeek’s methods, the fallout may prompt stricter rules on AI training data and on how companies build their models.

The AI world is watching closely.