Why is DeepSeek so impressive?
Jan 30, 2025

Intro
Everyone is talking about DeepSeek, and we have all seen its incredible results, but what makes it so special?
Well, the key element is the training procedure: it shows that you can train an LLM roughly 20 times cheaper and still reach even better accuracy.
How is that possible?
Instead of using the classical approach built on a very large labeled dataset, it made use of three elements:
Chain of Thought
Reinforcement Learning
Distillation
What is Chain of Thought?
It is a simple prompt engineering technique where we ask the model to think out loud and explain its reasoning step by step. If it makes any reasoning mistakes, we can pinpoint them and let the model retry. Below, you can see an actual example from DeepSeek.

The figure from the original paper shows how Chain of Thought is applied to a math problem: the model has an “Aha moment” that helps it readjust and arrive at the correct answer.
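To make the idea concrete, here is a minimal sketch of a CoT-style prompt next to a direct one. The wording is an illustrative assumption, not DeepSeek’s actual prompt:

```python
# A direct prompt asks only for the answer; a Chain-of-Thought prompt asks
# the model to reason out loud first. The prompt text below is illustrative,
# not DeepSeek's actual prompt.

direct_prompt = "What is 17 * 24?"

cot_prompt = (
    "What is 17 * 24?\n"
    "Think out loud: explain your reasoning step by step, "
    "then give the final answer on its own line."
)

# With CoT, a reply might read: "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.
# Final answer: 408." If a step is wrong, we can point at it and ask for a retry.
```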

Reinforcement Learning
Using Reinforcement Learning in LLM training was another “Aha moment,” this time for the researchers. Instead of giving the model the questions and the answers, how about letting it discover the answers on its own?
A good example of reinforcement learning is my daughter learning to eat by herself. The objective was simple: fill her belly. With every iteration, she received a bite as a reward when she managed to get the food to her mouth. In the first attempts, the silicone fork ended up in her ear, then in her nose, and eventually she achieved the objective and improved bite by bite.
It’s the same with AI models. They receive an objective and small rewards or penalties when they hit or miss the target. In DeepSeek’s case, the objective is defined by the Group Relative Policy Optimization (GRPO) function.
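For reference, here is a sketch of that objective, roughly as stated in the DeepSeek papers: the model samples a group of G answers o_1, …, o_G to each question q, scores them with rewards r_i, and the advantage A_i of each answer is simply how far its reward sits from the group average.

```latex
% GRPO objective, roughly as stated in the DeepSeek papers.
% pi_theta: model being trained; pi_old: model that sampled the answers;
% pi_ref: frozen reference model; G: group size; eps, beta: hyperparameters.
\mathcal{J}_{\mathrm{GRPO}}(\theta) =
  \mathbb{E}\!\left[
    \frac{1}{G}\sum_{i=1}^{G}
      \min\!\left(
        \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)}\, A_i,\;
        \operatorname{clip}\!\left(
          \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)},
          1-\varepsilon,\, 1+\varepsilon
        \right) A_i
      \right)
  \right]
  - \beta\, \mathbb{D}_{\mathrm{KL}}\!\left(\pi_\theta \,\middle\|\, \pi_{\mathrm{ref}}\right),
\qquad
A_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}{\operatorname{std}(\{r_1,\dots,r_G\})}
```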

Explaining and understanding it fully goes beyond our brief read here, but I recommend a great video by Umar Jamil that covers it in detail. All you need to know for now is that, instead of receiving a large dataset and being told the correct answers, the DeepSeek team found a way to let the model figure out the answers on its own while measuring how accurate it is. As the paper’s benchmarks show, it did a pretty good job compared to the previous SOTA OpenAI model.
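Before moving on, here is the same idea in code: a minimal sketch of the group-relative advantage computation at the heart of GRPO. The reward values shown are illustrative:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Core idea of GRPO: sample a *group* of answers to the same question,
    score each one, and use the group's own mean/std as the baseline
    instead of training a separate value (critic) model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled answers to one math question, scored 1.0 if the final
# answer is correct and 0.0 otherwise (a simple rule-based reward).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # answers above the group average get positive advantage
```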

Distillation
Another impressive result from the DeepSeek R1 paper is model distillation. This is a well-known procedure in AI: training a smaller, specialized Student model based on a larger Teacher model.
The large DeepSeek model has 671 billion parameters. Running it requires at least 10 H100 GPUs, which means more than $400,000 in infrastructure. To make it more accessible, the team used the large LLM as a Teacher for a smaller 7-billion-parameter Student, which performs almost as well as the large model.
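A rough back-of-envelope calculation supports that figure, assuming FP8 weights (1 byte per parameter) and 80 GB of memory per H100; these assumptions are mine, not the paper’s:

```python
params = 671e9          # DeepSeek-R1 parameter count
bytes_per_param = 1     # assume FP8 weights; FP16 would double this
h100_memory = 80e9      # bytes of HBM per H100

weights_bytes = params * bytes_per_param
gpus_needed = weights_bytes / h100_memory
print(f"{weights_bytes / 1e9:.0f} GB of weights -> ~{gpus_needed:.1f} H100s just to hold them")
# 671 GB of weights -> ~8.4 H100s; with activation and KV-cache overhead,
# ~10 GPUs is a sensible floor, consistent with the figure above.
```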

How does it work?
The Student model uses Chain of Thought to answer questions, while the same questions are also sent to the Teacher model. After both models answer, their responses are compared, and the Student is rewarded or penalized based on how accurate it is.
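As a concrete illustration, here is the textbook soft-label distillation loss that this compare-and-penalize idea boils down to. This is a minimal PyTorch sketch of classic distillation (Hinton et al.), not DeepSeek’s exact recipe; the R1 paper distills by fine-tuning the Students on reasoning traces generated by the Teacher:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Classic soft-label distillation: penalize the Student in proportion
    to how far its token distribution is from the Teacher's."""
    # Softened distributions expose the Teacher's relative preferences.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes comparable.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Illustrative shapes: a batch of 2 positions over a 10-token vocabulary.
loss = distillation_loss(torch.randn(2, 10), torch.randn(2, 10))
```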

What is impressive is that, on many benchmarks, the distilled models outperform several state-of-the-art models in the industry. Imagine running a model with higher accuracy in math and coding than GPT-4 on your MacBook. Isn’t that amazing?
Now it might be a little clearer why DeepSeek triggered a stock market crash: it showed the industry that a research community can outperform tech giants and make AI models usable for everyone.
Just when we thought the AI hype was coming to an end, two amazing releases, OpenAI’s Operator and DeepSeek, came as a shock to the market. 2025 will be an interesting year!