Albert David Bangura Graduate Teaching and Research Assistant @ Bahcesehir Cyprus University
Nicosia, Cyprus
In Technology • 1 min read

Why Is Data the New Gold?

Recently, the Chinese AI company DeepSeek released a groundbreaking large language model called DeepSeek-R1. According to the accompanying paper, this model improves on DeepSeek-R1-Zero, which was trained purely through reinforcement learning, without supervised fine-tuning as a preliminary step.

Reinforcement learning is a type of machine learning in which a model learns by interacting with an environment and receiving reward signals, rather than by studying labeled examples. However, the DeepSeek researchers noted that DeepSeek-R1-Zero struggled with issues such as poor readability and language mixing because it was trained with reinforcement learning alone.
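To make the idea of learning from reward concrete, here is a minimal sketch of an epsilon-greedy agent on a toy three-armed bandit. It is a generic textbook illustration and not DeepSeek's training method. The function name `run_bandit`, the reward means, and the noise level are all assumptions chosen for the example.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (hypothetical setup)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # the agent starts with no knowledge of the arms
    counts = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random arm
        else:
            # exploit: pick the arm with the highest current estimate
            action = max(range(n), key=lambda a: estimates[a])
        # The environment returns a noisy reward; no labeled data is involved.
        reward = rng.gauss(true_means[action], 0.1)
        counts[action] += 1
        # Incremental running average of the rewards seen for this arm.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.9])
best = max(range(len(counts)), key=lambda a: counts[a])
print("pulls per arm:", counts, "most-pulled arm:", best)
```

The agent is never told which arm is best. It works that out only from the rewards it receives, which is the core idea behind reinforcement learning.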

To address this, they developed a new training pipeline that produced DeepSeek-R1, with much improved performance. The pipeline begins by collecting thousands of cold-start examples, which are used to fine-tune the base model before reinforcement learning starts.
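As a loose analogy for a cold start, the sketch below gives a toy bandit agent a handful of curated examples before reinforcement learning begins, and compares it with an agent that starts from nothing. It is only an illustration under stated assumptions: the real pipeline fine-tunes a language model, and the names `train`, `MEANS`, and `DEMOS` and all numbers here are hypothetical.

```python
import random

MEANS = [0.2, 0.5, 0.9]                  # hypothetical true reward per action
DEMOS = [(0, 0.2), (1, 0.5), (2, 0.9)]   # hypothetical curated (action, reward) pairs

def train(true_means, steps, epsilon=0.1, seed=0, cold_start=None):
    """Return the average reward of an epsilon-greedy agent over `steps` steps."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n
    counts = [0] * n
    # Stage 1 (optional): initialise estimates from a few high-quality examples.
    for action, reward in (cold_start or []):
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    # Stage 2: reinforcement learning from interaction.
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 0.1)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return total / steps

scratch = train(MEANS, steps=50)                  # pure RL from scratch
warm = train(MEANS, steps=50, cold_start=DEMOS)   # small cold-start set, then RL
print(f"from scratch: {scratch:.3f}  with cold start: {warm:.3f}")
```

In this toy setting, three examples are enough to spare the agent its early trial and error. This result says nothing quantitative about DeepSeek-R1 itself.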

Their aim was to explore how a small amount of high-quality cold-start data affects the model's reasoning performance. The researchers reported that the cold-start data contributed to DeepSeek-R1's performance gains, which supports the claim that “data is the new gold.”


