DeepSeek and Tsinghua University researchers introduce Self-Principled Critique Tuning, a method for training reward models that may guide DeepSeek-R2 toward stronger performance in general domains.