DeepSeek and Tsinghua University researchers introduce Self-Principled Critique Tuning, a method for training reward models that may guide DeepSeek-R2 toward stronger performance in general domains.
👀 DeepSeek Reveals New Training Method Ahead…