RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement Learning
LLMs have gained outstanding reasoning capabilities through reinforcement learning (RL) on correctness rewards. Modern RL algorithms ...