Chain-of-thought (CoT) reasoning in vision-language
models (VLMs) is crucial for improving
interpretability and trustworthiness. However,
current training recipes often rely on
datasets dominated by short annotations with
minimal rationales. In this work, we show that
training VLMs on short answers leads to poor
generalization on reasoning tasks that require
more detailed explanations. To address this limitation,
we propose a two-stage post-training
strategy that extends the use of short-answer
data for enhanced CoT reasoning. First, we
augment short answers with CoT reasoning
generated by GPT-4o, enhancing the VLM's
CoT capabilities through fine-tuning.
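As a rough illustration of this first stage, the sketch below expands a short ground-truth answer into a step-by-step rationale with GPT-4o; the prompt wording, the `augment_with_cot` helper, and the use of the OpenAI Python client are our own assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (an assumption, not the paper's exact pipeline): ask GPT-4o to
# write a step-by-step rationale that ends in the known short answer, yielding
# (image, question, rationale) triples for CoT fine-tuning.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def augment_with_cot(image_url: str, question: str, short_answer: str) -> str:
    """Generate a CoT rationale consistent with the given short answer (hypothetical helper)."""
    prompt = (
        f"Question: {question}\n"
        f"The correct short answer is: {short_answer}\n"
        "Explain step by step how to reach this answer from the image, "
        "then state the final answer."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```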
Second, we leverage short answers as outcome rewards
for reinforcement learning. Specifically, short
answers are used as correctness indicators to
construct positive (correct) and negative (incorrect)
pairs from model-generated reasoning
chains. These pairs are then used to calibrate
the model’s reasoning via Direct Preference Optimization.
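To make the outcome-reward idea concrete, here is a small sketch of how such preference pairs could be assembled: reasoning chains sampled from the model are marked correct or incorrect by comparing their final answers against the ground-truth short answer, and correct/incorrect chains for the same question are paired in a chosen/rejected layout accepted by common DPO implementations. The `extract_final_answer` helper and the all-pairs policy are illustrative assumptions, not the paper's exact procedure.

```python
import re
from itertools import product


def extract_final_answer(chain: str) -> str:
    """Pull the final short answer out of a generated reasoning chain (hypothetical format)."""
    match = re.search(r"final answer:\s*(.+)", chain, flags=re.IGNORECASE)
    return match.group(1).strip().lower() if match else chain.strip().lower()


def build_dpo_pairs(prompt: str, sampled_chains: list[str], short_answer: str) -> list[dict]:
    """Use the short answer as an outcome reward: chains ending in the correct
    answer become 'chosen', the rest become 'rejected'."""
    correct = [c for c in sampled_chains if extract_final_answer(c) == short_answer.lower()]
    incorrect = [c for c in sampled_chains if extract_final_answer(c) != short_answer.lower()]
    return [
        {"prompt": prompt, "chosen": pos, "rejected": neg}
        for pos, neg in product(correct, incorrect)
    ]
```

The resulting prompt/chosen/rejected records can be fed to a standard DPO trainer; the paper may filter or subsample pairs differently.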
Our experiments show significant
improvements in CoT reasoning on benchmark
datasets, along with enhanced generalization to
direct-answer prediction. This work provides
a critical data resource for VLM CoT training
and demonstrates the effectiveness of outcome
rewards for post-training multimodal models.