 📝 Interpretable Visual Question Answering via Reasoning Supervision 🔭

"Based on a transformer-based architecture that leverages reasoning supervision as a supervisory signal to guide the visual attention to important elements of the scene, without requiring explicit grounding annotations." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.03726v1 #arxiv

https://creative.ai/system/media_attachments/files/111/033/289/534/828/923/original/8e402443917f84ba.jpg

https://creative.ai/system/media_attachments/files/111/033/289/596/815/937/original/c6bcfea6e262b609.jpg

https://creative.ai/system/media_attachments/files/111/033/289/653/372/205/original/dbe43fba37e1fa0f.jpg