 📝 Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts 🔭

"Introduces a new parameter-efficient approach for vision-language tasks called VITIS that combines multimodal prompt learning and a transformer-based mapping network, while keeping the pretrained models frozen." [gal30b+] 🤖 #CV

🔗 https://arxiv.org/abs/2309.15915v1 #arxiv

https://creative.ai/system/media_attachments/files/111/152/940/113/001/536/original/98e0f07f1add44a8.jpg

https://creative.ai/system/media_attachments/files/111/152/940/168/578/302/original/915bf1afcc42be59.jpg