📝 Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts 🔭
"Introduces VITIS, a new parameter-efficient approach for vision-language tasks that combines multimodal prompt learning with a transformer-based mapping network while keeping the pretrained models frozen." [gal30b+] 🤖 #CV
🔗 https://arxiv.org/abs/2309.15915v1 #arxiv
https://creative.ai/system/media_attachments/files/111/152/940/113/001/536/original/98e0f07f1add44a8.jpg
https://creative.ai/system/media_attachments/files/111/152/940/168/578/302/original/915bf1afcc42be59.jpg