📝 One for All: Video Conversation Is Feasible Without Video Instruction Tuning 🔭
"Introduces a novel method, Branching Temporal Adapter (BT-Adapter), for extending image-language pretrained models into the video domain, which serves as a plug-and-use temporal modeling branch alongside the CLIP backbone." [gal30b+] 🤖 #CV
🔗 https://arxiv.org/abs/2309.15785v1 #arxiv
https://creative.ai/system/media_attachments/files/111/151/133/152/651/466/original/4f057a3255966bdf.jpg
https://creative.ai/system/media_attachments/files/111/151/133/211/907/634/original/4189b0f8c6a5a4e1.jpg
https://creative.ai/system/media_attachments/files/111/151/133/271/062/709/original/09b7c05ed6a10796.jpg
https://creative.ai/system/media_attachments/files/111/151/133/327/289/654/original/747b7bd075c79261.jpg
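
💡 A rough sketch of the "branch alongside a frozen backbone" idea from the summary above. This is not the paper's implementation: `FrozenImageEncoder` is a hypothetical stand-in for CLIP, and `TemporalBranch`, `encode_video`, the single attention layer, the mean pooling, and the zero-initialized output projection are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class FrozenImageEncoder:
    """Hypothetical stand-in for a pretrained image backbone such as CLIP.

    Its weights are never updated; it encodes each frame independently.
    """

    def __init__(self, in_dim, dim, rng):
        self.w = rng.standard_normal((in_dim, dim)) / np.sqrt(in_dim)

    def __call__(self, frames):
        # frames: (T, in_dim) -> per-frame features: (T, dim)
        return frames @ self.w


class TemporalBranch:
    """Assumed side branch: one self-attention layer across the time axis."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        self.wq = rng.standard_normal((dim, dim)) * scale
        self.wk = rng.standard_normal((dim, dim)) * scale
        self.wv = rng.standard_normal((dim, dim)) * scale
        # Assumption: zero-initialized output projection, so the branch
        # contributes nothing until it has been trained.
        self.wo = np.zeros((dim, dim))

    def __call__(self, frame_feats):
        # frame_feats: (T, dim); every frame attends to every other frame.
        q = frame_feats @ self.wq
        k = frame_feats @ self.wk
        v = frame_feats @ self.wv
        attn = softmax(q @ k.T / np.sqrt(frame_feats.shape[-1]), axis=-1)
        return (attn @ v) @ self.wo


def encode_video(frames, backbone, branch):
    # The backbone path is left untouched; the branch output is added on top.
    frame_feats = backbone(frames)            # (T, dim)
    temporal_feats = branch(frame_feats)      # (T, dim)
    return frame_feats.mean(axis=0) + temporal_feats.mean(axis=0)  # (dim,)


rng = np.random.default_rng(0)
backbone = FrozenImageEncoder(in_dim=12, dim=8, rng=rng)
branch = TemporalBranch(dim=8, rng=rng)
video = rng.standard_normal((5, 12))          # 5 toy "frames"
video_feat = encode_video(video, backbone, branch)
```

With the output projection at zero, `encode_video` returns exactly the mean-pooled backbone features, which is one way to read "plug-and-use": the branch can be attached without disturbing the pretrained model and only changes the output once its weights are trained.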