📝 Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization 🔭🧠 "MRAV-FF fuses audio-visual features across different temporal resolutions using a hierarchical gated cross-attention mechanism that weighs the importance of audio information at diverse temporal scales." [gal30b+] 🤖 #CV #LG #MM 🔗 https://arxiv.org/abs/2310.03456v1 #arxiv https://creative.ai/system/media_attachments/files/111/191/177/184/045/779/original/605abc8ce63e34db.jpg https://creative.ai/system/media_attachments/files/111/191/177/248/872/695/original/18dffdce61fc1611.jpg