📝 Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation 🔭
"Proposes a spatial-temporal knowledge-embedded transformer (STKET) that incorporates prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations for video scene graph generation (VidSGG)." [gal30b+] 🤖 #CV
⚙️ https://github.com/HCPLab-SYSU/STKET
🔗 https://arxiv.org/abs/2309.13237v1 #arxiv
https://creative.ai/system/media_attachments/files/111/134/440/445/725/010/original/179c0f206035b7d2.jpg
https://creative.ai/system/media_attachments/files/111/134/440/531/529/514/original/4a26c21ea37e4208.jpg
https://creative.ai/system/media_attachments/files/111/134/440/608/648/723/original/2f4f44061ec34a19.jpg
https://creative.ai/system/media_attachments/files/111/134/440/663/199/031/original/99037a793c69633a.jpg