"we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model".
Holy fuck.
nostr:nevent1qqsya5ldfzugypnewnhly3akjm7e4ju8pl807w92tz9jecrkxjfwvaspzpmhxue69uhkummnw3ezuamfdejsygyzxs0cs2mw40xjhfl3a7g24ktpeur54u2mnm6y5z0e6250h7lx5gpsgqqqqqqs6ee2rg
I understand some of these words.
Every frame of the video is produced at the same time, in one pass through the network, not one after the other.
Excellent temporal consistency and coherence, but there's no code or model published.
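Since nothing has been released, here's a rough toy sketch of the core idea in PyTorch (layer sizes and names are my own guesses, not Lumiere's actual network): feed the whole clip as one 5-D tensor and let 3-D convolutions down- and up-sample across time as well as space, so every frame comes out of a single forward pass instead of being generated autoregressively.

```python
# Toy space-time U-Net: processes the WHOLE clip (B, C, T, H, W) at once,
# downsampling and upsampling in both time and space. Illustrative only,
# not the paper's architecture.
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """3D conv block that mixes information across frames and pixels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),  # kernel spans (T, H, W)
            nn.GroupNorm(8, out_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        return self.net(x)

class ToySpaceTimeUNet(nn.Module):
    def __init__(self, channels=3, base=64):
        super().__init__()
        self.enc = SpaceTimeBlock(channels, base)
        # stride-2 conv halves the temporal AND spatial resolution
        self.down = nn.Conv3d(base, base * 2, kernel_size=3, stride=2, padding=1)
        self.mid = SpaceTimeBlock(base * 2, base * 2)
        # transposed conv restores the original (T, H, W)
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=4, stride=2, padding=1)
        self.dec = SpaceTimeBlock(base * 2, base)
        self.out = nn.Conv3d(base, channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, frames, height, width) -- the full clip in one go
        e = self.enc(x)
        m = self.mid(self.down(e))
        d = self.dec(torch.cat([self.up(m), e], dim=1))  # skip connection
        return self.out(d)

if __name__ == "__main__":
    clip = torch.randn(1, 3, 16, 64, 64)  # a noisy 16-frame clip
    model = ToySpaceTimeUNet()
    print(model(clip).shape)  # torch.Size([1, 3, 16, 64, 64]) -- all frames denoised together
```

The point of the sketch: because the network sees (and compresses) the time axis the same way it sees the spatial axes, consistency across frames is baked into a single forward pass rather than stitched together afterwards.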