"we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model". Holy fuck. nostr:nevent1qqsya5ldfzugypnewnhly3akjm7e4ju8pl807w92tz9jecrkxjfwvaspzpmhxue69uhkummnw3ezuamfdejsygyzxs0cs2mw40xjhfl3a7g24ktpeur54u2mnm6y5z0e6250h7lx5gpsgqqqqqqs6ee2rg