Ummm, this open-source Large Multimodal Model is pretty impressive. Follow the link and click Demo to try it. https://llava-vl.github.io/ https://files.mastodon.social/media_attachments/files/111/190/513/211/062/208/original/eecfa58a009bd8d6.png
@1dce721a I've sent it some weird stuff, and it's able to explain what makes a juxtaposition in an image humorous. Crazy!