I check every URL for OpenGraph info for preview cards, so I end up hitting the server even for images and videos. When that happens, you also get the file's Content-Type in the HTTP response headers, and I use that to decide how to render the media. I still check extensions first, but that has proven to be problematic: some image servers serve different formats, other URLs have query params that interfere with the detection, and many don't have extensions at all.
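Roughly, the order is something like the sketch below. It's Python just for illustration, not code from any client: `classify_media` and the extension sets are names made up here, and it assumes you already have the Content-Type value from the request you made anyway.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension lists, for illustration only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
VIDEO_EXTS = {".mp4", ".webm", ".mov"}


def classify_media(url, content_type=None):
    """Return 'image', 'video', or 'link' for a URL."""
    # Look only at the path, so query params don't hide the extension.
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"

    # No usable extension: fall back to the Content-Type header, if any.
    if content_type:
        # Drop parameters such as "; charset=utf-8".
        media_type = content_type.split(";")[0].strip().lower()
        if media_type.startswith("image/"):
            return "image"
        if media_type.startswith("video/"):
            return "video"

    # Anything else is treated as a regular link (preview card).
    return "link"
```

So `classify_media("https://example.com/media/12345", "video/mp4")` comes out as a video even though the URL has no extension.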
Cool. I wasn't sure if there was another trick besides OpenGraph. That's what we use in Nos, but it’s slow and causes the cells to change size, which makes scrolling less comfortable.
The only way to truly solve this is to either have a server that can provide image sizes or download everything before showing it in the feed. There are hacks like the imeta tag or NIP-54, but those depend entirely on the writer's client including them, which is not ideal for the receiving client. https://github.com/nostr-protocol/nips/pull/521
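When the writer's client does include an imeta tag, the receiving side can reserve the cell size before downloading anything. A minimal sketch in Python, assuming the tag is a list of space-separated "key value" entries with a `dim` field like `3024x4032` (that is my reading of the imeta layout; `imeta_dimensions` is a made-up helper name):

```python
def imeta_dimensions(tags, url):
    """Return (width, height) declared for `url` in an imeta tag, or None."""
    for tag in tags:
        if not tag or tag[0] != "imeta":
            continue
        # Each remaining entry is assumed to be "key value";
        # split on the first space only, since values may contain spaces.
        fields = dict(entry.split(" ", 1) for entry in tag[1:] if " " in entry)
        if fields.get("url") != url:
            continue
        width, sep, height = fields.get("dim", "").partition("x")
        if sep and width.isdigit() and height.isdigit():
            return int(width), int(height)
    # No imeta tag for this URL, or no usable dim: the size is unknown.
    return None
```

If this returns `None`, you are back to the server lookup or downloading the file first.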