nostr:npub163k0gvm3x8s7qqjnukf6a2jq4nus3s74j7y64pctcgrwy3nmsllswwhljx the problem is the LLM dataset. The input data must be nearly 100% accurate, or the AI will learn incorrectly. There are millions of images out there, a huge dataset, but almost all of them come with small mistakes.

Also, how do we input the known-good results? Radiology reports, like all dictated reports, contain transcription errors that humans can read past but that would likely break the model.

This means every image used for training, along with its report, must be manually reviewed and corrected for typos.