@d46cf433 the problem is the LLM dataset. The input data must be nearly 100% accurate or the AI will learn incorrectly. There are millions of images out there, a huge dataset, but almost all have small mistakes.

Also, how do we input the known good results? Radiology reports, like all dictated reports, have transcription errors that humans can parse but that would likely break the model.

This means every image used for training must be manually reviewed and its report typo-corrected.