What does this have to do with LLMs? The fundamental question is: "What is your training set; how do you verify it; where do you expect it to be predictive?" 3/
LLMs fudge all of this, and so we get a "grey goo" problem: LLMs are poisoning their own input set, and IMO that is very likely to leave a spectral line in the data. I think "pre-LLM" and "post-LLM" text will be very easy to tell apart. And for anything with actual $$$ attached: never connect it to a public model. /end
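A toy sketch of the self-poisoning loop described above (my own illustration, not from the thread): a "model" that fits a normal distribution to its data and then trains the next generation only on its own samples. The rejection step that drops draws beyond two sigma is an assumed stand-in for generative models underrepresenting rare data; under that assumption, the variance collapses generation over generation.

```python
import random
import statistics

random.seed(0)

def sample_model(mu, sigma, n):
    """Sample from a 'model' that underrepresents the tails:
    draws outside +/- 2 sigma are rejected (a stand-in for a
    generative model smoothing away rare data)."""
    out = []
    while len(out) < n:
        x = random.gauss(mu, sigma)
        if abs(x - mu) <= 2 * sigma:
            out.append(x)
    return out

# Generation 0: pristine "pre-LLM" data.
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

variances = []
for gen in range(20):
    # "Train" the model: fit mean and standard deviation.
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    variances.append(sigma ** 2)
    # Next generation is trained only on this model's own output.
    data = sample_model(mu, sigma, 2000)

print(f"gen  0 variance: {variances[0]:.3f}")
print(f"gen 19 variance: {variances[-1]:.3f}")
```

Each generation keeps only what the previous model could reproduce, so the distribution narrows toward "goo": the collapse itself is the detectable discontinuity between pre- and post-loop data.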