This is very, very vulnerable to "black boxing" and abuse. Without naming names, we already have one popular client that replaces a keyword its dev doesn't like with the poop emoji in every note, without asking. Using algorithms / LLMs to categorise without user approval will open far more opportunities for shenanigans. Imagine every mention of Trump being tagged "rape" and "sh1tcoin" by a client algorithm, with the bias probably baked into an LLM via tagged training data.