 Hard disagree. AI can be trained on facts and still produce hallucinations. Archive.org and good alt-news sites like corbettreport.com and unlimitedhangput.com, which cite their sources in the transcript, are our best bet.
 You have mistaken LLMs for the only AI tool. That’s not what I’m talking about. I’m talking about standalone service provision: vectorizing the internet and massive amounts of data, then being able to contextually search it with tools *entirely independent* of Google.

This requires a handful of different models, and it was a nightmare a few years ago if you were trying to do it yourself (or impossible, really). Today, with open source models and agents to search and collect data, it is a vastly more viable challenge. That, combined with the likelihood of the open source community aggregating databases, will be really interesting I think, especially as the pressure builds to solve this growing problem.
 It’s just feasible now on commodity GPUs (mid-range gaming ones) to perform semantic analysis and retrieval of documents using a persistent vector database. 

Where this is gaining traction right now is businesses (and individuals) semantically indexing their own private document stores, so they can search and retrieve from them much as Internet search engines do for the public Internet on a far larger scale.
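
To make the idea concrete, here is a rough sketch of the index-then-search loop over a private document store. Everything in it is an assumption for illustration: `embed` is a toy hashed bag-of-words stand-in (a real setup would call an actual embedding model running on the GPU), a SQLite file plays the role of the persistent vector database, and the names `open_store`, `add_document`, `search` and `DIM` are made up. The search is a brute-force cosine comparison, which is fine for a small store but not how a real vector database scales.

```python
import hashlib
import json
import math
import re
import sqlite3

DIM = 256  # hypothetical vector size for the toy embedding


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized.

    This only matches on shared words; a real model would capture meaning.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the persistent store; use ':memory:' for a throwaway one."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id INTEGER PRIMARY KEY, text TEXT, vec TEXT)"
    )
    return conn


def add_document(conn: sqlite3.Connection, text: str) -> None:
    """Embed a document and save the text alongside its vector."""
    conn.execute(
        "INSERT INTO docs (text, vec) VALUES (?, ?)",
        (text, json.dumps(embed(text))),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, k: int = 3):
    """Return the k stored documents most similar to the query as (score, text)."""
    q = embed(query)
    scored = []
    for text, vec_json in conn.execute("SELECT text, vec FROM docs"):
        vec = json.loads(vec_json)
        # Vectors are unit length, so the dot product is the cosine similarity.
        score = sum(a * b for a, b in zip(q, vec))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Usage would look something like `conn = open_store("docs.db")`, a few `add_document(conn, ...)` calls, then `search(conn, "gpu invoice")`. Because the vectors live in the SQLite file, the index survives restarts and only new documents need embedding.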

A decentralized version of this is conceivable, but it runs into all the usual issues of trust, authentication, data integrity, etc. that arise in multi-agent systems with bad actors.