Oddbean

What I've done in the past is write functions/tools [1] that interface with the data, plus a system prompt that guides the LLM in using them. This is an agent setup, where each user prompt leads to multiple back-and-forth requests to the LLM. In my case I had Parquet files on my network and wrote functions that used DuckDB to query them. Then I wrote documentation for the functions, passed it along to the LLM, and let it figure out what args to pass. You can also add high-level instructions and advice.

LLMs are also decent at using Pandas, NumPy, and Matplotlib, so consider giving access to those tools as well.

I've found it can be useful to create agents that specialize in different tasks too. For example, one agent that formulates the queries and fetches the data (possibly storing it somewhere and returning a reference to that location) and another that specializes in modeling.

[1] https://ollama.com/blog/tool-support
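To make the tool setup described above a bit more concrete, here is a minimal sketch of the pattern, not the actual code from my setup. Everything named here is hypothetical: the `sales` table, the `total_by_region` function, and the `{"name": ..., "arguments": ...}` JSON shape for a tool call are all assumptions for illustration. It also swaps in an in-memory SQLite table for DuckDB over Parquet so that it runs with only the standard library.

```python
import inspect
import json
import sqlite3

# Stand-in data store (assumption): the setup described above used DuckDB
# over Parquet files; SQLite is used here only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)


def total_by_region(region: str) -> float:
    """Return the summed sales amount for one region."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]


# Registry of functions the model is allowed to call (hypothetical names).
TOOLS = {"total_by_region": total_by_region}


def describe_tools() -> str:
    """Build the function documentation that goes into the system prompt."""
    lines = []
    for name, fn in TOOLS.items():
        lines.append(f"{name}{inspect.signature(fn)}: {inspect.getdoc(fn)}")
    return "\n".join(lines)


def run_tool_call(call_json: str):
    """Execute one tool call that the model emitted as JSON (assumed shape)."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])


system_prompt = "You can call these functions:\n" + describe_tools()

# Simulate the model asking for a tool call; no LLM is contacted here.
result = run_tool_call(
    '{"name": "total_by_region", "arguments": {"region": "north"}}'
)
```

In a real agent loop, `result` would be sent back to the LLM as a tool message, and the model would either request another call or answer the user. Keeping the tools in a plain dict means the model can only reach functions you have explicitly registered and documented.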