 If you’re thinking about running an LLM at home for the first time, here are my top 4 tips:

1. Try GPT4ALL and/or ollama. These are launchers that help you download and interact with models. GPT4ALL is a GUI, while ollama is a command-line program. (A minimal Python example follows this list.)

2. Current models come in roughly two sizes: 7B and 70B parameters. As 4-bit quantized files these are ~4GB and ~40GB respectively, but they can be even bigger at higher precision. If your GPU is supported and has sufficient VRAM to hold the model, it can run on the GPU; if not, it’ll run on the CPU, but more slowly. Try a 4GB model to start. (The sizing arithmetic is sketched after the list.)

3. Although there are relatively few popular architectures (llama, mistral, etc.), there are lots of model variants to choose from. Hugging Face (terrible name) is the site to browse for models. (A download sketch follows the list as well.)

4. “Alignment” is the new word for bias (particularly the philosophical/political kind). A model tweaked to be maximally compliant and unbiased is called “unaligned”. The big mainstream models are “aligned” with the companies that produced them. Find unaligned models if you want less biased results. (I’ve been happy with the Dolphin line of models by Cognitive Computations).
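
For tip 1, here’s a minimal sketch of the ollama route from Python, using the official ollama package (pip install ollama). It assumes the ollama server is already running and that you’ve pulled a model; “mistral” is just an example name.

# Sketch: chat with a locally running ollama server.
# Assumes `ollama pull mistral` has been done; swap in your own model.
import ollama

response = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])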
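
The size figures in tip 2 are just arithmetic: file size is roughly parameter count times bytes per parameter after quantization, plus some overhead. A rough sketch:

# Back-of-the-envelope model file size: params x bits / 8, in GiB.
# Real files run a bit larger (embeddings, metadata, etc.).
def approx_size_gb(params_billions, bits_per_param=4):
    return params_billions * 1e9 * bits_per_param / 8 / 1024**3

print(f"7B at 4-bit:  ~{approx_size_gb(7):.1f} GB")   # ~3.3 -> ships as ~4GB
print(f"70B at 4-bit: ~{approx_size_gb(70):.1f} GB")  # ~32.6 -> ships as ~40GB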
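
And for tip 3, if you’d rather fetch a model file from Hugging Face yourself, the huggingface_hub package can grab a single quantized GGUF file (the format these launchers load). The repo and filename below are just examples; browse the site for others.

# Sketch: download one quantized model file from Hugging Face.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",   # example repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",    # example ~4GB file
)
print(path)  # local path to the downloaded model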

Good luck! 
 Been using mistral casually. Impressive indeed; yet I'm missing the 'learning' part of the tech, or at least as I imagined it: the model learning from everyday usage, as in assistant mode... 🤔
 I think for that you have to do training. If you’re not specifically running a training operation, then the results are limited by the model’s context window.

The context window is the number of tokens the model can keep in mind and work on at a time. Current-generation models have context windows of around 10k tokens, meaning that anything you or it said further back than that is lost.

Note that some words are single tokens while others require multiple tokens, and punctuation and whitespace take up tokens as well.
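
If you want to see this concretely, the tiktoken package (pip install tiktoken) counts tokens for OpenAI-style tokenizers; local models use their own tokenizers, so exact counts differ, but the behavior is the same.

# Sketch: how text breaks into tokens, and why the context window is a budget.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["hello", "antidisestablishmentarianism", "Hello, world!"]:
    print(f"{text!r} -> {len(enc.encode(text))} tokens")

# Every message in the conversation draws from the same ~10k-token budget:
history = "user: hi\nassistant: hello! how can I help?\n"
print(len(enc.encode(history)), "tokens used so far")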
 wanna try my model? https://huggingface.co/some1nostr/Ostrich-70B 
 Haha maybe! How did you make it? 
 Filtering the notes and producing the txt files from ancient book PDFs etc. is most of the work.
The training itself uses llama-factory; it's pretty easy.
 
 How do I make it use the GPU? Do you need Open WebUI to do that? 
 If your graphics card is supported and has sufficient VRAM to fit the model, I believe GPT4ALL and ollama will detect that and use it automatically.

My gaming laptop has an NVIDIA RTX 2060 with 6GB of VRAM. GPT4ALL uses it automatically when the model is small enough to fit, and otherwise runs on the CPU, in my experience.
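
If you want to check what the launchers will see, here's a quick sanity-check sketch with PyTorch (pip install torch); the launchers do their own detection, so this is purely informational.

# Sketch: report whether a CUDA GPU is visible and how much VRAM it has.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA GPU detected; models will run on the CPU.")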
 t-y jimbocoin 
 Thank you. This is helpful. I started using free gpt2 on Start9 yesterday, but I'm a noob.