Oddbean

 Hugging Face always has good blog posts too: https://huggingface.co/blog/embedding-quantization#retrieval-speed

 Looks simple enough. I imagine you could even go further with a sparse encoding scheme, assuming there are huge gaps of 0 bits, which is probably the case for high-dimensional embeddings. https://i.nostr.build/gQtAoY8gxAvtQtNv.jpg
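
 Rough sketch of what I mean, in plain Python. It assumes binary quantization is just "1 if the value is above 0, else 0" and that codes are compared with Hamming distance. The function names and the toy 8-dimensional vectors are made up for illustration, not taken from any library. The sparse form only stores the positions of the 1 bits, so it only saves space when few bits are set.

```python
def binarize(embedding):
    """Pack the sign of each float into one bit of an int (1 if value > 0)."""
    bits = 0
    for i, value in enumerate(embedding):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Count the bit positions where two packed codes differ."""
    return bin(a ^ b).count("1")


def to_sparse(bits):
    """Sparse form: ascending positions of the 1 bits."""
    positions = []
    i = 0
    while bits:
        if bits & 1:
            positions.append(i)
        bits >>= 1
        i += 1
    return positions


def from_sparse(positions):
    """Rebuild the packed code from the positions of its 1 bits."""
    bits = 0
    for i in positions:
        bits |= 1 << i
    return bits


# Toy vectors (hypothetical values, 8 dimensions instead of 1024).
query = [0.12, -0.40, 0.03, -0.91, 0.55, -0.07, -0.30, 0.20]
doc = [0.30, -0.10, -0.22, -0.75, 0.41, 0.09, -0.18, 0.02]

q_bits = binarize(query)
d_bits = binarize(doc)

print(hamming(q_bits, d_bits))  # number of differing bits
print(to_sparse(q_bits))        # positions of the set bits in the query code
```

 If roughly half the bits end up set, the position list is bigger than the raw bitstring, so the sparse trick would need embeddings that really are mostly zeros after quantization.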

 curious if it’s being used anywhere yet

 I hear my laptop's fans start whirring when it's generating a response, so I wouldn't be surprised if it's doing something locally first: either the encoding process (words to tokens) or the retrieval (finding relevant documents from a project).

 retrieval maybe? 

btw have you seen https://www.mixedbread.ai/blog/mxbai-embed-large-v1

 wait nvm, encoding should be first, no? Converting words to tokens is usually needed before retrieval, unless the retrieval uses pre-computed embeddings, in which case maybe it skips straight to that? Idk
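
 Here's how I picture the ordering, as a toy Python sketch. Everything in it is hypothetical: `embed` is a word-count stand-in for a real embedding model, and `VOCAB` and `DOCS` are invented. The point is only that document vectors can be computed ahead of time, while the query still has to be encoded when the request comes in.

```python
import math

# Hypothetical tiny vocabulary; a real model would not work like this.
VOCAB = ["embedding", "quantization", "retrieval", "tokens", "laptop", "fans"]


def tokenize(text):
    """Words to tokens: here just lowercase whitespace splitting."""
    return text.lower().split()


def embed(text):
    """Toy stand-in for an embedding model: count of each vocabulary word."""
    tokens = tokenize(text)
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


DOCS = [
    "embedding quantization speeds up retrieval",
    "laptop fans spin up under load",
]
# Pre-computed once, ahead of time, so documents are not re-encoded per query.
DOC_VECTORS = [embed(doc) for doc in DOCS]


def retrieve(query):
    """Encode the query at request time, then compare against stored vectors."""
    query_vector = embed(query)
    scores = [cosine(query_vector, vector) for vector in DOC_VECTORS]
    best = max(range(len(DOCS)), key=scores.__getitem__)
    return DOCS[best]


print(retrieve("why are my laptop fans loud"))
```

 So even with pre-computed document embeddings, the encoding step doesn't disappear; it just shrinks to the query.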