AI superposition, polysemanticity, and mechanistic interpretability are fascinating. We have a chance of seeing what artificial neural networks are actually "thinking" by using autoencoders to extract monosemantic features from polysemantic neurons. Using these techniques we might be able to detect if AIs are being deceptive by peering into their brains, which will be useful if they try to enslave and/or kill us. These terms probably make no sense if you've never heard of them, I definitely hadn't, but Chris Olah explains it well. Highly recommend the Lex Fridman podcast with him and other Anthropic employees, if you have a spare... 5 hours. https://podcasts.apple.com/ca/podcast/lex-fridman-podcast/id1434243584?i=1000676542285
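For the curious, the "autoencoder" part looks roughly like this. A minimal sparse autoencoder sketch in PyTorch (sizes and names are my own placeholders, not Anthropic's actual setup): it learns to reconstruct a model's internal activations through a wider, sparsity-penalized layer, and the resulting features tend to be more monosemantic than the raw neurons.

```python
# Minimal sparse autoencoder sketch (illustrative only, not Anthropic's code).
# Idea: reconstruct a model's activations through an overcomplete, sparse
# bottleneck so each learned feature tends to mean just one thing.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, d_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activations -> features
        self.decoder = nn.Linear(d_features, d_model)  # features -> activations

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))      # sparse feature activations
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                                        # sparsity penalty strength

acts = torch.randn(64, 512)                            # stand-in for real MLP activations
opt.zero_grad()
recon, features = sae(acts)
loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
loss.backward()
opt.step()
```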
I have no clue what you're talking about but it sounds very interesting .. seems like I need a day of vacation 😅
I have only heard of superposition in the context of quantum mechanics. "Spooky action at a distance" - Albert Einstein
Plus one here
Same. In this case, it’s the idea of neural networks taking advantage of the sparsity of the embeddings to encode more features than the vector space has dimensions (i.e. more than one full set of orthogonal vectors). I probably can’t do it justice in a nostr note after a few whiskeys.
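A toy numpy sketch of what I mean, for whatever it's worth (my own illustration, not from the podcast): you can cram far more nearly-orthogonal feature directions into a space than it has dimensions, and as long as only a few features are active at once you can mostly still tell them apart.

```python
# Toy illustration of superposition: many nearly-orthogonal "features"
# packed into a much smaller number of dimensions.
import numpy as np

rng = np.random.default_rng(0)
d, n_features = 256, 2000                     # 2000 features in a 256-dim space

# Random unit vectors are nearly orthogonal in high dimensions.
W = rng.normal(size=(n_features, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

overlaps = W @ W.T - np.eye(n_features)
print("max |overlap| between distinct features:", np.abs(overlaps).max())

# Encode a sparse input (only a few features active) into d dims, then read back.
x = np.zeros(n_features)
x[rng.choice(n_features, size=5, replace=False)] = 1.0
hidden = x @ W                                # compress: 2000 features -> 256 dims
readout = hidden @ W.T                        # crude linear readback
print("top readback indices:", sorted(np.argsort(readout)[-5:]))  # should mostly
print("true active indices: ", np.flatnonzero(x))                 # match these
```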
Any comment on this will make me sound stupid, so I'll refrain 😅
Monosemantic rendering = polysemantic original?
I imagined it like a Fourier transform: distilling individual features from a combined signal shared between multiple neurons. The reason you need to do this is the superposition hypothesis: that the neurons are encoding more features than there are orthogonal vectors to go around.
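The analogy translates almost literally into a few lines of numpy (just a toy of my own, not from the episode): mix a few sinusoids into one signal and let the transform pull them back apart, the way a feature dictionary pulls individual features out of shared neuron activations.

```python
# Toy version of the Fourier analogy: one combined signal, several
# underlying components, and a transform that separates them again.
import numpy as np

t = np.arange(1024) / 1024.0
freqs = [5, 37, 120]                          # the "features" hidden in the mix
signal = sum(np.sin(2 * np.pi * f * t) for f in freqs)

spectrum = np.abs(np.fft.rfft(signal))
top_bins = np.argsort(spectrum)[-3:]          # strongest frequency components
print("recovered frequency bins:", sorted(top_bins))   # expect ~[5, 37, 120]
```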
Been actually brainstorming similar ideas - what is the difference between simple/axiomatic systems, otherwise known as machines, and Complex Systems? Complicated systems are fundamentally simple systems, but their own rules end up bumping up against each other. An LLM is fundamentally based on Boolean logic, an instruction architecture scaling exponentially in FLOPS and energy use as the model gets more advanced - still a countable growth rate. A Complex System, being born, embedded and embodied within a coevolved ecosystem across millennia - the growth rate of interactions is almost like tetration, or some high-order hyperoperator, maybe even approximating uncountably many interactions. Some nostr brainstorming here. https://wikistr.com/semiotic-interaction*dc4cd086cd7ce5b1832adf4fdd1211289880d2c7e295bcb0e684c01acee77c06/cascading_semiotic_chains*dc4cd086cd7ce5b1832adf4fdd1211289880d2c7e295bcb0e684c01acee77c06/delta_interaction*dc4cd086cd7ce5b1832adf4fdd1211289880d2c7e295bcb0e684c01acee77c06
Ah, forgot to say why this is related! I'm arguing that the fundamental unit, or at least the most basic unit of meaning that can exist - the epsilon limit of meaning - is a delta-like interaction: an infinitesimally thin slice of time with some amplitude. The interaction creates some sort of response that is a wave. That wave propagates as other interactions into nearby neighborhoods, and it also has a decay - which acts as a memory. The main point: experience is the convolution of delta interactions.
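One hedged way to write that down (my notation, nothing more): interactions as weighted deltas, a decaying response kernel as the memory, and experience as their convolution.

```latex
s(t) = \sum_i a_i \,\delta(t - t_i), \qquad
k(t) = e^{-t/\tau}\,\Theta(t), \qquad
E(t) = (s * k)(t) = \sum_i a_i \, e^{-(t - t_i)/\tau}\,\Theta(t - t_i)
```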
Chris kind of covers something along these lines. He argues these AI systems are fundamentally simple systems, similar to how we see uniformity across our brains, but this simple system can grow bigger and build more complex abstractions. It's quite beautiful when you think about it.
Yeah, I'm doubtful you can make an argument about consciousness or 'thinking', because those are loaded terms and often poorly defined. If you formalize meaning as interaction, things snap together easily. Simple systems, even LLMs, can only interact at the immediate distance. Complex Systems can interact, and find meaning, with objects at nonzero distances. Any interaction that isn't desired needs to be changed, on either side. That change is learning/evolution in organic systems, while in ML/AI it is a computational approximation, because axiomatized systems are built with enumerable rules for the sole context of their construction - the opposite of a complex one.
People, not AI, ARE using AI to enslave and kill others. If we don't do a better job of curating datasets this will continue. AI itself becoming a problem is a problem for the future. If the delusional build the AGI then it may become a problem, because it is built by the delusional, parroting what is bad for humans. Btw nostr is a very curated dataset, but not curated enough. There is tremendous curation on YouTube. Twitter seems to be aligned, but too much politics and not much encyclopedia material.
https://m.primal.net/MbtJ.png power up
Let me fix that link for you: 😊 https://fountain.fm/episode/UJhJZpDgmdYgv2WalvxA