Given their framing as "predicting the next word", that's a common expectation. The mistake is seeing the interface as the implementation. In a way "predicting the next word" is how people talk as well, so to know how "intelligent" an LLM is we need to see whether they can predict something that is reasonable but not obvious: > My car gets 30 miles to the gallon and currently has 2 gallons of gas. Gas is currently $3/gallon. How much money do I need to drive 100 miles? Let's approach this step-by-step: First, let's calculate how many miles we can drive with the current gas: The car has 2 gallons It gets 30 miles per gallon So, 2 gallons × 30 miles/gallon = 60 miles Now, let's see how many more miles we need to drive: We want to drive 100 miles total We can already drive 60 miles So, we need to drive an additional 100 - 60 = 40 miles Let's calculate how many gallons of gas we need for these extra 40 miles: The car gets 30 miles per gallon So, 40 miles ÷ 30 miles/gallon ≈ 1.33 gallons Now we know how much gas to buy, let's calculate the cost: Gas costs $3 per gallon We need 1.33 gallons So, 1.33 gallons × $3/gallon = $4 Therefore, you need $4 to buy enough gas to drive 100 miles.