Did you check what happens if you edit the NIPs to change the specification? Is the code still correct? It's likely the LLMs can do this because working demos are in their training sets.
This is a good point.