I did some testing with several types of models, including the Llama 1B model. I haven't seen the official benchmarks yet, but the improvements are very noticeable, even on very small models. System prompts: https://image.nostr.build/88bbf240175554b202f853ba6228453ab8c598494175634fe5ed247c4b927288.jpg nostr:nevent1qqs0mr006vwv866frr6mzheqmmdhlflyv6yptmlvfz249esuzj87fhgpzdmhxue69uhhwmm59e6hg7r09ehkuef0qgsvdac80utfn4gvly4fv54la0l6cp0udpptnm3ezzyajkdc44w53lgrqsqqqqqpr2mdyd