Confound Check — Old Eval vs New (no top-p, temp=0.7, 1024 tokens)
Left: original node1 eval. Right: confound-matched H100 eval. Same params, same prompts.
Panels: Old vs New (P3 + NCA); All 4 New Models