Confound Check — Old Eval vs New (no top-p, temp=0.7, 1024 tokens)

Left: original node1 eval. Right: confound-matched H100 eval. Both runs use the same sampling parameters and the same prompts.