This using the wrong result register, and dropping the result entirely for v2f16. This would fail to select on the scalar case. I believe it was also mishandling packed/unpacked subtargets.