anilgulecha 26 days ago | on: Disagreement among frontier LLMs on real-world fac...

Disagreement is such a loose/wimpy thing to study. Add in a grounded/expected response, and then it becomes a better benchmark (because it'll force the author to actually think about the choices presented to the LLM).

kostaj 26 days ago

Will add a human-labelled expected response and measure against it in follow-up research. This one only captures the disagreement between the models, not which model is right or wrong.
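
To make the distinction concrete, here is a minimal sketch, not the study's actual code. The model names, answers, and labels are made up for illustration, and `pairwise_disagreement` and `accuracy_by_model` are hypothetical helpers. It shows that models can agree with each other completely and still all miss the expected answer, which a disagreement-only measure cannot see.

```python
from itertools import combinations


def pairwise_disagreement(answers):
    """Fraction of model pairs whose answers differ on one question."""
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def accuracy_by_model(rows):
    """Per-model accuracy against a human-labelled expected answer.

    rows: list of (answers, expected), where answers maps model name -> answer.
    """
    correct, total = {}, {}
    for answers, expected in rows:
        for model, answer in answers.items():
            total[model] = total.get(model, 0) + 1
            correct[model] = correct.get(model, 0) + (answer == expected)
    return {model: correct[model] / total[model] for model in total}


# Hypothetical data: three models, two questions, both with expected answer "no".
rows = [
    ({"model_a": "yes", "model_b": "yes", "model_c": "no"}, "no"),
    ({"model_a": "yes", "model_b": "yes", "model_c": "yes"}, "no"),
]

for answers, expected in rows:
    print(round(pairwise_disagreement(answers), 3), expected)
print(accuracy_by_model(rows))
```

On the second question the models fully agree, so disagreement is zero, yet every model is wrong. Only the comparison against the expected label surfaces that.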
