> Proper distillation requires access to the logits Why do you need logits? Can'... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		wren6991 2 days ago \| parent \| context \| favorite \| on: The text in Claude Code’s “Extended Thinking” outp... > Proper distillation requires access to the logits Why do you need logits? Can't you just train on cross-entropy loss of the model against the hard decision, like you do in regular pretraining? There are definitely current-gen open-weight models (Step 3.7 Flash is one) that refer to themselves as an OpenAI model in CoT, but not in the final response.
		help

CamperBob2 2 days ago [–]

How do I get that loss, though, without the softmax inputs?

wren6991 2 days ago | [–]

Do they have logits for all of the Wikipedia etc that they've scraped?

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact