
Wow! Only $2k with no quantization.

  hit between 4.25 to 3.5 TPS (tokens per second) on the Q4 671b full model


I think it is quantised; what they actually said was no distillation.


I think you're right. The instructions say

  ollama pull deepseek-r1:671b
This will pull down 400GB: https://ollama.com/library/deepseek-r1:671b

But the Hugging Face repo has 163 files of ~4.3GB each, so around 700GB: https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main
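
To put numbers on that gap, here is a rough back-of-the-envelope sketch of the implied bits per parameter. It assumes 671e9 parameters (taken from the "671b" tag), decimal gigabytes, and that the files are essentially all weights; none of that is confirmed by either page.

```python
# Rough estimate only: assumes 671e9 parameters (from the "671b" tag),
# decimal GB, and that the download size is almost entirely weights.
PARAMS = 671e9


def bits_per_param(size_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a given download size."""
    return size_gb * 1e9 * 8 / params


hf_size_gb = 163 * 4.3   # 163 files of ~4.3GB each, about 700GB
ollama_size_gb = 400.0   # size quoted for the ollama pull

print(f"Hugging Face: ~{hf_size_gb:.0f} GB, {bits_per_param(hf_size_gb):.2f} bits/param")
print(f"ollama:       ~{ollama_size_gb:.0f} GB, {bits_per_param(ollama_size_gb):.2f} bits/param")
```

Under those assumptions the ollama download works out to a bit under 5 bits per parameter and the Hugging Face files to a bit over 8, which fits the "Q4" in the quoted benchmark line rather than an unquantized model.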



