No, batched inference can work very well. Depending on the architecture, you can get 100x or more total token throughput out of the system by feeding it multiple requests in parallel.
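As a rough sketch of what that looks like in practice (assuming Hugging Face transformers with GPT-2 as a stand-in model, since the thread doesn't name a stack): pad a list of prompts into one batch and run a single generate() call over all of them, so each forward pass serves every request at once.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in model; the thread doesn't name one
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
    tokenizer.padding_side = "left"             # pad on the left for generation
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompts = [
        "Summarize: document one ...",
        "Summarize: document two ...",
        # ...dozens or hundreds more requests in the same batch
    ]

    # One padded batch -> each decoding step serves every request at once,
    # amortizing the cost of reading the model weights across all of them.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=64,
                                 pad_token_id=tokenizer.eos_token_id)

    for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
        print(text)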

Couldn't you do this locally just the same?

Of course that doesn't map well to an individual chatting with a chat bot. It does map well to something like "hey, laptop, summarize these 10,000 documents."
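For the "summarize these 10,000 documents" case, the workload naturally decomposes into batches. A minimal, hypothetical sketch (summarize_batch stands in for whatever batched inference call you have locally, e.g. the generate() example above):

    from typing import Callable, Iterator

    def chunked(items: list[str], size: int) -> Iterator[list[str]]:
        """Yield successive batches of `size` documents."""
        for i in range(0, len(items), size):
            yield items[i:i + size]

    def summarize_all(documents: list[str],
                      summarize_batch: Callable[[list[str]], list[str]],
                      batch_size: int = 32) -> list[str]:
        # Each call to summarize_batch processes a whole batch in one pass,
        # which is where the throughput win over one-at-a-time requests comes from.
        summaries: list[str] = []
        for batch in chunked(documents, batch_size):
            summaries.extend(summarize_batch(batch))
        return summaries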
