Mastodon Feed: Post

Reblogged by jsonstein@masto.deoan.org ("Jeff Sonstein"):

mpesce@arvr.social ("Mark Pesce says YES") wrote:

If I can sustain the patience to generate around 1 token per second, I can run a 70B-parameter local LLM on my "gaming" laptop - the one I had upgraded to 64GB of RAM a few months back _precisely_ so I could run these large models comfortably.

It's not as fast as GPT-4 - but then it doesn't require an aircraft hangar of servers and huge amounts of cooling. With the sheep-duck-llama model it gets close to GPT-4's accuracy on the benchmarks. And my prompts & completions never leave home.
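
For anyone curious what this kind of setup typically looks like: below is a minimal sketch using llama-cpp-python, one common way to run quantized GGUF models entirely on CPU. The file name, quantization level (Q4_K_M), and thread count are assumptions, not details from the post.

```python
from llama_cpp import Llama

# A 70B model at 4-bit quantization needs roughly 40 GB of RAM,
# which is why a 64 GB upgrade makes CPU-only inference feasible.
llm = Llama(
    model_path="sheep-duck-llama-2-70b.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,      # context window size
    n_threads=8,     # set to your CPU's physical core count
    n_gpu_layers=0,  # pure CPU inference, keeping everything in system RAM
)

result = llm(
    "Explain why quantization shrinks a model's memory footprint.",
    max_tokens=256,
)
print(result["choices"][0]["text"])
```

At roughly 1 token per second on CPU, a 256-token completion like this would take a few minutes - slow, but everything stays on the machine.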