Reblogged by jsonstein@masto.deoan.org ("Jeff Sonstein"):
mpesce@arvr.social ("Mark Pesce says YES") wrote:
If I can sustain the patience to generate around 1 token per second, I can run a 70B parameter local LLM on my "gaming" laptop - the one that I got upgraded to 64GB of RAM a few months back _precisely_ so I could run these large models comfortably.
It's not as fast as GPT-4 - but then it doesn't require an aircraft hangar of servers and huge amounts of cooling. Using the sheep-duck-llama model, it gets close to GPT-4's accuracy on the benchmarks. And my prompts & completions never leave home.
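For anyone tempted to try the same thing, here's a minimal sketch of that kind of setup, assuming llama-cpp-python and a 4-bit GGUF quantization of sheep-duck-llama (the file name below is hypothetical). A 4-bit quant of a 70B model weighs in around 40GB, which is why the 64GB of RAM is what makes it fit.

```python
# Rough sketch: CPU-only local inference with llama-cpp-python.
# The model file name is hypothetical; any ~4-bit GGUF quant of a
# 70B model (~40GB on disk/RAM) fits the 64GB-of-RAM scenario.
from llama_cpp import Llama

llm = Llama(
    model_path="./sheep-duck-llama-2-70b.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,      # context window; larger costs more RAM
    n_threads=8,     # tune to your CPU's physical core count
    n_gpu_layers=0,  # CPU-only: the whole model stays in system RAM
)

# Everything runs locally - the prompt and completion never leave the machine.
for chunk in llm("Q: Why run a 70B model at home? A:", max_tokens=128, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
```

At roughly 1 token per second on CPU, streaming the output like this at least lets you watch the answer appear rather than waiting for it all at once.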