
Reblogged by jsonstein@masto.deoan.org ("Jeff Sonstein"):
hkrn@mstdn.social ("Hacker News") wrote:
Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference
L: https://cerebras.ai/blog/llama-405b-inference
C: https://news.ycombinator.com/item?id=42178761
posted on 2024.11.18 at 19:15:04 (c=0, p=5)