dysfun@treehouse.systems ("gaytabase") wrote:
on the other hand, my token rate is much slower when i do it through llama.cpp's ui myself. presumably this is because they have filled the context buffer with some instructions for agents to ignore.