Last login: Thu, Oct 16, 4:05 AM
# caponier makes LLM serving faster and more reliable

vx --profile-model llama-3.1-8b
# analyzing inference performance...
pref --cache-predictions --model llama-3.1-8b
# warming cache for peak traffic...

# vx & pref coming soon