Last login: Thu, Oct 16, 4:05 AM
# caponier makes LLM serving faster and more reliable

vx --profile-model llama-3.1-8b
# analyzing inference performance...
pref --cache-predictions --model llama-3.1-8b
# warming cache for peak traffic...

# vx & pref coming soon