The problem here is that both aimed for Day 0 support, both got embargoed preliminary model weights and architecture, and I don't think either had access to the other side's embargoed code.
Yet another website where I don't know what they do, so I go to the homepage, which has a single marketing sentence explaining what they do, and I still don't understand.
Palmik | 8 hours ago
Benchmarks from InferenceX (they do not have apples-to-apples setups to compare the different engines, for whatever reason): https://inferencex.semianalysis.com/inference?i_hc=1&g_model...
I find it odd that sglang, vLLM, and TRT-LLM don't seem to want to publish benchmarks comparing each other. They used to, but now there seems to be some unspoken rule against it.
At least we get a comparison against "other OSS engine" this time, but that could be HF's Transformers as well :)
imjonse | 7 hours ago
Palmik | 7 hours ago
Model makers (both open- and closed-weight) typically publish benchmarks against other models, and when they do not, people rightfully call them out.
Including a comparison against an unnamed "other OSS engine" is just not helpful (what if it's a sandbagged baseline like HF Transformers?).
rfoo | 5 hours ago
mirekrusin | 6 hours ago
palata | 22 minutes ago
Something with LLMs, obviously.