I Put a Datacenter GPU in My Gaming PC for £200

37 points by puhsu 7 hours ago on lobsters | 9 comments

robalex | 3 hours ago

Really cool approach. I'm curious about the GPU falling off the PCIe bus, but there are so many things it could be.

The loud GPU fan reminds me of my time on the CUDA team at NVIDIA. My co-worker was adding the fan control feature to NVML and nvidia-smi. Over the cube wall I heard a fan spinning up and down, then he popped up with a giant grin on his face. He said it was his favorite feature to work on, since the moment he had the code working he could hear the results.

lor_louis | 4 hours ago

If anyone is interested in self-hosted LLMs, Dell OEM RTX 3090s are generally cheaper than the big-name brand variants, and I was able to get my hands on one for ~$800 CAD.

Now I need to read up more on how vLLM works, because the model sometimes starts spewing long lists of related names and adjectives; I've probably messed something up.

msfjarvis | 3 hours ago

What kind of models are you running on a 3090? I was under the impression that most useful models need at least 48 to 64 gigs of VRAM to run properly, hence the popularity of Apple M-series chips in the space due to their integrated memory design.

ocramz | 3 hours ago

Qwen3.6-27B-MTP quantized at Q5_K_M, which comes in at about 19GB VRAM

and they observe a 32 tokens/s inference rate on the V100. So by fitting a model with more bits per weight, the 3090 might even produce better quality (at ~10x the price of the aftermarket datacenter-grade stuff).
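For anyone wondering where the ~19GB figure comes from, here is a rough back-of-the-envelope sketch: weight memory is roughly parameter count × bits per weight ÷ 8. Assumptions: the 27B model is treated as exactly 27e9 parameters (any extra MTP heads are ignored), and the bits-per-weight values are those of the base llama.cpp block formats (Q5_K, Q6_K, Q8_0). The "_M"/"_XL" mixes keep some tensors at higher precision, so real files come out somewhat larger, and the KV cache and activations need VRAM on top of this.

```python
# Rough weight-only VRAM estimate for a quantized model (a sketch, not exact).
# Assumption: bits per weight of the base llama.cpp block formats; mixed
# "_M"/"_XL" quants are a bit larger. KV cache and activations are NOT included.
BITS_PER_WEIGHT = {
    "Q5_K": 5.5,     # 176 bytes per 256-weight super-block
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
    "Q8_0": 8.5,     # 34 bytes per 32-weight block
}


def weight_gb(n_params: float, bpw: float) -> float:
    """Size of the quantized weights in decimal gigabytes."""
    return n_params * bpw / 8 / 1e9


N_PARAMS = 27e9  # assumption: "27B" taken as exactly 27e9 parameters

for name, bpw in BITS_PER_WEIGHT.items():
    size = weight_gb(N_PARAMS, bpw)
    print(f"{name}: {size:5.1f} GB  "
          f"under 24 GB: {size < 24}  under 48 GB: {size < 48}")
```

On these assumptions Q5 weights land around 18.6GB, Q6 around 22.1GB (under one 24GB card, but with little room left for context), and Q8 around 28.7GB, which only fits across two 24GB cards.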

msfjarvis | 3 hours ago

Qwen3.6-27B-MTP quantized at Q5_K_M [...] they observe 32 tokens/s inference rate

Wow, that's pretty good. My experience with older Qwen models was much worse but I think I didn't use the right variant since there were so many on Hugging Face. Could I trouble you for a link to the version you're running? Thanks!

ocramz | 3 hours ago

lor_louis | 3 hours ago

I followed Unsloth's tutorial to get Qwen 3.6 working pretty well. I already had a 3090 for gaming, so the second, OEM one I got for cheap(ish) lets me run the Q5 K_XL versions, and I want to investigate Q6 and Q8 this weekend.

ocramz | 3 hours ago

nelson | an hour ago

Oh, that is really tempting. I assume this doesn't have the fan hack the post here talks about.