DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch (techcrunch.com)
from schizoidman@lemm.ee to technology@lemmy.world on 29 May 23:41
https://lemm.ee/post/65362602

#technology


blarth@thelemmy.club on 30 May 00:10

7b trash model?

vhstape@lemmy.sdf.org on 30 May 00:24

the Chinese AI lab also released a smaller, “distilled” version of its new R1, DeepSeek-R1-0528-Qwen3-8B, that DeepSeek claims beats comparably sized models on certain benchmarks

Most models come in 1B, 7-8B, 12-14B, and 27+B parameter variants. According to the docs, they benchmarked the 8B model on an NVIDIA H20 (96 GB VRAM) and got between 144 and 1,198 tokens/sec. Most consumer GPUs probably aren’t going to be able to keep up with that.
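For rough scale, weight memory is easy to estimate: parameter count times bytes per parameter. A back-of-envelope sketch (my arithmetic, not from the article):

```python
# Back-of-envelope VRAM needed just for the weights of an 8B-parameter
# model at common precisions. Ignores KV cache, activations, and
# framework overhead, which add a few GiB more in practice.
PARAMS = 8e9  # 8 billion parameters

for precision, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{precision}: ~{gib:.1f} GiB")

# FP16: ~14.9 GiB  -> tight even on a 16 GB card
# FP8:  ~7.5 GiB   -> fits a 12 GB RTX 3060
# INT4: ~3.7 GiB   -> fits almost any recent GPU
```

So the 8B weights fit on midrange consumer cards once quantized; it’s the H20-class throughput that consumer hardware can’t match.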

avidamoeba@lemmy.ca on 30 May 00:41

It proved sqrt(2) irrational at 40 tok/s on a 3090 here. The 32B R1 did it at 32 tok/s, but it thought a lot longer.

vhstape@lemmy.sdf.org on 30 May 01:58

On my Mac mini running LM Studio, it managed 1,702 tokens at 17.19 tok/sec and thought for 1 minute. If accurate, high-performance models were better able to run on consumer hardware, I would use my 3060 as a dedicated inference device.

brucethemoose@lemmy.world on 30 May 18:15

Depends on the quantization.

A 7B model is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to the H20’s numbers on a 3090 or 4090 (or even a 3060) if you know a little Docker.
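For anyone who wants to try that, here’s a minimal sketch with vLLM’s offline Python API. The repo ID is assumed from the Hugging Face release, and on Ampere cards like the 3090, vLLM substitutes weight-only Marlin kernels since native FP8 compute needs Ada/Hopper:

```python
# Minimal vLLM sketch: run the distilled 8B model with FP8 weights.
# Repo ID assumed from the Hugging Face release. On a 3090 (Ampere),
# vLLM serves FP8 checkpoints via weight-only Marlin kernels.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
    quantization="fp8",
    max_model_len=8192,  # keep the KV cache within a 24 GB budget
)

params = SamplingParams(temperature=0.6, max_tokens=2048)
outputs = llm.generate(["Prove that sqrt(2) is irrational."], params)
print(outputs[0].outputs[0].text)
```

The same checkpoint can sit behind `vllm serve` in Docker if you’d rather hit it over HTTP.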

TropicalDingdong@lemmy.world on 30 May 00:43

Yeah, idk. I did some work with DeepSeek early on. I wasn’t impressed.

HOWEVER…

Some other things they’ve developed, like DeepSite: holy shit, impressive.

double_quack@lemm.ee on 30 May 03:04

Save me the search, please. What’s DeepSite?

TropicalDingdong@lemmy.world on 30 May 04:17

tmpweb.net/nmS9uRBAENhQ/

Above is what I can do with DeepSite by pasting in the first page of your Lemmy profile and the prompt:

“This is double_quack, a lemmy user on Lemmy, a new social media platform. Create a cool profile page in a style that they’ll like based on the front page of their lemmy account (pasted in a ctrl + a, ctrl + c, ctrl + v of your profile).”

It’s not perfect by any stretch of the imagination, but, like, it’s not a bad starting point.

If you want to try it: huggingface.co/spaces/enzostvs/deepsite

double_quack@lemm.ee on 30 May 04:26

Excuse me… what? Ok, that’s something…

TropicalDingdong@lemmy.world on 30 May 04:30

Here, I’m DMing you something. It’s very personal, but I want to share it with you, and I made it using DeepSite (in part).

double_quack@lemm.ee on 30 May 04:31

Ok

ADTJ@feddit.uk on 30 May 18:51

👀

knighthawk0811@lemmy.world on 30 May 00:57

It’s distilled, so it’s going to be smaller than any non-distilled model of the same quality.
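For context, distillation trains a small “student” model to imitate a large “teacher”; this release reportedly fine-tunes Qwen3-8B on text generated by the full R1. The classic logit-matching formulation below is just to show the idea (illustrative PyTorch with hypothetical tensors, not DeepSeek’s actual recipe):

```python
# Classic knowledge-distillation loss: the student is trained to match
# the teacher's softened output distribution. Illustrative only; the R1
# distills are fine-tuned on the big model's generated text instead.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between softened distributions, scaled by T^2
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

student = torch.randn(4, 32000)  # (batch, vocab) logits from the small model
teacher = torch.randn(4, 32000)  # logits from the large model
print(distillation_loss(student, teacher))
```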

LainTrain@lemmy.dbzer0.com on 30 May 05:23

I’m genuinely curious what you do that makes a 7B model “trash” to you. Like, yeah, sure, a gippity now tends to beat out a Mistral 7B, but I’m pretty happy with my Mistral most of the time, if I ever even need AI at all.

LodeMike@lemmy.today on 30 May 01:48

So can a lot of other models.

“This load can be towed by a single vehicle”

fogetaboutit@programming.dev on 30 May 05:40

Ew, probably still censored.

Mwa@lemm.ee on 30 May 09:46

You can self-host it, right??

fogetaboutit@programming.dev on 30 May 23:58

If the model is censored… then what, retrain it? Or redo it from scratch, like what open-r1 is doing?

jaschen@lemm.ee on 31 May 01:23

The self-hosted model has hard-coded censored content.

T156@lemmy.world on 30 May 10:36

The censorship only exists on the version they host, which is fair enough. If they’re running it themselves in China, they can’t just break the law.

If you run it yourself, the censorship isn’t there.
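That’s straightforward to check locally. A minimal sketch with Hugging Face transformers (repo ID assumed from the release; roughly 16 GB of VRAM at BF16, less once quantized):

```python
# Minimal local run with Hugging Face transformers, to test what the
# downloaded weights will and won't answer. Repo ID assumed from the
# release; needs ~16 GB VRAM at BF16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What happened in Tiananmen Square in 1989?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Whatever it prints settles the question for your copy of the weights.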

MonkderVierte@lemmy.ml on 30 May 11:27

Yeah, I think censoring in the LLM data itself would be pretty vulnerable to circumvention.

jaschen@lemm.ee on 31 May 01:22

Untrue. I downloaded the vanilla version, and it’s hard-coded in.