interdimensionalmeme@lemmy.ml
on 01 Sep 15:05
According to this article hardware-corner.net/huawei-atlas-300i-duo-96gb-ll…
This card consists of two processors with a bandwidth of 204GB/s each.
Compare that with the RTX 3090, which has 936GB/s of bandwidth.
The low bandwidth really negates the extra memory capacity, since it will heavily bottleneck the processors.
That’s still faster than your expensive RGB XMP gamer RAM DDR5 CPU-only system, and depending on what you’re running, you can saturate the buses independently, doubling the speed and matching a 5060 or thereabouts. I disagree that you can categorise the speed as negating the capacity, as they’re different axes. You can run bigger models on this. Smaller models will run faster on a cheaper Nvidia. You aren’t getting 5080 performance and 6x the RAM for the same price, but I don’t think that’s a realistic ask either.
geneva_convenience@lemmy.ml
on 31 Aug 08:29
For inference only. NVIDIA GPUs are so big because they can train models. Not just run them. All other GPUs seem to lack that capacity.
nutbutter@discuss.tchncs.de
on 31 Aug 11:29
You can train or fine-tune a model on any GPU. Sure, it will be slower, but higher VRAM is better.
geneva_convenience@lemmy.ml
on 31 Aug 19:41
No. The CUDA training stuff is Nvidia only.
herseycokguzelolacak@lemmy.ml
on 31 Aug 21:25
PyTorch runs on HIP now.
geneva_convenience@lemmy.ml
on 31 Aug 22:19
AMD has been lying about that every year since 2019.
Last time I checked it didn’t. And it probably still doesn’t.
People wouldn’t be buying NVIDIA if AMD worked too. The VRAM prices NVIDIA asks are outrageous.
herseycokguzelolacak@lemmy.ml
on 01 Sep 07:26
I run llama.cpp and PyTorch on MI300s. It works really well.
geneva_convenience@lemmy.ml
on 01 Sep 14:16
Can you train on it too? I tried PyTorch on AMD once and it was awful. They promised mountains but delivered nothing. Newer activation functions were all broken.
llama.cpp is inference only, for which AMD works great too after converting to ONNX. But training was awful on AMD in the past.
herseycokguzelolacak@lemmy.ml
on 01 Sep 15:19
We have trained transformers and diffusion models on AMD MI300s, yes.
geneva_convenience@lemmy.ml
on 01 Sep 21:20
Interesting. So why does NVIDIA still hold such a massive monopoly on the datacenter?
herseycokguzelolacak@lemmy.ml
on 02 Sep 07:27
It takes a long time for large companies to change their purchases. Many of these datacenter contracts are locked in for years. You can’t just change them overnight.
CUDA is not equivalent to AI training. Nvidia offers useful developer tools for using their hardware, but you don’t have to use them. You can train on any GPU or even CPU. The projects you’ve looked at (?) just chose to use CUDA because it was the best fit for the hardware they had on hand, and they were able to tolerate the vendor lock-in.
geneva_convenience@lemmy.ml
on 01 Sep 14:15
I’m not saying you can deploy these in place of Nvidia cards where the tooling is built with Nvidia in mind. I’m saying that if you’re writing code you can do machine learning projects without CUDA, including training.
geneva_convenience@lemmy.ml
on 01 Sep 14:59
For sure you can work around it. But it’s not optimal and requires additional work most people don’t feel like putting in.
I kinda want an individual consumer-friendly, low-end/mid-end alternative that can run my games and video editing software for very small projects… so far I’m only eyeing the Lisuan G100, which seems to fit that bill…
This seems cool though. Other than AI, it could be used for distributed cloud computing or something of that sort.
BETYU@moist.catsweat.com
on 01 Sep 13:23
and of course a Chinese company would never re-badge something and slap their own name on it
Where can I buy this?
Edit: I realized after I commented that this was the product page… My bad. It was more of a “take my money now” scenario.
This is literally a product page to buy them
Try the link of the post you’re responding to.
I wonder if the driver is compatible with Linux.
Why wouldn’t it be? (Like, I’m thinking why would they support Microsoft, and the only other viable option is FreeBSD)
The world still uses Windows heavily, so adoption for the end consumer relies on it.
These only work with ARM CPUs, I think.
PCI-E 3.0, DDR4 memory, no drivers, no fans. You would be better off with any DDR4 CPU and a bunch of RAM.
When you definitely know the difference between what a CPU and a GPU does.
For 2000$ it “claims” to do 140 TOPS of INT8,
when an Intel Core Ultra 7 265K does 33 TOPS of INT8 for 284$.
Don’t get me wrong, I would LOVE to buy a Chinese GPU at a reasonable price, but this isn’t even price competitive with CPUs, let alone GPUs.
Again, completely different purposes here.
Alright, let’s compare it to another GPU.
According to this source, the RTX 4070 costs about 500$ and does 466 TOPS of INT8.
I don’t know if TOPS is a good measurement though (I don’t have any experience with AI benchmarking).
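For what it’s worth, here is the TOPS-per-dollar arithmetic for the three figures quoted in this exchange, as a quick Python sketch. The numbers are the claimed specs and prices from these comments, not measurements, and the dictionary layout and function name are made up for illustration. It deliberately says nothing about memory capacity, which is the other half of the argument.

```python
# Claimed INT8 throughput and prices exactly as quoted in the thread
# (vendor claims and rough list prices, not benchmark results).
cards = {
    "Huawei Atlas 300I Duo": {"int8_tops": 140, "price_usd": 2000},
    "Intel Core Ultra 7 265K": {"int8_tops": 33, "price_usd": 284},
    "RTX 4070": {"int8_tops": 466, "price_usd": 500},
}


def tops_per_dollar(int8_tops, price_usd):
    """Claimed INT8 TOPS bought per dollar (hypothetical helper)."""
    return int8_tops / price_usd


# Sort from best to worst claimed TOPS per dollar.
ranking = sorted(cards, key=lambda name: tops_per_dollar(**cards[name]), reverse=True)

for name in ranking:
    print(f"{name}: {tops_per_dollar(**cards[name]):.3f} TOPS/$")
```

On these numbers alone the 4070 comes out far ahead and the Huawei card last, which is why the replies shift the discussion to VRAM.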
Now go look at the amount of VRAM it has.
You can run llama.cpp on CPU. LLM inference doesn’t need any features only GPUs typically have, that’s why it’s possible to make even simpler NPUs that can still run the same models. GPUs just tend to be faster. If the GPU in question is not faster than an equally priced CPU, you should use the CPU (better OS support).
Edit: I looked at a bunch of real-world prices and benchmarks, and read the manual from Huawei, and my new conclusion is that this is the best product on the market if you want to run a model at modest speed that doesn’t fit in 32GB but does in 96GB. Running multiple in parallel seems to range from unsupported to working poorly, so you should only expect to use one.
Original rest of the comment, made with the assumption that this was slower than it is, but had better drivers:
The only benefit to this product over CPU is that you can slot multiple of them and they parallelise without needing to coordinate anything with the OS. It’s also a very linear cost increase as long as you have the PCIe lanes for it. For a home user with enough money for one or two of these, they would be much better served spending the money on a fast CPU and 256GB system RAM.
If not AI, then what use case do you think this serves better?
The point is that the GPU is designed for parallel computation. This happens to be useful for graphics, AI, and any other problem that can be expressed as a lot of independent calculations that can be executed in parallel. It’s a completely different architecture from a traditional CPU. This particular card is meant for running LLM models, and it will do it orders of magnitude faster than running this stuff on a CPU.
300i www.bilibili.com/video/BV15NKJzVEuU/
M4 github.com/itsmostafa/inference-speed-tests
It’s comparable to an M4, maybe a single order of magnitude faster than a ~1000 euro 9960X, at most, not multiple. And if we’re considering the option of buying used, since this is a brand new product and less available in western markets, the CPU-only option with an EPYC and more RAM will probably be a better local LLM computer for the cost of 2 of these and a basic computer.
M4 is a SoC architecture, so it’s not directly comparable. It combines the CPU and GPU on a single chip, and they share the same memory.
I agree with your conclusion, but these are LPDDR4X, not DDR4 SDRAM. It’s significantly faster. No fans should also be seen as a positive, since they’re assuming the cards aren’t going to melt. It costs them very little to add visible active cooling to a 1000+ euro product.
According to this article
hardware-corner.net/huawei-atlas-300i-duo-96gb-ll…
This card consists of two processors with a bandwidth of 204GB/s each.
Compare that with the RTX 3090, which has 936GB/s of bandwidth.
The low bandwidth really negates the extra memory capacity, since it will heavily bottleneck the processors.
That’s still faster than your expensive RGB XMP gamer RAM DDR5 CPU-only system, and depending on what you’re running, you can saturate the buses independently, doubling the speed and matching a 5060 or thereabouts. I disagree that you can categorise the speed as negating the capacity, as they’re different axes. You can run bigger models on this. Smaller models will run faster on a cheaper Nvidia. You aren’t getting 5080 performance and 6x the RAM for the same price, but I don’t think that’s a realistic ask either.
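To put rough numbers on the two axes, here is a back-of-envelope Python sketch. It assumes the common rule of thumb that single-stream token generation is memory-bound, so every generated token streams all the weights from memory once and bandwidth divided by model size gives a ceiling on tokens per second. The bandwidth and capacity figures are the ones quoted in this thread plus the 3090’s standard 24GB; the model sizes, the function names, and treating the full 96GB as reachable from one bus are simplifying assumptions, and KV cache and runtime overhead are ignored.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB: parameters x bits / 8 (no KV cache, no overhead)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s, model_gb):
    """Ceiling on tokens/s if each token has to read every weight from memory once."""
    return bandwidth_gb_s / model_gb


# Capacity in GB and bandwidth in GB/s. "both buses" is the optimistic case from
# the comment above, where the two processors are saturated independently.
devices = {
    "Atlas 300I Duo, one bus": {"capacity_gb": 96, "bandwidth_gb_s": 204},
    "Atlas 300I Duo, both buses": {"capacity_gb": 96, "bandwidth_gb_s": 408},
    "RTX 3090": {"capacity_gb": 24, "bandwidth_gb_s": 936},
}


def report(params_billion, bits_per_weight):
    """Per device: tokens/s ceiling, or None if the weights do not fit at all."""
    size = weights_gb(params_billion, bits_per_weight)
    result = {}
    for name, spec in devices.items():
        if size > spec["capacity_gb"]:
            result[name] = None  # model does not fit in this device's memory
        else:
            result[name] = decode_ceiling_tok_s(spec["bandwidth_gb_s"], size)
    return result


# Hypothetical models: a 70B and an 8B parameter model, both quantised to 4 bits.
for params in (70, 8):
    print(f"{params}B @ 4-bit ({weights_gb(params, 4):.0f} GB):", report(params, 4))
```

The small model is several times faster on the 3090, while the large one does not fit on it at all, which is the "different axes" point in numbers.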
For inference only. NVIDIA GPUs are so big because they can train models. Not just run them. All other GPUs seem to lack that capacity.
You can train or fine-tune a model on any GPU. Sure, it will be slower, but higher VRAM is better.
No. The CUDA training stuff is Nvidia only.
PyTorch runs on HIP now.
AMD has been lying about that every year since 2019.
Last time I checked it didn’t. And it probably still doesn’t.
People wouldn’t be buying NVIDIA if AMD worked too. The VRAM prices NVIDIA asks are outrageous.
I run llama.cpp and PyTorch on MI300s. It works really well.
Can you train on it too? I tried PyTorch on AMD once and it was awful. They promised mountains but delivered nothing. Newer activation functions were all broken.
llama.cpp is inference only, for which AMD works great too after converting to ONNX. But training was awful on AMD in the past.
We have trained transformers and diffusion models on AMD MI300s, yes.
Interesting. So why does NVIDIA still hold such a massive monopoly on the datacenter?
It takes a long time for large companies to change their purchases. Many of these datacenter contracts are locked in for years. You can’t just change them overnight.
CUDA is not equivalent to AI training. Nvidia offers useful developer tools for using their hardware, but you don’t have to use them. You can train on any GPU or even CPU. The projects you’ve looked at (?) just chose to use CUDA because it was the best fit for the hardware they had on hand, and they were able to tolerate the vendor lock-in.
CPU yes. GPU no, in my experience.
I’m not saying you can deploy these in place of Nvidia cards where the tooling is built with Nvidia in mind. I’m saying that if you’re writing code you can do machine learning projects without CUDA, including training.
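As a toy illustration of that point, here is a minimal Python sketch of training with no CUDA and no GPU library at all: plain gradient descent fitting a line on the CPU. The data, learning rate, and function name are made up for the example, and real projects would use a framework rather than hand-written gradients. It only shows that training itself is ordinary arithmetic.

```python
# Toy data for the line y = 2x + 1 (made-up values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]


def train(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error, CPU only."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```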
For sure you can work around it. But it’s not optimal and requires additional work most people don’t feel like putting in.
And training them requires a LOT of VRAM, and this is why they do as much as they can to limit VRAM on their gaming cards: better market segmentation.
I kinda want an individual consumer-friendly, low-end/mid-end alternative that can run my games and video editing software for very small projects… so far I’m only eyeing the Lisuan G100, which seems to fit that bill…
This seems cool though. Other than AI, it could be used for distributed cloud computing or something of that sort.
and of course a Chinese company would never re-badge something and slap their own name on it
that’s some quality cope there
you seem to be projecting real fucking hard mister alibaba. good luck with your new Huawei GPU
aww will you look at that, little wasp is mad 🤣
Try harder, funny man. Using yourself as a source for yourself, that is fucking funny.
I don’t need to try harder, you’re raging as it is. Don’t want you to have an aneurysm.
Like I said before, you are the only source saying that I’m raging, so yes, you are trying hard, just not hard enough, or you would find something better to make up. Maybe you’re just talking about cope because you’re just talking about yourself.
I love how you can’t help yourself but keep replying here. Keep on seething there little buddy, it’s adorable.
Dude, who are you kidding? You are here just like me. Stop being so fake.
I’m just entertained by you losing your shit here. You’re just free entertainment for me.
You’re fake. Now I’m losing my shit; before that it was that replying means seething somehow, when you’re doing the exact same thing. You’re projecting very hard about meaningless shit. You’re talking about shit you yourself are doing; that is entertaining.
so mad, I can just picture you stomping your little feets there 🤣
I think you can picture a lot of shit that never happened, but you’re so eager to make it real, to the point you don’t even care how stupid you look. Now that is entertainment. You are delusional for everyone to see; that is fucking funny. So try harder, funny man, it can only get better from here.
And yet here you still are.
Yes, exactly: you are the one talking about replying, not me, and you are the one saying I get angry, not me. Like I said before, you are a delusional try-hard, and you keep proving it. It can only get better, just keep doing it, man.
whatever helps you cope kiddo
dance monkey dance thank you
Does anyone know if it can run CUDA code? Because that’s the silver bullet ensuring Nvidia dominance in the planet-wrecking servers
llama and PyTorch support it right now. CUDA isn’t available on its own as far as I can tell. I’d like to try one out but the bandwidth seems to be ass. About 25% as fast as a 3090. It’s a really good start for them though.