1-bit LLM performs similarly to full-precision Transformer LLMs with the same model size and training tokens but is much more efficient in terms of latency, memory, throughput, and energy consumption. (arxiv.org)
from yogthos@lemmy.ml to technology@lemmy.ml on 28 Feb 2024 12:52
https://lemmy.ml/post/12526144

#technology

kevlar21@lemm.ee on 28 Feb 2024 13:26

Why use lot bit when one bit do trick?

tubbadu@lemmy.kde.social on 28 Feb 2024 14:21

Bits together weak

QBertReynolds@sh.itjust.works on 28 Feb 2024 15:10

Says 1-bit then goes on to describe inputs as -1, 0, or 1. That’s 2-bit. Am I missing something here?

will_a113@lemmy.ml on 28 Feb 2024 17:12

It’s actually 1.58 bits, weirdly. Adding 0 as a possible value was the significant change/improvement in this experiment. The paper isn’t too dense, and it has some decent tables that explain things fairly accessibly.
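
A rough sketch of where the 1.58 figure comes from, and why it isn't 2 bits in practice: a weight with three possible values (-1, 0, 1) carries log2(3) ≈ 1.58 bits of information. The packing below is only an illustration, not the paper's storage format, and `pack_trits` / `unpack_trits` are made-up names. It assumes five ternary weights are stored as one base-3 number in a single byte (3^5 = 243 fits in 256), which works out to 1.6 bits per weight.

```python
import math

# Information content of one three-valued weight, in bits (about 1.58).
BITS_PER_TERNARY_WEIGHT = math.log2(3)

# Assumption for this sketch: 5 ternary weights per byte, since 3**5 = 243 <= 256.
WEIGHTS_PER_BYTE = 5


def pack_trits(weights):
    """Pack up to 5 weights from {-1, 0, 1} into one integer in 0..242."""
    if len(weights) > WEIGHTS_PER_BYTE:
        raise ValueError("at most 5 ternary weights fit in one byte")
    value = 0
    # Process in reverse so the first weight ends up as the lowest base-3 digit.
    for w in reversed(weights):
        if w not in (-1, 0, 1):
            raise ValueError(f"not a ternary weight: {w}")
        value = value * 3 + (w + 1)  # map -1, 0, 1 to digits 0, 1, 2
    return value


def unpack_trits(byte, count=WEIGHTS_PER_BYTE):
    """Recover `count` weights from a value produced by pack_trits."""
    weights = []
    for _ in range(count):
        weights.append(byte % 3 - 1)  # map digits 0, 1, 2 back to -1, 0, 1
        byte //= 3
    return weights


if __name__ == "__main__":
    original = [1, -1, 0, 1, 1]
    packed = pack_trits(original)
    assert unpack_trits(packed) == original
    print(f"{BITS_PER_TERNARY_WEIGHT:.2f} bits of information per weight")
    print(f"{8 / WEIGHTS_PER_BYTE} bits of storage per weight with this packing")
```

So a plain 2-bit encoding would waste one of its four states, while a base-3 packing like this gets close to the 1.58-bit limit.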