LLM Virology Capabilities Test (www.virologytest.ai)
from zaxvenz@lemm.ee to technology@lemmy.world on 22 Apr 13:27
https://lemm.ee/post/62074584

We present the Virology Capabilities Test (VCT), a large language model (LLM) benchmark that measures the capability to troubleshoot complex virology laboratory protocols. VCT is difficult: expert virologists with access to the internet score an average of 22.1% on questions specifically in their sub-areas of expertise. However, the most performant LLM, OpenAI’s o3, reaches 43.8% accuracy and even outperforms 94% of expert virologists when compared directly on question subsets specifically tailored to the experts’ specialties.

archive.ph/xILJR

#technology

themurphy@lemmy.ml on 22 Apr 16:05

Great results. Wouldn't an AI built specifically for this do better, or is this just meant as a kind of benchmark for LLMs?