Alignment faking in large language models
(www.anthropic.com)
in technology@lemmy.world from Joker@sh.itjust.works on 22 Dec 08:31
comments (12)
Mapping the Mind of a Large Language Model
(www.anthropic.com)
in technology@lemmy.world from kromem@lemmy.world on 21 May 2024 22:52
comments (21)
Mapping the Mind of a Large Language Model
(www.anthropic.com)
in hackernews@lemmy.smeargle.fans from bot@lemmy.smeargle.fans on 21 May 2024 16:25
comments (0)
Anthropic: Reflections on Our Responsible Scaling Policy
(www.anthropic.com)
in hackernews@lemmy.smeargle.fans from bot@lemmy.smeargle.fans on 20 May 2024 04:58
comments (0)