Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

crandlecan

Oh wow. That’s truly like playing multidimensional chess.

Hackworth

They used finetuning in the research, but you can definitely see this kind of behavior in the course of regular prompting, particularly as the context starts to fill up. (Possibly related to this paper?)
