And of course a human can make a wrong call too. In this scenario that’s what is happening. And of course we should bring all of our tools to bear when it comes to evaluating nuclear threats.
But that doesn’t make it less concerning that we’ve now got machines capable of linguistic persuasion in that toolset.
Shall We Play a Game? Language Models for Open-ended Wargames
Wargames are simulations of conflicts in which participants' decisions influence future events. While casual wargaming can be used for entertainment or socialization, serious wargaming is used by experts to explore strategic implications of decision-making and experiential learning. In this paper, we take the position that Artificial Intelligence (AI) systems, such as Language Models (LMs), are rapidly approaching human-expert capability for strategic planning -- and will one day surpass it. Military organizations have begun using LMs to provide insights into the consequences of real-world decisions during _open-ended wargames_ which use natural language to convey actions and outcomes. We argue the ability for AI systems to influence large-scale decisions motivates additional research into the safety, interpretability, and explainability of AI in open-ended wargames. To demonstrate, we conduct a scoping literature review with a curated selection of 100 unclassified studies on AI in wargames, and construct a novel ontology of open-endedness using the creativity afforded to players, adjudicators, and the novelty provided to observers. Drawing from this body of work, we distill a set of practical recommendations and critical safety considerations for deploying AI in open-ended wargames across common domains. We conclude by presenting the community with a set of high-impact open research challenges for future work.
This has been everyone's LLM problem daily. How is that not clear yet?
I have high hopes for our future.
In the scenario described there literally is a human in the loop: the president is a human?
Anthropic's AI tool Claude central to U.S. campaign in Iran...
How about a nice game of chess?
Not to discount the importance of this risk, but we’re not likely to sleepwalk into it, barring a collapse in strategic & operational competence in planning (yeah, yeah) that would make MANY risks dangerously severe.
"At the Abyss: An Insider's History of the Cold War" recounted that the United States added a Trojan horse to gas pipeline control software that the Soviet Union obtained from a company in Canada. According to the author, when the components were deployed on a Trans-Siberian gas pipeline, the Trojan horse led to a huge explosion. He wrote: "The pipeline software that was to run the pumps, turbines and valves was programmed to go haywire, to reset pump speeds and valve settings to produce pressures far beyond those acceptable to the pipeline joints and welds. The result was the most monumental non-nuclear explosion and fire ever seen from space."