For the past twenty years I’ve had the opportunity to be on the front lines, witnessing first-hand the unique challenges and opportunities we face when deploying AI in high-stakes, consequential environments.
Recently we’ve seen an incredible acceleration in progress towards more capable AI. While this opens exciting new possibilities for AI to benefit humanity, the potential scale of this impact also forces us to think deeply about how to do so responsibly.
I help lead DeepMind’s AI Red Team, an interdisciplinary group of researchers bringing together new ideas and advances in securing AI-enabled systems.
I received my Ph.D. in 2010, working in UCF's computer vision lab with Professor Mubarak Shah. I then moved to Paris, where I worked as a postdoctoral research fellow at WILLOW, an INRIA/ENS project. During my time at INRIA I worked with Ivan Laptev and Josef Sivic.
Within the AI community, red teaming can cover a broad range of topics, from social harms to the more traditional cybersecurity sense of the term. It most often refers to a process of probing AI systems and products to identify harmful capabilities, outputs, or infrastructural threats.
We build on and employ three types of red teaming techniques to test Gemini for a range of vulnerabilities and social harms.
The advent of more powerful AI systems such as large language models (LLMs) with more general-purpose capabilities has raised expectations that they will have significant societal impacts and create new governance challenges for policymakers. The rapid pace of development adds to the difficulty of managing these challenges. Policymakers will have to grapple with a new generation of AI-related risks, including the potential for AI to be used for malicious purposes, to disrupt or disable critical infrastructure, and to create new and unforeseen threats associated with the emergent capabilities of advanced AI.
Artificial intelligence systems are rapidly being deployed in all sectors of the economy, yet significant research has demonstrated that these systems can be vulnerable to a wide array of attacks. How different are these problems from more common cybersecurity vulnerabilities?
See the recent report produced in collaboration with the Center for Security and Emerging Technology and the Program on Geopolitics, Technology, and Governance at the Stanford Cyber Policy Center. We explore the extent to which AI vulnerabilities can be handled under standard cybersecurity processes, the barriers currently preventing the accurate sharing of information about AI vulnerabilities, legal issues associated with adversarial attacks on AI systems, and potential areas where government support could improve AI vulnerability management and mitigation.
Recently my focus has been on enabling the application of AI in high-stakes, consequential environments. From healthcare to national security, recent advances in artificial intelligence can improve how we live our lives, modernize government operations, and strengthen national security. But these same technologies can also create intended and unintended consequences for democratic processes, and pose risks to mission-critical systems and to citizens. With rare exceptions, however, protecting AI systems was an afterthought until recently.

Already, my team’s work at MITRE Labs has documented AI systems that are susceptible to bias in their data and to attacks, including evasion, data poisoning, model replication, and the exploitation of software flaws, that can deceive, manipulate, compromise, or render AI systems ineffective. More needs to be done to mitigate bias, defend against such attacks, secure the AI supply chain, and ensure the trustworthiness of AI systems so that they perform as intended in mission-critical environments. AI’s potential will only be realized through collaborations that help produce reliable, resilient, fair, interpretable, privacy-preserving, and secure technologies.
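To make the evasion attacks mentioned above concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest evasion techniques, assuming a PyTorch image classifier. The model, input tensor, and parameter names are placeholders for illustration and are not tied to any specific system discussed here.

```python
# Minimal sketch of an evasion attack (FGSM) against an image classifier.
# The classifier and input used here are hypothetical placeholders.
import torch
import torch.nn.functional as F

def fgsm_evasion(model: torch.nn.Module,
                 x: torch.Tensor,
                 true_label: torch.Tensor,
                 epsilon: float = 0.03) -> torch.Tensor:
    """Return a perturbed copy of x intended to be misclassified."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), true_label)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon.
    perturbed = x_adv + epsilon * x_adv.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

# Hypothetical usage: `classifier` is any trained torch model, `image` a
# normalized [1, C, H, W] tensor, `label` its ground-truth class index.
# adv_image = fgsm_evasion(classifier, image, label)
# print(classifier(adv_image).argmax(dim=1))  # often differs from label
```

Measuring how often such perturbed inputs flip a model's prediction is one of the simplest ways a red team can begin to quantify robustness to evasion.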
We are excited to announce MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems). ATLAS is a knowledge base of adversary tactics, techniques, and case studies for machine learning (ML) systems based on real-world observations, demonstrations from ML red teams and security groups, and the state of the possible from academic research. ATLAS is modeled after the MITRE ATT&CK® framework and its tactics and techniques are complementary to those in ATT&CK.
ATLAS enables researchers to navigate the landscape of threats to machine learning systems. ML is increasingly used across a variety of industries, a growing number of ML-specific vulnerabilities have been identified, and its use expands the attack surface of existing systems. We developed ATLAS to raise awareness of these threats and to present them in a way that is familiar to security researchers.
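As a toy illustration of how a tactic-to-technique mapping of this kind can be navigated programmatically, the snippet below indexes a few made-up technique entries by tactic. The identifiers and names are placeholders, not entries from the official ATLAS knowledge base (see atlas.mitre.org for the real data).

```python
# Toy illustration of indexing adversarial-ML techniques by tactic.
# Entries and IDs below are placeholders, not official ATLAS content.
from collections import defaultdict

techniques = [
    {"id": "TXXXX.1", "name": "Evade ML Model", "tactic": "Defense Evasion"},
    {"id": "TXXXX.2", "name": "Poison Training Data", "tactic": "Persistence"},
    {"id": "TXXXX.3", "name": "Extract ML Model", "tactic": "Exfiltration"},
]

# Group technique names under their parent tactic.
by_tactic = defaultdict(list)
for t in techniques:
    by_tactic[t["tactic"]].append(t["name"])

for tactic, names in by_tactic.items():
    print(f"{tactic}: {', '.join(names)}")
```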
Along with Bryce Goodman, Ian Goodfellow, and Tim Hwang, we are excited to announce a NIPS workshop on machine deception. The workshop seeks to bring the many technical researchers, policy experts, and social scientists working on different aspects of machine deception into conversation with one another. Our aim is to promote greater awareness of the state of the research and to spark interdisciplinary collaborations as the field advances.