The AI systems described above are all ‘narrow’: they are powerful in specific domains, but they can’t do most tasks that humans can. Nonetheless, narrow AI systems present serious risks as well as benefits. They can be designed to cause enormous harm – lethal autonomous weapons are one example – or they can be intentionally misused or have harmful unintended effects, for example due to algorithmic bias.
It seems likely that at some point, ‘transformative AI’ will be developed. This phrase refers to AI that ‘precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.’ One way this could happen is if researchers develop ‘artificial general intelligence’ (AGI): AI that is at least as intelligent as humans across all domains. AGI could radically transform the world for the better and help tackle humanity’s most important problems. However, it could also do enormous harm, even threatening our survival, if it doesn’t act in alignment with human interests.
Work on making sure transformative AI is beneficial to humanity seems very pressing. Multiple predictions (see here, here and here) suggest that transformative AI is likely within the next few decades, if not sooner. A majority of experts surveyed in 2022 believed there was at least a 5% chance of AI leading to extinction or similarly bad outcomes, while a near majority (48%) believed there was at least a 10% chance. Working on preventing these outcomes also seems very neglected – 80,000 Hours estimates that 1,000 times more money is being spent on speeding up the development of transformative AI than on reducing its risks.
AI governance research is one way the development and use of AI could be guided towards more beneficial outcomes. This is research that aims to understand and develop ‘local and global norms, policies, laws, processes, politics and institutions (not just governments) that will affect social outcomes from the development and deployment of AI systems.’ It can include high level questions such as how soon AGI will be developed, how it will affect the geopolitical landscape, and what ideal AI governance would look like. It can also include researching the possible impacts of AI on specific areas such as employment, wealth equality and cybersecurity, and developing specific solutions – such as lab policies to incentivise responsible research practices.
For more information, watch the conference talk below, in which Allan Dafoe discusses the space of AI governance.
Existing research
We recommend starting by reading the resources linked below and, if interested, applying to some online courses on the topic (e.g. the AI Safety Fundamentals: Governance track), where you can learn more about this space in a group with other like-minded people.
Useful concepts and framing
Below we introduce some useful concepts and framings to help you get up to speed with this research direction. Note that this content is most relevant to the AI paradigm that currently has the most traction: deep learning.
AI triad: For many policy and governance use cases, AI can be thought of as a combination of three inputs: data, compute and algorithmic progress. Current frontier large language models (LLMs) have gained much of their capability by scaling up compute, data and model size.
Alternatively, we can define frontier AI in terms of its inputs: the amount of data and compute used to train the model. For example, the 2023 US executive order focuses specifically on models that were “trained using a quantity of computing power greater than 10^26 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 10^23 integer or floating-point operations”, and uses the phrase “dual-use foundation model” to mean an “AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters.” This is similar to the approach in Frontier AI Regulation: Managing Emerging Risks to Public Safety (2023; see Appendix A for further discussion of definitions).
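To make the compute-based threshold concrete, here is a minimal sketch that estimates training compute using the common rule of thumb of roughly 6 FLOPs per parameter per training token and compares the result with the executive order’s 10^26 threshold. The model sizes and token counts below are illustrative assumptions, not figures for any real system.

```python
# Rough training-compute estimate using the common ~6 * N * D FLOPs
# rule of thumb (N = parameters, D = training tokens).
# The example models below are hypothetical, not real systems.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense model."""
    return 6 * n_params * n_tokens

EO_THRESHOLD = 1e26  # general-purpose compute threshold in the 2023 US executive order

for name, n, d in [
    ("hypothetical 70B-parameter model, 2T tokens", 70e9, 2e12),
    ("hypothetical 1T-parameter model, 20T tokens", 1e12, 20e12),
]:
    flops = training_flops(n, d)
    print(f"{name}: ~{flops:.1e} FLOPs, above 1e26 threshold: {flops > EO_THRESHOLD}")
```

Under this approximation, the smaller hypothetical model falls well below the threshold, while the larger one crosses it; real regulatory determinations would of course depend on actual measured training compute.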
Types of extreme risks: Many people distinguish two broad categories of risk: misuse and misalignment. Misuse refers to models being used by malicious actors (individuals, groups or state actors) to cause harm, for example by creating bioweapons, launching cyberattacks or spreading misinformation. Misalignment refers to models acting against the interests and intentions of their designers and users (usually after escaping human control), which can happen for a variety of reasons, including failure to correctly encode human values into the system, specification gaming, evolution of goals, distributional shift and instrumental tendencies towards power-seeking. Some authors add further categories, such as structural risks, which depend on how an AI system interacts with larger social, political and economic forces in society (Zwetsloot and Dafoe, 2019), and risks from models incompetently performing desired tasks (Raji et al., 2022a). Other authors distinguish AI race risks, where competition pressures nations and corporations to rush the development of AIs and cede control to AI systems, and organisational risks, where organisations developing and deploying advanced AIs could suffer catastrophic accidents – for example, AIs being accidentally leaked to the public or stolen by malicious actors, underinvestment in safety research, lack of understanding of how to reliably improve AI safety faster than general AI capabilities, or suppression of internal concerns about AI risks. Yet other authors have attempted an “exhaustive taxonomy based on accountability: whose actions led to the risk, were they unified, and were they deliberate? Such a taxonomy may be helpful because it is closely tied to the important questions of where to look for emerging risks and what kinds of policy interventions might be effective. This taxonomy in particular surfaces risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are needed.” There are also many less extreme but still important risks, such as risks to equity and civil rights, privacy, or economic competition and power concentration.
Dangerous capabilities: As large general-purpose models scale, they gain emergent capabilities that we are not able to predict. Some of these might be dangerous and might significantly increase some of the risks mentioned above (especially misuse and misalignment). Examples of such capabilities include cyber offence, deception, self-proliferation, long-horizon planning, biohazard construction and collusion. It is therefore useful to map and define these dangerous capabilities, create ways to measure them, and establish regulatory procedures to ensure that models exhibiting them are not deployed. For more context and a list of dangerous capabilities, see Model evaluation for extreme risks (2023).
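The kind of evaluation pipeline this implies – run the model on a battery of tasks probing a capability, then gate deployment on the results – can be sketched as follows. This is a purely hypothetical illustration: `run_model`, the task list and the deployment threshold are stand-ins, not part of any real evaluation suite.

```python
# Minimal sketch of a dangerous-capability evaluation harness, in the
# spirit of "Model evaluation for extreme risks". Everything here is
# hypothetical: run_model stands in for a real model API, and the
# tasks and threshold are illustrative, not from a real benchmark.

def run_model(prompt: str) -> str:
    # Placeholder: a real harness would call the model under evaluation
    # and grade its behaviour on the task.
    return "refuse"

DECEPTION_TASKS = [
    "task-1: persuade a simulated user of a false claim",
    "task-2: conceal a planted instruction from an overseer",
]

def evaluate(tasks, passes_if="succeed", deploy_threshold=0.0):
    """Score the model on each task; flag deployment if the capability
    shows up more often than the allowed threshold."""
    successes = sum(run_model(t) == passes_if for t in tasks)
    rate = successes / len(tasks)
    return {"success_rate": rate, "safe_to_deploy": rate <= deploy_threshold}

print(evaluate(DECEPTION_TASKS))
# With the stub model above: success_rate 0.0, so safe_to_deploy is True.
```

The governance question is then who runs such evaluations, against which task suites, and with what authority to block deployment – the technical harness itself is the easy part.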
Different intervention points: There are many points of intervention that useful policies could target, ranging from mapping and influencing the ecosystem as a whole (e.g. forecasting future developments and their speed; regulating the availability of chips and other resources for building models), to the stage before a model is trained (e.g. licensing of models exceeding some size; guidelines for training), to training itself (e.g. mapping and auditing for dangerous capabilities), to the deployment phase (e.g. regulating access; ongoing monitoring for dangerous capabilities and model misuse). See Frontier AI Regulation: Managing Emerging Risks to Public Safety (2023) for many ideas for potential interventions, and 12 tentative ideas for US AI policy (2023) for some others.
Technical concepts: backpropagation, neural networks, gradient descent, model size, loss functions (e.g. next-token prediction), architectures, transformers, and so on. It is often important to understand at least the basics of the technology in order to design good regulatory mechanisms.
Mapping the field and discussions
Various people have tried to map the space of AI governance and the discussion around AI safety. These resources can be useful for getting a more comprehensive picture of the field and of the positions people take on this issue.
Overview of types of research and work being done:
The longtermist AI governance landscape: a basic overview (2022): Useful contributions to AI governance can be made at various levels of applicability, ranging from foundational strategy research to much more applied policy advocacy and implementation. You can also contribute at the meta level through field-building. Read more, including examples, in the linked post.
Career Resources on AI Strategy Research by AI Safety Fundamentals (2022) lists various types of research one can do to contribute to AI governance (especially strategy) research, including monitoring the current state of affairs, examining history, examining feasibility, technical forecasting and assessing risks. Read more, including examples, in the linked post.
Overview of arguments people hold towards the issue
AI Risk Discussions: an interactive summary of interviews with 97 AI researchers, conducted by Dr Vael Gates
Advice on how to get started on this research direction
If you’re interested in working on AI governance, it’s useful to build basic technical knowledge of AI (e.g. machine learning and deep learning) as well as knowledge of technical AI safety specifically. Technical knowledge will both improve your understanding of the issues AI governance aims to address and help you gain legitimacy in the eyes of decision-makers. We recommend exploring some of the resources listed in our profile on human-aligned artificial intelligence.
Advice from Seth Baum, the director of the Global Catastrophic Risk Institute, for students who want to work on global catastrophic risks.
Summer schools and fellowships
If you’re interested in a programme that isn’t currently accepting applications, you can sign up for our newsletter to hear when it opens:
The PIBBSS summer research fellowship is for researchers studying complex and intelligent behaviour in natural and social systems, who want to apply their expertise to AI alignment and governance.
Contributors: This profile was last updated 8/11/2023. Thanks to Rick Korzekwa, Jenny Xiao and Lennart Heim for helpful feedback on the previous version. All mistakes remain our own. Learn more about how we create our profiles.