Update: OpenAI Creates Anti-Superintelligence Team

OpenAI's new team aims to solve the superintelligence alignment problem and prevent superintelligent AI from going rogue. Yann LeCun says it's not that easy.

Deborah Yao

July 5, 2023

3 Min Read

At a Glance

  • OpenAI created a team to tackle the superintelligence alignment problem. Meta's Yann LeCun said it's not that easy.
  • Superintelligence alignment means ensuring AGI systems are aligned with human values and follow human intent.
  • OpenAI is dedicating 20% of its compute resources to solving this problem, which it hopes to do in four years.

OpenAI CEO Sam Altman has been going around the world – literally – to tout the dangers of AI superintelligence, a phenomenon where machines become much smarter than humans and could potentially go rogue.

Now, OpenAI said it is creating a new team tasked with finding ways to steer and control superintelligent AI, which it believes could arrive this decade.

Controlling this superintelligence calls not only for new governance institutions but also for solving the problem of superintelligence alignment, according to a company blog post. That is, the team needs to figure out how to make superintelligent systems align with human values and follow human intent.

“Unaligned AGI (artificial general intelligence or superintelligence) could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together,” the company said in a 2022 blog post.

OpenAI is calling the new team Superalignment, and it is assembling a group of “top machine learning researchers and engineers” to join it. Leading the team are Ilya Sutskever, OpenAI co-founder and chief scientist, and Jan Leike, head of alignment.

The company pledged to dedicate 20% of the compute it has secured to date to work on this alignment problem. OpenAI expects to solve the “core technical challenges of superintelligence alignment” in four years.

Related: OpenAI, Alphabet CEOs on World Tour to Tout Governance

OpenAI is hiring research engineers, research scientists and research managers to join its Superalignment team.

But Turing Award winner Yann LeCun, who is Meta's chief AI scientist, said it's not that easy.

"One cannot just 'solve the AI alignment problem.' Let alone do it in four years," he tweeted. "One doesn't just 'solve' the safety problem for turbojets, cars, rockets, or human societies, either. Engineering-for-reliability is always a process of continuous & iterative refinement."

Attempts to solve alignment

OpenAI has been working on the alignment problem for at least a year and has not yet found a solution.

“Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue,” the company said. “Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.”

The new team’s goal is to build a human-level automated alignment researcher, meaning it plans to build AI systems that can align superintelligent AI systems – and do so faster and better than humans.

To align the first automated alignment researcher, the team needs to develop a scalable training method that uses AI systems to help evaluate other AI systems; validate the resulting model, for example by automating the search for problematic behavior; and stress test the entire alignment pipeline by deliberately training misaligned models to check that these are detected.

Human researchers will then focus on reviewing alignment research produced by AI systems rather than generating it themselves. OpenAI said its goal is to “train models to be so aligned that we can off-load almost all of the cognitive labor required for alignment research.”

This alignment research will be in addition to existing efforts to address harms of current models, including misuse, job loss, disinformation, bias, and other problems.

Updated on July 7 with comments from Meta's Yann LeCun.

About the Author(s)

Deborah Yao

Editor

Deborah Yao runs the day-to-day operations of AI Business. She is a Stanford grad who has worked at Amazon, Wharton School and Associated Press.
