Novel approach needs only 'marginal amounts' of real-world data to train robots to do new tasks.

Ben Wodecki, Jr. Editor

February 17, 2023

2 Min Read

At a Glance

  • GenAug generates variations of images from a simple scene to train robots in new tasks.
  • GenAug can be used to generalize actions at scale using little input data.
  • GenAug struggles with visual consistency for frame augmentation in a video.

Meta AI researchers have unveiled a new method that uses text-to-image models to teach robots how to do new tasks, opening the door for robotic systems to have broader and more general capabilities.

Dubbed GenAug, or Generative Augmentation for Real-World Data Collection, the method creates images of new scenarios by simply typing in text. Robots will train on these new scenarios to learn capabilities that can be generalized across tasks.

Without using generative AI, training robots for new tasks would require “large diverse datasets that are expensive to collect in real-world robotics settings,” the researchers wrote in a blog.

The new method can create entirely different and yet realistic images of environments to improve the robot’s ability to manipulate objects − with much less data. “We apply GenAug to tabletop manipulation tasks, showing the ability to re-target behavior to novel scenarios, while only requiring marginal amounts of real world data,” the researchers said.

According to Meta’s researchers, the GenAug method showed a 40% improvement in generalization to novel scenes and objects than traditional training methods for robotic object manipulation.

View post on X

How it works

Robots need to be taught how to behave but also how to behave optimally. To achieve optimal behavior in settings like warehouses and factories, robotics engineers will apply techniques such as motion planning or trajectory optimization.

According to the team behind GenAug, however, such techniques “fail to generalize to novel scenarios without significant environment modeling and replanning.”

Instead, the researchers contend that using techniques such as imitation learning and reinforcement learning could generalize behaviors to teach robotic systems without having to model specific environments.

GenAug can use language prompts to change an object’s texture or shape or add new backgrounds to create examples to teach robots. Meta’s research paper shows that GenAug can be used to create 10 real-world complex demonstrations from a single, simple environment.

“We show that despite these generative models being trained on largely non-robotics data, they can serve as effective ways to impart priors into the process of robot learning in a way that enables widespread generalization,” Meta’s research reads.


GenAug does have some limitations. For example, it does not augment action labels, meaning the system assumes the same action still works on the augmented scenes.

GenAug also cannot guarantee visual consistency for frame augmentation in a video. It takes about 30 seconds to complete all the augmentations for one scene, which might not be practical for some robot-learning approaches.

For future research into using generative AI to train robots, the Meta team said GenAug could see whether a combination of language models and vision-language models yields improve scene generations.

The code for GenAug is not yet available but is “coming soon,” according to Meta’s team.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.

Keep up with the ever-evolving AI landscape
Unlock exclusive AI content by subscribing to our newsletter!!

You May Also Like