Facebook’s director of AI research, Yann LeCun, sees a future where machines could learn common sense from video and from observing their surroundings.

Ever since researchers found a way to get machines to interpret images, we’ve seen a huge leap in the accuracy of AI. The technology behind this, artificial neural networks, is why we now have facial recognition and even Facebook’s ability to spot your friends in your photos so that you can tag them more easily.

However, LeCun, who is also a professor at New York University, thinks the technology still has a long way to go. He sees a future where machines learn common sense through video rather than through language and still images. In an interview with MIT Technology Review, LeCun expanded on this idea.

“There have been, on the face of it, impressive demonstrations, [but] those are not as impressive as they look. Their domain of expertise is very limited to whatever universe we train them on. Most of the systems, you show them images with other types of objects or unusual situations they’ve never seen and they will say complete garbage about it. They don’t have common sense,” he said.

He continued, “As long as you have enough data, on the order of 1,000 objects per category, we can recognize very specific objects like cars of a particular brand or plants of a particular species or dogs of a particular breed. We can also recognize more abstract categories, like whether images are landscapes, sunsets, weddings, or birthday parties. Just five years ago it wasn’t clear this problem was completely solvable. But that doesn’t mean vision is solved.”

MIT Technology Review’s Tom Simonite then asked LeCun to elaborate on the connection between vision and common sense. “If you tell a machine ‘This is a smartphone,’ ‘This is a steamroller,’ ‘There are certain things you can move by pushing and others you cannot,’ perhaps the machine will learn basic knowledge about how the world works. Kind of like how babies learn,” he explained.

“One of the things we really want to do is get machines to acquire the very large number of facts that represent the constraints of the real world just by observing it through video or other channels. That’s what would allow them to acquire common sense, in the end.”

LeCun continued, “These are things that animals and babies learn in the first few months of life—you learn a ridiculously large amount about the world just by observation. There are a lot of ways that machines are currently fooled easily because they have very narrow knowledge of the world.”

LeCun then explained that Facebook is very interested in getting its AI to predict the future, and that having software observe the world through video is a big part of this. “We are very interested in the idea that a learning system should be able to predict the future. You show it a few frames of video and it tries to predict what’s going to happen next. If we can train a system to do this we think we’ll have developed techniques at the root of an unsupervised learning system,” he said.

“That is where, in my opinion, a lot of interesting things are likely to happen. The applications for this are not necessarily in vision—it’s a big part of our effort in making progress in AI,” he finished.