AI Business is part of the Informa Tech Division of Informa PLC
by Jos Martin, MathWorks
You might be familiar with reinforcement learning as the AI responsible for beating human players in board games like Go and chess. But for businesses, reinforcement learning has the potential to do so much more.
Many engineers, scientists, and researchers want to take advantage of this new and growing technology, but simply don’t know where to start.
At its foundation, reinforcement learning is a type of machine learning that has the potential to solve tough decision-making problems. But to truly understand how it impacts us, we need to answer three key questions:
In reinforcement learning, a computer learns to perform a task through repeated trial-and-error interactions with a dynamic environment. In this learning approach, the computer makes a succession of decisions that maximize a reward metric for the task without human intervention and without being explicitly programmed to achieve it.
Consider the task of parking a vehicle using an automated driving system. The goal is for the vehicle computer (agent) to park the vehicle in the correct parking spot with the right orientation. The environment is everything outside the agent, including the dynamics of the vehicle, other nearby vehicles, weather conditions, and so on. During training, the agent uses readings from sensors such as cameras, GPS, and lidar (observations) to generate steering, braking, and acceleration commands (actions). To learn how to generate the correct actions from the observations (policy tuning), the agent repeatedly tries to park the vehicle using a trial-and-error process. A reward signal can be provided to evaluate the effectiveness of a trial and to guide the learning process.
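To make the vocabulary above concrete, here is a minimal sketch of the agent, environment, observation, action, reward, and policy loop. It is a deliberately tiny stand-in for the parking example, not a real automated driving system: the "vehicle" moves along a one-dimensional lane of cells, the observation is simply its position, and the learning method (tabular Q-learning) is one common choice that the article itself does not prescribe. All names and numbers (N_POSITIONS, TARGET, the reward values, the hyperparameters) are assumptions made up for illustration.

```python
import random

# Hypothetical toy problem: a lane of 10 cells with the parking spot at cell 7.
N_POSITIONS = 10
TARGET = 7
ACTIONS = (-1, +1)  # move one cell back / one cell forward


def step(position, action):
    """Environment: apply an action, return (next_position, reward, done)."""
    next_position = min(max(position + action, 0), N_POSITIONS - 1)
    if next_position == TARGET:
        return next_position, 1.0, True   # parked: positive reward, episode ends
    return next_position, -0.01, False    # small penalty for every extra move


def best_action_index(q_row):
    """Index of the highest-valued action for one position."""
    return max(range(len(ACTIONS)), key=lambda i: q_row[i])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_POSITIONS)]
    starts = [p for p in range(N_POSITIONS) if p != TARGET]
    for _ in range(episodes):
        position = rng.choice(starts)
        for _ in range(100):  # cap the length of each trial
            # Explore occasionally, otherwise act on current knowledge.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = best_action_index(q[position])
            next_position, reward, done = step(position, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_position])
            q[position][a] += alpha * (reward + future - q[position][a])
            position = next_position
            if done:
                break
    return q


def rollout(q, start, max_steps=20):
    """Follow the learned (greedy) policy and return the positions visited."""
    path = [start]
    position = start
    for _ in range(max_steps):
        position, _, done = step(position, ACTIONS[best_action_index(q[position])])
        path.append(position)
        if done:
            break
    return path


if __name__ == "__main__":
    q_table = train()
    print(rollout(q_table, start=0))
```

Nothing in the code tells the agent where the parking spot is or which direction to drive; it only receives a reward signal, and the policy emerges from repeated trials. A realistic parking task would replace the lookup table with a neural network and the single position with camera, GPS, and lidar readings, which is where the training cost grows quickly.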
Despite its benefits, reinforcement learning may not always be the right approach.
For example, reinforcement learning is not sample efficient, meaning that a lot of training is required to achieve acceptable performance. AlphaGo, for instance, was trained around the clock over a few days by playing millions of games, amassing thousands of years of human knowledge. Even for relatively simple applications, training can take anywhere from minutes to hours or days. In addition, setting up the problem correctly can be a headache; numerous design decisions need to be made, which may require a few rounds of changes to get right. These decisions include choosing a suitable architecture for the neural networks, tuning hyperparameters, and shaping the reward signal.
What’s more, a trained deep neural network policy is often treated as a “black box,” meaning that the internal structure of the neural network is so complicated, often consisting of millions of parameters, that understanding it is nearly impossible. Explaining and evaluating the decisions made by the network is another challenge. This makes it very difficult to create formal performance guarantees for neural network policies.
Therefore, if an engineer is working on a time- or safety-critical project, this form of machine learning may not be the best thing to try first.
What does the reinforcement learning workflow look like?
The recommended steps in the reinforcement learning workflow are as follows:
Training an agent using reinforcement learning is an iterative process. Decisions and results in later stages can require you to return to an earlier stage in the learning workflow.
Irrespective of the final choice of tool, before you decide to implement reinforcement learning, it is crucial to ask whether, given the time and resources you have for this project, reinforcement learning is the best approach.
Jos Martin is senior engineering manager at MathWorks, makers of MATLAB and Simulink software.