

Last Updated: 2/6/2023

Reinforcement Learning

What is Reinforcement Learning?

Reinforcement Learning, or RL for short, is a type of machine learning that allows agents to learn from their actions and optimize their decision-making in complex, dynamic environments. It is inspired by operant conditioning in psychology, where rewards or penalties are used to modify an individual's behavior.

In reinforcement learning, an agent interacts with an environment by observing its state, taking actions, and receiving rewards. The goal of the agent is to maximize its total (cumulative) reward over time. The agent learns by updating its policy, which maps states to actions, based on the rewards it receives, so that it increasingly selects actions that lead to higher rewards in the future.
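The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `LineWorld` environment (positions 0 to 4, with a reward of +1 for reaching the last position) and two hand-written policies; the names `LineWorld`, `run_episode`, `always_right`, and `always_left` are invented for this example, and the policy-update step is left out so the observe/act/reward loop stays visible.

```python
# Hypothetical toy environment: the agent moves along positions 0..size-1
# and receives a reward of +1 only when it reaches the last position.
class LineWorld:
    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); the ends act as walls
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total (undiscounted) reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the policy maps a state to an action
        state, reward, done = env.step(action)  # the environment returns new state and reward
        total_reward += reward
        if done:
            break
    return total_reward


# Two fixed (non-learning) policies for comparison
def always_right(state):
    return +1


def always_left(state):
    return -1


print(run_episode(LineWorld(), always_right))
print(run_episode(LineWorld(), always_left))
```

Here `always_right` reaches the goal in four steps and collects a total reward of 1.0, while `always_left` stays pinned against the left wall and collects nothing in 100 steps. A learning agent would adjust its policy between episodes to favor the behavior that earned more reward.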

Reinforcement learning is like playing a video game in which you make choices and receive rewards or penalties based on those choices: you gain points for good choices and lose points for bad ones. The more you play, the better you get at making the choices that earn the most points.


For example, say you're playing a candy-collecting game. Every time you collect a piece of candy, you gain one point, but if you collect a piece of gum, you lose two points. As you keep playing, you learn which choices earn the most points. This is similar to what happens in reinforcement learning: the computer program is the player, and it makes decisions to collect the most reward, just as you try to get the most points in the game.
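The candy game can be turned into a tiny learning problem. The sketch below assumes that on each turn the player picks either candy (+1) or gum (-2) and learns by keeping a running average of the reward for each choice, using an epsilon-greedy rule: usually pick the choice that currently looks best, and occasionally try a random one. The rewards come from the game above; the function `play`, its parameters, and the epsilon-greedy learner are hypothetical choices made for illustration.

```python
import random

# Rewards from the candy game: +1 for candy, -2 for gum.
REWARDS = {"candy": 1.0, "gum": -2.0}


def play(turns=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learner that estimates the average reward of each choice."""
    rng = random.Random(seed)
    estimates = {choice: 0.0 for choice in REWARDS}  # learned value of each choice
    counts = {choice: 0 for choice in REWARDS}       # how often each choice was tried
    score = 0.0
    for _ in range(turns):
        if rng.random() < epsilon:
            choice = rng.choice(list(REWARDS))          # explore: random choice
        else:
            choice = max(estimates, key=estimates.get)  # exploit: best estimate so far
        reward = REWARDS[choice]
        score += reward
        counts[choice] += 1
        # Incremental average: new_estimate = old + (reward - old) / n
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates, score


estimates, score = play()
print(estimates, score)
```

The estimate for candy settles at 1.0, and whenever gum is tried its estimate drops to -2.0, so the greedy choice quickly becomes candy. The small amount of random exploration is what lets the player discover that gum is a bad choice instead of assuming it.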

How Does Reinforcement Learning Work?

Reinforcement learning can be compared to learning to drive a car. When you first start driving, you are unsure when to accelerate, when to brake, and when to turn. Over time, you learn from your experiences and from the feedback you receive from the road and your instructor. If you drive safely and follow the rules, you get praise and positive feedback, just like a reward in reinforcement learning. If you make mistakes, such as running a red light or speeding, you face consequences, such as a ticket or a reprimand, just like a penalty in reinforcement learning.

As you keep driving, you gradually build up a set of rules or habits, called a policy, that dictates how you drive in different situations. For example, you learn to slow down when approaching a red light and to accelerate again when the light turns green. This policy is constantly updated as you gain new experiences and receive feedback from the road. Just like a reinforcement learning agent, you are trying to maximize your reward (in this case, a safe and smooth drive) by choosing actions that lead to the best outcomes. Over time, you get better at driving, just as a reinforcement learning agent gets better at making decisions in its environment.

Suppose you want to teach an agent how to park a vehicle. In this case, the environment is the parking lot, the agent is a single vehicle, and the goal is to park without hitting anything. Every time the car moves without hitting anything, we can give it a small reward, but whenever it hits something, we penalize it. At each step, the agent observes the current state of the environment, which could include its position and the distances between the vehicle and the obstacles around it. It then chooses one of the available actions, such as accelerating, stopping, reversing, or turning the wheel. We let the vehicle attempt to park hundreds of times so that it can fail and learn from its failures until it makes the right choices. In each attempt, the agent repeatedly observes the environment and chooses an action until it either succeeds or fails. We can repeat this with hundreds of agents over hundreds or thousands of iterations; some will learn how to park, and some will not. After all this training, we have an agent that has learned to park on its own, without us explicitly telling or "programming" it how to do it.
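The parking example can be reduced to a toy problem that tabular Q-learning, a value-based method, can solve. This sketch makes several simplifying assumptions: the lot is a one-dimensional row of positions 0 to 5 with obstacles just beyond both ends, the parking spot is position 4, and the only actions are forward, reverse, and stop. Rewards follow the description above (a small reward of 0.1 for each move that does not hit anything and a penalty of -10 for a collision), plus a reward of 10 for stopping in the spot, which is an added assumption. All names and numbers here are hypothetical.

```python
import random

# Hypothetical 1-D parking lot: positions 0..5, obstacles just behind
# position 0 and just ahead of position 5, parking spot at position 4.
N_POSITIONS, SPOT = 6, 4
FORWARD, REVERSE, STOP = 0, 1, 2
ACTIONS = (FORWARD, REVERSE, STOP)


def step(pos, action):
    """Return (next_pos, reward, done) for one action of the car."""
    if action == STOP:
        if pos == SPOT:
            return pos, 10.0, True   # parked: large reward, episode ends
        return pos, 0.0, False       # stopped in the wrong place: no reward
    nxt = pos + (1 if action == FORWARD else -1)
    if nxt < 0 or nxt >= N_POSITIONS:
        return pos, -10.0, True      # hit an obstacle: penalty, episode ends
    return nxt, 0.1, False           # moved without hitting anything


def greedy(q, pos):
    """Pick the action with the highest estimated value in this position."""
    return max(ACTIONS, key=lambda a: q[pos][a])


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_POSITIONS)]  # Q-table: q[pos][action]
    for _ in range(episodes):
        pos = 0  # every attempt starts at the lot entrance
        for _ in range(max_steps):
            # Epsilon-greedy: mostly exploit the Q-table, sometimes explore
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, pos)
            nxt, reward, done = step(pos, action)
            # Q-learning target: reward plus discounted best value of the next state
            target = reward if done else reward + gamma * max(q[nxt])
            q[pos][action] += alpha * (target - q[pos][action])
            pos = nxt
            if done:
                break
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the learned policy without exploration; return the actions taken."""
    pos, actions = 0, []
    for _ in range(max_steps):
        action = greedy(q, pos)
        actions.append(action)
        pos, _, done = step(pos, action)
        if done:
            break
    return actions


print(greedy_rollout(train()))
```

After training, the greedy policy should drive forward four times and then stop in the spot. The per-move reward is kept small relative to the parking reward on purpose; if moving were rewarded too generously, the agent could learn to drive back and forth indefinitely instead of parking.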


AI Learns to Park - Deep Reinforcement Learning

Reinforcement Learning in Real Life

Reinforcement learning has several real-life applications, including:

  1. Robotics: Reinforcement learning can train robots to perform tasks such as grasping objects, walking, and flying. The robot receives rewards for performing the task successfully and penalties for failing, allowing it to improve its performance continuously.
  2. Game playing: Reinforcement learning algorithms have been used to train agents to play games such as chess, Go, and poker. The agent receives rewards for good moves or outcomes, such as winning, and penalties for bad ones, allowing it to improve its strategy continuously.
  3. Finance: Reinforcement learning algorithms have been applied to trading and portfolio management, allowing investment decisions to be made based on market trends and historical data.
  4. Healthcare: Reinforcement learning can be used to optimize treatment plans for patients with chronic conditions, such as diabetes. The algorithm can learn from patient data and adjust the treatment plan to improve patient outcomes.
  5. Advertising: Reinforcement learning can optimize the delivery of online ads. The algorithm can learn from user behavior and adjust the frequency and timing of ad delivery to maximize the probability of user engagement.

These are just a few examples of how reinforcement learning is being applied in real life. As technology continues to advance, the possibilities for reinforcement learning are only going to increase.

Types of Reinforcement Learning

Depending on how an agent makes its decisions, reinforcement learning methods can be classified into the following types:

  1. Value-Based: In value-based reinforcement learning, the agent learns to predict the expected cumulative reward (the value) of each action in a given state. The agent then chooses the action with the highest expected value.
  2. Policy-Based: In policy-based reinforcement learning, the agent learns a policy that maps states directly to actions, without relying on a value function to choose them. The agent improves its policy through trial and error, adjusting it so that actions that led to higher rewards become more likely.
  3. Model-Based: In model-based reinforcement learning, the agent builds a model of the environment that predicts the next state and reward for a given action. The agent uses this model to plan its actions and improve its policy.
  4. Actor-Critic: Actor-critic is a combination of value-based and policy-based reinforcement learning. The actor learns the policy that maps states to actions, while the critic learns the value function that evaluates the expected reward for a given state and action.

These are the main types of reinforcement learning, each with its own strengths and weaknesses. The choice of which type to use depends on the specific problem and the requirements of the task at hand.
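To make the policy-based idea concrete, here is a minimal sketch of the REINFORCE policy-gradient rule applied to the candy game described earlier (candy +1, gum -2). The policy is a softmax over one learnable preference per action, and no value function is learned: the agent samples an action, observes the reward, and nudges its preferences so that rewarded actions become more likely. This is a simplified illustration, assuming a single-step game and no baseline (practical implementations usually subtract one to reduce variance); `train_policy` and its parameters are hypothetical.

```python
import math
import random

# Rewards from the candy game; the policy chooses between these two actions.
REWARDS = {"candy": 1.0, "gum": -2.0}
ACTIONS = list(REWARDS)


def softmax(prefs):
    """Turn preferences into action probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(ACTIONS)  # policy parameters: one preference per action
    for _ in range(steps):
        probs = softmax(prefs)
        i = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # sample an action
        reward = REWARDS[ACTIONS[i]]
        # REINFORCE: d log pi(a_i) / d pref_j = (1 if j == i else 0) - pi_j
        for j in range(len(prefs)):
            grad_log = (1.0 if j == i else 0.0) - probs[j]
            prefs[j] += lr * reward * grad_log
    return softmax(prefs)


print(dict(zip(ACTIONS, train_policy())))
```

The probability of choosing candy should approach 1: picking candy reinforces candy directly, and picking gum (negative reward) pushes probability away from gum and therefore toward candy. A value-based agent would instead learn a value estimate for each action and pick the largest.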


Extra Resources

Books

Chapter 1 (Introduction to deep reinforcement learning): This book covers deep reinforcement learning, which is concerned with creating computer programs that can achieve goals that require intelligence. This chapter explains how deep reinforcement learning differs from other machine learning approaches, describes recent progress in the field, and shows what it can do for a variety of problems.

Chapter 18 (Reinforcement Learning): It focuses on reinforcement learning and how to teach a machine to play.

Videos