
How reinforcement learning enables computers to learn on their own

The landmark victory of Google's AlphaGo over Lee Sedol in a Go match has only strengthened the belief that reinforcement learning is the way forward. However, given the challenges in its deployment, the adoption of reinforcement learning is still limited.

Reinforcement learning is a subset of machine learning in which a computer is not trained to do exactly as directed; instead, it learns from the results of its own actions in the situations it encounters. The outcomes of those actions, positive or negative, teach the computer how to respond to a given situation.

The current form of reinforcement learning, with rewards and punishments driving a computer’s trial-and-error learning, can be attributed to A. Harry Klopf. Later, Richard S. Sutton and Andrew G. Barto worked on differentiating between supervised and reinforcement learning.

Reinforcement learning is different from supervised and unsupervised learning

Reinforcement learning differs from supervised learning: the latter trains a computer toward a pre-defined outcome, whereas in reinforcement learning there is no pre-defined outcome and the computer must find its own best way to respond to a specific situation. Much as in real life, in reinforcement learning there are multiple possible outputs for a particular problem. The solution that earns the maximum reward is considered the best solution.
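The idea that the best solution is the one earning the maximum reward can be illustrated with a minimal sketch. It assumes a toy setting that is not from the article: a handful of possible actions, each with a hypothetical average reward, and a learner that mostly picks the action it currently rates highest while occasionally trying others (often called epsilon-greedy). The function name `best_action`, the reward values and the noise model are all illustrative assumptions.

```python
import random

def best_action(reward_means, episodes=2000, epsilon=0.1, seed=0):
    """Estimate by trial and error which action earns the highest average reward."""
    rng = random.Random(seed)
    n_actions = len(reward_means)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(episodes):
        # Occasionally explore a random action; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Hypothetical noisy reward: the action's true mean plus Gaussian noise.
        reward = rng.gauss(reward_means[action], 1.0)

        # Incrementally update the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    best = max(range(n_actions), key=lambda a: estimates[a])
    return best, estimates

# Three possible "solutions"; the third has the highest average reward.
best, estimates = best_action([0.2, 1.0, 3.0])
print("action judged best:", best)
```

Nothing tells the learner in advance which action is correct; it only sees the reward that follows each choice, which is the contrast with supervised learning drawn above.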

Reinforcement learning can also be confused with unsupervised learning. However, unlike unsupervised learning, where the aim is to find similarities or differences between data points, reinforcement learning focuses on finding a suitable action model that maximizes the overall reward. Since there is no supervisor to monitor the training, the computer must make its decisions (or choices) sequentially, and the reward arrives as a number or a signal. Depending on this signal (reward or punishment), the machine gets the next set of data. This means that learning and feedback take place over a period of time.
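The sequential loop described above (choose an action, receive a numeric reward, move on to the next situation) can be sketched with tabular Q-learning, one common reinforcement learning method that the article does not name. Everything specific here is an assumption made for illustration: a hypothetical five-cell corridor, a reward of 1.0 for reaching the last cell, a small penalty of -0.01 for every other step, and the function names `train_corridor` and `greedy_policy`.

```python
import random

def train_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Learn action values for a corridor whose goal is the right-most cell."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random sometimes; otherwise take the best-rated action,
            # breaking ties at random.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                top = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == top])

            # The environment answers with the next state and a reward signal.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else -0.01

            # Q-learning update: nudge the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def greedy_policy(q):
    """Best-rated action for every non-goal state."""
    return [max(range(2), key=lambda a: row[a]) for row in q[:-1]]

q_table = train_corridor()
print(greedy_policy(q_table))  # 1 means "step right"
```

The feedback is spread over time: only the final step into the goal is rewarded directly, and the earlier cells learn their value from later ones through the discount factor `gamma`.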

This also reduces the need for the large pre-collected data sets usually required to train machine learning algorithms, because the computer generates its own experience, and thus allows building applications that use general-purpose deep learning algorithms. Reinforcement learning is used in many areas, including gaming, robotics, simulation-based optimization, data processing, operations research and genetic algorithms, as well as in creating custom training systems for students.

Types of reinforcement learning methods

Reinforcement learning is based on two types of learning methods:

Positive Reinforcement: It refers to the positive outcome that follows a certain behavior of the computer. Because this particular behavior yielded a positive outcome, the computer increases the frequency of that behavior, which improves performance and sustains the change for a longer duration.

Negative Reinforcement: It refers to the change in a computer's behavior when it acts to avoid a negative outcome. It helps define the minimum standard of performance.
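A minimal sketch of how the two kinds of signal can change behavior, under assumptions that are not from the article: each action has a numeric preference, a positive reward raises the preference of the action that earned it, a negative reward lowers it, and preferences are turned into selection probabilities with a softmax. The function names and the learning rate are hypothetical.

```python
import math

def action_probabilities(preferences):
    """Softmax: turn raw preferences into selection probabilities."""
    exps = [math.exp(p) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(preferences, action, reward, learning_rate=0.5):
    """Return new preferences after one reward signal for `action`."""
    updated = list(preferences)
    updated[action] += learning_rate * reward
    return updated

start = [0.0, 0.0]  # two actions, initially equally likely

# A positive outcome makes action 0 more frequent.
after_reward = reinforce(start, action=0, reward=+1.0)
print(action_probabilities(after_reward))

# A negative outcome makes action 1 less frequent, so the computer avoids it.
after_penalty = reinforce(start, action=1, reward=-1.0)
print(action_probabilities(after_penalty))
```

In both cases the computer ends up favoring action 0, once because that action was rewarded and once because the alternative was penalized.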

Adoption is still limited

Although reinforcement learning has generated a buzz, its adoption is still limited. This is largely because deploying reinforcement learning is currently difficult and the use cases are limited. That said, a lot of research is underway, and adoption may well increase as use cases become increasingly successful.

The article has been written by Neetu Katyal, Content and Marketing Consultant

She can be reached on LinkedIn.

