ML algorithms let us model and predict Big Data behaviors based on historical data: they look back. But what if the historical database is not enough to model our problem? This is where so-called reinforcement learning comes into play: machines explore their environment while learning from scratch based on rewards and penalties, keeping a future-oriented vision of a goal.
Supervised and unsupervised learning techniques are being applied to understand users, predict subscriber likes and behaviors, and predict system failures, among other functions. To carry out such tasks, these algorithms usually require a large amount of historical data recording the different characteristics and possible configurations within a given context. Take, as an example, recommending products to visitors of a particular web page. Suppose that for each visitor to the page we want to highlight a certain selection of products that we hope will be to their liking. The idea is to attract visitors by showing them the ‘appealing’ products that are, or could be, of interest to them. One way to tackle this situation is to classify the types of customers that visit and make purchases through this particular site. Once the different categories of clients have been established, we can identify which category each visitor falls under and, consequently, choose products of interest that match that classification, as in the sketch below.
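As a rough illustration of this classification approach, here is a minimal Python sketch. The customer feature matrix and the popularity table are hypothetical placeholders, and k-means clustering is just one possible segmentation method (the article does not prescribe a specific algorithm):

```python
# A minimal sketch of the classification approach described above,
# assuming a hypothetical dataset of customer features.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer feature matrix: rows are visitors,
# columns are behavioral features (age, past spend, pages viewed, ...).
customer_features = np.random.rand(500, 4)

# Group customers into categories based on historical behavior.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
segments = kmeans.fit_predict(customer_features)

# For each segment, recommend the products most popular within it
# (the popularity table is assumed to exist).
def recommend_for(visitor_features, popular_products_by_segment):
    segment = kmeans.predict(visitor_features.reshape(1, -1))[0]
    return popular_products_by_segment[segment]
```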
Although such procedures work, they require a considerable and consistent historical database in which the preferences of all customers who have made purchases through the page have been recorded. The issue companies experience is that such an amount of information is not always available, or it is not enough for algorithms to correctly model customer tastes. If we do not have an informative database (and such databases are often biased), algorithms that depend on historical data can fail considerably.
The question, then, is: wouldn't it be better to use an algorithm that “learns to learn”? A type of algorithm that learns to know your customers, learns how the system works, and learns how to reach the goal on its own. Basically, an algorithm that can learn from scratch and from experience.
Such a type of machine learning algorithm exists, and it is known as reinforcement learning. Reinforcement learning is an area of machine learning inspired by behavioral psychology, in which the machine learns by itself which behavior to follow based on rewards and penalties, building on its own accumulated experience rather than on a labeled historical dataset. Similar to how dogs learn to do tricks based on treats, or a child becomes adept at a particular video game, reinforcement learning works on a trial-and-error basis, receiving rewards or penalties for each step taken toward a certain goal.
In every reinforcement learning problem, there is an agent, an environment defined by states, actions that the agent takes, and rewards or penalties that the agent receives on the way to its objective. The interaction forms a loop: at a certain point in time, the agent observes the current state of the environment and takes an action; the environment then returns a reward (or penalty) and a new state, and the cycle repeats until the goal is reached.
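To make the loop concrete, here is a minimal sketch in Python. The `env` and `agent` objects are hypothetical stand-ins, since any concrete problem supplies its own states, actions, and rewards:

```python
# A minimal sketch of the agent-environment loop described above.
# `env` and `agent` are hypothetical stand-ins with an assumed interface.

def run_episode(env, agent, max_steps=100):
    state = env.reset()                      # initial state s_0
    for t in range(max_steps):
        action = agent.choose_action(state)  # agent picks a_t in state s_t
        next_state, reward, done = env.step(action)  # environment returns r_{t+1}, s_{t+1}
        agent.learn(state, action, reward, next_state)  # update value estimates
        state = next_state
        if done:                             # goal reached or episode over
            break
```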
How does the agent choose each action? In reinforcement learning, two things come into play: exploration and exploitation. Exploration refers to choosing actions at random. Exploitation, on the other hand, refers to making decisions based on how valuable it is to perform an action from a given state. Depending on how we want the learning to develop, the levels of exploration and exploitation can be adjusted. For example, we can establish that the agent chooses actions at random 30% of the time, so that it can explore the environment by itself, and that for the remaining 70% of the time it chooses the most valuable action for each state it is in. But why not always exploit? Remember that the agent begins to learn from scratch, so in the beginning all the actions in the initial state have a null value. Furthermore, the number of actions available may vary from state to state, so the overall environment is not known in advance. It is only through experience that actions begin to acquire value. Consequently, exploration is vital.
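A common way to implement this trade-off is epsilon-greedy action selection. The sketch below assumes a hypothetical `q_values` table of learned action values and uses the 30/70 split from the example above:

```python
import random

# Epsilon-greedy selection: with probability epsilon the agent explores
# (random action); otherwise it exploits the action with the highest
# estimated value. q_values is an assumed dict: state -> {action: value},
# learned from experience.
def choose_action(state, q_values, available_actions, epsilon=0.3):
    if random.random() < epsilon or not q_values.get(state):
        return random.choice(available_actions)            # explore
    return max(q_values[state], key=q_values[state].get)   # exploit
```

Note that when a state has no recorded values yet (the "learning from scratch" case), the sketch falls back to exploration, which is exactly why exploration is vital early on.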
How can we apply this to the recommendation example mentioned above? The agent becomes the system that learns which product to recommend to each visitor. The actions are the different products the page offers. The clients and the characteristics of each visit define the environment and its states. If the user clicks on the recommended product, the agent receives a reward of 1; if the user does not click, the agent receives a reward of 0. In this way, the agent learns which product to recommend from a given state.
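As a toy illustration, this recommender can be framed as a multi-armed bandit in which each product is an action and a click is a reward of 1. The product names and counters below are hypothetical placeholders:

```python
import random

# Toy bandit recommender: each product is an action, a click is reward 1,
# no click is reward 0. Products and counters are hypothetical.
products = ["A", "B", "C"]
clicks = {p: 0 for p in products}   # total reward per product
shows = {p: 0 for p in products}    # times each product was recommended

def recommend(epsilon=0.3):
    if random.random() < epsilon:
        return random.choice(products)  # explore
    # Exploit: pick the product with the best observed click rate so far.
    return max(products, key=lambda p: clicks[p] / shows[p] if shows[p] else 0.0)

def update(product, clicked):
    shows[product] += 1
    clicks[product] += 1 if clicked else 0
```

This sketch ignores per-visitor state for brevity; a full treatment would condition the choice on the visit characteristics, as described above.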
As we have seen, reinforcement learning has the potential to personalize solutions, offering recommendations tailored to each client, without the need for prior knowledge of the users. Could this be the true future of marketing?
This article has been written by Dr. Raul V. Rodriguez. He can be reached on LinkedIn.