Impact of reinforcement learning in higher education

When we talk about reinforcement learning, the two conditions that mainly come into play are exploration and exploitation.

Reinforcement learning is an area of machine learning inspired by behavioral psychology, in which the machine learns for itself which behavior to follow based on rewards and penalties. In this type of technique, the system learns from empirical data points. Every reinforcement learning problem has an agent, an environment defined by states, actions that the agent takes, and rewards or penalties that the agent receives on the way to its objective.

In reinforcement learning, two conditions come into play: exploration and exploitation. Exploration means choosing actions at random in order to gather new information. Exploitation, on the other hand, means making decisions based on how valuable it is estimated to be to perform an action from a given state.
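
A common way to balance the two is an epsilon-greedy rule: with a small probability the agent explores a random action, otherwise it exploits the action with the highest estimated value. The short Python sketch below is illustrative only; the content names and value estimates are hypothetical and not taken from any real course.

```python
import random

def choose_action(q_values, epsilon=0.1):
    """Epsilon-greedy choice: explore with probability epsilon,
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.choice(list(q_values))   # exploration: pick an action at random
    return max(q_values, key=q_values.get)     # exploitation: pick the best-known action

# Hypothetical value estimates for three pieces of classroom content
q_values = {"read_textbook": 0.62, "watch_video": 0.48, "practice_quiz": 0.71}
print(choose_action(q_values))
```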

In the classroom context, the above terms can be defined as follows (a small code sketch of this mapping appears after the list):

  • State - how proficient the student is in the subject.
  • Environment - the learning objectives for a given unit of learning such as a topic in Algebra, History, or Physics.
  • Action - reviewing classroom content, such as a textbook, voice notes, etc.
  • State Evolution - any kind of assessment conducted.
  • Agent - the student that chooses which action to take.
  • Reward - final grade for the assessment.
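
To make this mapping concrete, it can be written down as a simple data structure. The sketch below is purely illustrative; the topic, content items, and proficiency label are hypothetical placeholders.

```python
# Hypothetical mapping of the RL elements to a single unit of learning.
classroom_mdp = {
    "agent": "student",                                    # chooses which action to take
    "environment": "Algebra: linear equations",            # learning objectives for the unit
    "state": {"proficiency": "beginner"},                  # how proficient the student is
    "actions": ["read_textbook", "listen_to_voice_notes"], # reviewing classroom content
    "state_evolution": "unit assessment",                  # any kind of assessment conducted
    "reward": "final grade for the assessment",
}
```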

The State

The state can be measured as simply as 'Passed' or 'Failed' for the course. If the student was successful, we move forward. If not, we either repeat the content presented or provide some additional assistance until the student successfully passes the topic.
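
A minimal sketch of that pass-or-fail decision might look like the following; the passing threshold and the remediation step are assumptions made for illustration.

```python
def next_step(assessment_score, passing_threshold=0.6):
    """Decide whether the student advances or repeats the topic."""
    if assessment_score >= passing_threshold:
        return "advance_to_next_topic"           # state becomes 'Passed'
    return "repeat_content_with_extra_support"   # state stays 'Failed'; provide assistance

print(next_step(0.72))  # advance_to_next_topic
print(next_step(0.40))  # repeat_content_with_extra_support
```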

State Progress

Students work through materials such as books, audio, or video in order to reach the desired target. Once a task has been completed, we assess them on the knowledge acquired from the content provided; this cycle of action, assessment, and feedback is the reinforcement learning loop. Something to consider is that one piece of content may be relevant to multiple topics, and a single topic may require multiple pieces of content to reach the desired target. As the pool of available content grows, so does the agent's room to improve instruction.
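
One way to picture this loop is a single step function: the student consumes a piece of content, is assessed, and the state is updated. The mastery threshold, content name, and score below are hypothetical values used only to illustrate the idea.

```python
def step(state, content, assessment_score, mastery_threshold=0.7):
    """Consume one piece of content, run an assessment, and update the state.
    A single topic may need several pieces of content before mastery."""
    new_state = dict(state)
    new_state["consumed"] = state.get("consumed", []) + [content]
    if assessment_score >= mastery_threshold:
        new_state["mastered_topics"] = state.get("mastered_topics", 0) + 1
    return new_state, assessment_score  # the assessment score doubles as the reward here

state = {"mastered_topics": 0, "consumed": []}
state, reward = step(state, "video: balancing equations", assessment_score=0.75)
print(state, reward)
```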

Reward Calculation

For each state, the agent selects the most suitable content while calculating the reward for each individual content assessment. Content is selected according to the best reward available, keeping in mind the future value of that content and the content the agent has yet to discover. When a piece of content has been successfully completed, we treat this as a winning state. In the simplest case, if the student has only one topic left to master and a specific piece of content always leads to mastery of that level, selecting the appropriate content for the student is easy. In practice, not all content and not all students are created equal, since students learn differently; assessments are therefore adjusted to the level determined for the particular student.
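
The "future value" mentioned above is commonly captured with a discounted value update, for example tabular Q-learning, where the value of presenting a piece of content is its immediate reward plus the discounted best value achievable from the resulting state. The sketch below shows one standard form of that update; the learning rate, discount factor, states, and content names are assumptions, not figures from the article.

```python
CONTENT = ["read_textbook", "watch_video", "practice_quiz"]  # hypothetical actions

def q_update(q, state, content, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update: immediate reward plus discounted best future value."""
    best_future = max(q.get((next_state, c), 0.0) for c in CONTENT)
    old = q.get((state, content), 0.0)
    q[(state, content)] = old + alpha * (reward + gamma * best_future - old)

q = {}
q_update(q, "one_topic_left", "practice_quiz", reward=0.8, next_state="mastered")
print(round(q[("one_topic_left", "practice_quiz")], 3))  # 0.08
```

Each update nudges the estimate for a (state, content) pair toward the observed grade plus the discounted value of the best option from the new state.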

Information Required

For our agent, we need the following information (a sketch of such a record appears after the list):

  1. The student's starting state.
  2. The content presented.
  3. The student's ending state.
  4. The reward for this state transition.
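
In RL terms, these four items form a single transition record. A hypothetical logged transition might look like this; the state labels, content identifier, and reward value are placeholders.

```python
from collections import namedtuple

# One logged transition: everything the agent needs in order to learn from a single step.
Transition = namedtuple("Transition", ["start_state", "content", "end_state", "reward"])

example = Transition(
    start_state="Algebra: not yet passed",  # 1. the student's starting state
    content="practice_quiz_linear_eq",      # 2. the content presented
    end_state="Algebra: passed",            # 3. the student's ending state
    reward=0.85,                            # 4. the reward for this state transition
)
print(example)
```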

Challenges

State encoding: representing a student's level numerically so that the RL algorithm can process the information.
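
One simple, if coarse, encoding is a fixed-length vector with one proficiency score per topic; the topics and scores below are placeholders chosen only to illustrate the idea.

```python
# Hypothetical encoding: one proficiency score per topic, packed into a fixed-length vector.
TOPICS = ["linear_equations", "quadratics", "factoring"]

def encode_state(proficiency_by_topic):
    """Turn a student's per-topic proficiency (0.0 to 1.0) into a numeric vector."""
    return [proficiency_by_topic.get(topic, 0.0) for topic in TOPICS]

print(encode_state({"linear_equations": 0.9, "quadratics": 0.4}))  # [0.9, 0.4, 0.0]
```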

Limitation of content available: assume a course has 200 pieces of content; tracking that content, along with value estimates for every state and content pair, quickly becomes a lot of data to store in a database and in memory.

Limitation of state progress: most state transitions are not possible, since each piece of content focuses on specific topics.

The article has been written by Dr. Raul V. Rodriguez. He can be reached on LinkedIn.