Reinforcement learning (RL) is used in a wide variety of fields. Examples include robotics, industrial automation, dialogue generation, healthcare treatment recommendations, stock trading, and computer games.
SAS Visual Data Mining and Machine Learning has provided batch reinforcement learning capabilities with fitted Q networks (FQNs) for some time. The exciting news is that SAS now provides online, real-time reinforcement learning with deep Q networks (DQNs)!
Reinforcement learning is a type of machine learning. Recall that machine learning includes supervised learning, unsupervised learning, reinforcement learning, and more. Unlike supervised learning, reinforcement learning has no supervisor. Instead, a reward signal serves as the feedback mechanism.
The goal of reinforcement learning is to maximize the long-term reward accumulated over a sequence of actions. This happens through an iterative process of trial and error. Time and order matter in RL: the data are sequential and are not independently and identically distributed.
In reinforcement learning, an agent acts in an environment. Each action earns the agent a positive or negative reward (a reward or a punishment), and the environment transitions to a new state. The agent is then presented with the new state and chooses its next action.
One example is a self-driving car. The car exists in an environment that includes roads and so on. Actions the car may take include moving forward, stopping, turning right, and so on.
The ultimate goal may be for the self-driving car to take me from my house to my favorite restaurant as quickly as possible while following all rules of the road and all safety precautions.
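To make the agent/environment loop concrete, here is a minimal Python sketch of the observe, act, reward, new-state cycle. The Environment and Agent classes and their one-step toy dynamics are hypothetical placeholders for illustration, not part of any SAS API.

```python
# Minimal sketch of the agent/environment loop described above.
# Environment and Agent are hypothetical toy classes, not a SAS API.

class Environment:
    def reset(self):
        """Start a new episode and return the initial state."""
        return 0

    def step(self, action):
        """Apply an action and return (new_state, reward, done)."""
        new_state = action                     # toy dynamics
        reward = 1.0 if action == 1 else -1.0  # toy reward signal
        done = True                            # end after one step
        return new_state, reward, done


class Agent:
    def choose_action(self, state):
        """Pick an action given the current state (trivial policy)."""
        return 1


env, agent = Environment(), Agent()
state = env.reset()
done = False
while not done:
    action = agent.choose_action(state)     # agent acts in the environment
    state, reward, done = env.step(action)  # environment returns reward + new state
```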
Reinforcement learning problems are usually formalized as Markov decision processes; well-known algorithms for solving them include Q-learning and SARSA (state-action-reward-state-action). Reinforcement learning methods may be on-policy, learning the value of the policy the agent is actually following, or off-policy, learning the value of a different (typically the optimal) policy. Q-learning is an example of an off-policy reinforcement learning method.
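To see the on-policy/off-policy difference concretely, compare the standard textbook update targets for the two algorithms, where $\alpha$ is a learning rate and $\gamma$ a discount factor (both discussed further below). SARSA backs up the Q value of the action $a_{t+1}$ the agent actually takes next, while Q-learning backs up the best available action regardless of what the agent does:

$$\text{SARSA: } Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

$$\text{Q-learning: } Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$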
SAS VDMML lets you use two different Q-learning algorithms to accomplish reinforcement learning. Q-learning seeks to learn a policy that maximizes total reward, and it starts with a Q table. A Q table is a matrix of Q values for all possible states and all possible actions. Each cell of the table (each Q value) is initialized to zero. After each episode the Q values are updated and stored. Q stands for quality: high Q values indicate it is a good idea to take a particular action from a particular state, and low Q values indicate it is a bad idea. The Q table becomes a reference table the agent uses to select the best action based on the Q value.
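As a quick sketch of what a Q table looks like in code, the NumPy array below holds one Q value per state/action pair, initialized to zero. The state and action counts are made-up illustrative values.

```python
import numpy as np

# Illustrative sizes; a real problem defines these from its state/action spaces.
n_states, n_actions = 5, 3

# The Q table: one Q value per (state, action) pair, initialized to zero.
q_table = np.zeros((n_states, n_actions))

# Once trained, the agent consults the table: from a given state,
# take the action with the highest Q value.
state = 2
best_action = int(np.argmax(q_table[state]))
```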
Q-learning is an iterative process that occurs in a series of steps. A typical Q-learning process looks like this:

1. Initialize the Q table to zeros.
2. From the current state, choose an action (usually the action with the highest Q value, with occasional random exploration).
3. Perform the action and observe the reward and the new state.
4. Update the Q value for that state/action pair.
5. Repeat steps 2 through 4 until training is complete.
Q-values are updated when action $a_t$ is taken from state $s_t$ using an equation such as:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where:

- $Q(s_t, a_t)$ is the current Q value for taking action $a_t$ in state $s_t$
- $\alpha$ is the learning rate
- $r_{t+1}$ is the reward received after taking the action
- $\gamma$ is the discount factor that weights future rewards
- $\max_{a} Q(s_{t+1}, a)$ is the highest Q value attainable from the new state $s_{t+1}$
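Translated directly into code, the update above might look like this minimal sketch. The learning rate and discount factor values are arbitrary choices for illustration.

```python
import numpy as np

alpha = 0.1   # learning rate (arbitrary illustrative value)
gamma = 0.9   # discount factor (arbitrary illustrative value)

def q_update(q_table, state, action, reward, next_state):
    """Apply one Q-learning update to q_table in place."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])

# Example: one update on a small zero-initialized Q table.
q = np.zeros((5, 3))
q_update(q, state=2, action=1, reward=1.0, next_state=3)
print(q[2, 1])  # 0.1 = alpha * (1.0 + gamma * 0 - 0)
```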
Two Q-learning methods are available in SAS VDMML: fitted Q networks (FQNs) and deep Q networks (DQNs).
The deep Q network method was new in SAS VDMML stable release 2020.1.3 and is accomplished through the rlTrainDqn CAS action.
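If you call the action from Python through the SWAT package, the outline looks roughly like the sketch below. The connection details are placeholders, and the action set name is an assumption for illustration; the parameters to pass are deliberately omitted, so consult the SAS documentation for the actual rlTrainDqn signature.

```python
import swat

# Connect to a running CAS server; host, port, and credentials are placeholders.
conn = swat.CAS('my-cas-server.example.com', 5570, 'username', 'password')

# Load the action set that provides rlTrainDqn.
# (Action set name assumed here; verify it against your SAS documentation.)
conn.loadactionset('reinforcementLearn')

# Train a deep Q network. Parameters are omitted because the documented
# signature isn't reproduced here; see the SAS documentation for details.
result = conn.reinforcementLearn.rlTrainDqn()
```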
The new deep Q network algorithm is similar to the fitted Q network algorithm in a couple of ways:

- Both are Q-learning methods that seek a policy maximizing total reward.
- Both replace the literal Q table with a deep neural network that approximates the Q values.

But they are also quite different:

- Fitted Q networks learn in batch from previously collected data, so training happens offline.
- Deep Q networks learn online, updating as the agent interacts with its environment in real time.
The ability to use online, real-time reinforcement learning is a huge benefit of the new deep Q network! To follow an example and create a deep Q network of your own, see Susan Kahler's article.