I have a dataset that contains the sequence of states for 1000 customers over 12 months (T1 to T12, left to right, one state per month).
I have built a transition matrix (P) from the counts of the actual transitions between states. Now, for every customer, I am trying to predict the state at T13.
I have 10 different states (numbered 1 to 10), and I represent each state as the corresponding row of the 10x10 identity matrix (so state_1 is [1 0 0 0 0 0 0 0 0 0]).
So, to find the state at T13 for customer 1, I take the state at T1 (say, state 1) and compute state_1*P, multiplying by P 12 times. After that, I pick the maximum element of the resulting row vector, take its index (which is the predicted state), and store it in a new matrix.
My questions:
1) How do I repeat this for all 1000 customers and collect the max values and the indices of those max values in a separate table? I also want to extend this process to T14, T15, and so on.
2) How do I read a customer's state at T1 and then start the multiplication?
Thanks!
@Rick_SAS has written several blog posts about Markov chains; search for them.
I don't understand your logic. Or maybe I don't understand your assumptions. To predict the state at T13, you shouldn't start with T1, you merely need to use T12.
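Written out, this is the Markov property that a single transition matrix implicitly assumes (first-order and time-homogeneous). Here $X_t$ denotes a customer's state in month $t$ and $P_{kj}$ is the entry in row $k$, column $j$ of P. Starting from T1 and multiplying by P twelve times instead gives $\Pr(X_{13}=j \mid X_1)$, which discards the states already observed at T2 to T12.

$$
\Pr(X_{13}=j \mid X_{12}=k,\, X_{11},\, \dots,\, X_1) = \Pr(X_{13}=j \mid X_{12}=k) = P_{kj},
\qquad
\hat{X}_{13} = \arg\max_{j} P_{kj}
$$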
From your 1000 x 12 data points, you can build a matrix that gives the empirical probability of transitioning from State k to any other state. (This matrix is formed by aggregating over time, which assumes that the probabilities do not change over time.) So for each customer that you want to predict, you should look at the state that they are in right now (T12). If they are in State k, then the k_th row of the transition matrix is the probability vector for the next State. If you want to predict the next State, your best prediction is the State in row k that has the highest probability.
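The thread is about SAS, but as a language-neutral illustration, here is a minimal NumPy sketch of the two steps just described: pool all month-to-month transitions into a count matrix, normalize each row, then read off the most probable next state from the row of the customer's current (last observed) state. It assumes the states sit in an integer array with one row per customer and one column per month, coded 1 to n_states; the function names and the tiny three-state array are hypothetical and only for illustration (the real data would be 1000 x 12 with 10 states).

```python
import numpy as np

def estimate_transition_matrix(seq, n_states):
    """Empirical transition matrix from an (n_customers, n_months) array of
    states coded 1..n_states, pooling transitions over all months."""
    counts = np.zeros((n_states, n_states))
    src = seq[:, :-1].ravel() - 1   # state in month t (0-based index)
    dst = seq[:, 1:].ravel() - 1    # state in month t+1 (0-based index)
    np.add.at(counts, (src, dst), 1)  # count every observed k -> j transition
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states never observed as a "from" state are left as all zeros.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def predict_next_state(P, current_state):
    """Most probable next state (1-based) and its probability, for a customer
    whose current state is current_state (1-based)."""
    row = P[current_state - 1]      # k-th row = distribution of the next state
    return int(np.argmax(row)) + 1, float(row.max())

# Hypothetical toy data: 3 customers, 4 months, 3 states.
seq = np.array([[1, 2, 2, 3],
                [2, 2, 3, 1],
                [3, 1, 2, 2]])
P = estimate_transition_matrix(seq, n_states=3)
# Predict from the LAST observed month (column -1), not the first.
next_states = [predict_next_state(P, s) for s in seq[:, -1]]
```

Note that np.argmax returns the first maximum, so ties are resolved in favor of the lowest-numbered state.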
If you want to predict their state at T14, T15, etc., you can use vector-matrix multiplication to iterate the process. Equivalently, you can look at the rows of P**2, P**3, etc., where P**2 = P*P is the matrix product of the transition matrix with itself, P**3 = P*P*P is the cube of P, and so forth. At each stage, your best prediction for the State in that month is the column in the k_th row that has the highest probability.
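A small NumPy sketch of the iteration described above, assuming a customer currently in state k: start from the indicator row vector for k and multiply by P once per month ahead, so step h reproduces row k of P**h. The 3x3 matrix and the function name are hypothetical; the prediction at each horizon is the most probable state for that month, not the most probable multi-month path.

```python
import numpy as np

def forecast_distributions(P, current_state, horizons):
    """State distributions 1, 2, ..., horizons months ahead for a customer
    currently in current_state (1-based). Row h-1 equals row k of P**h."""
    v = np.zeros(P.shape[0])
    v[current_state - 1] = 1.0      # indicator (one-hot) row vector for state k
    out = []
    for _ in range(horizons):
        v = v @ P                   # advance one month: v_h = v_{h-1} P
        out.append(v)
    return np.array(out)            # shape (horizons, n_states)

# Hypothetical 3-state transition matrix (rows sum to 1).
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.6, 0.4],
              [1.0, 0.0, 0.0]])
dists = forecast_distributions(P, current_state=2, horizons=3)  # T13, T14, T15
best_states = dists.argmax(axis=1) + 1   # 1-based predicted state per horizon
best_probs = dists.max(axis=1)           # probability of that state
```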
To score all customers at once, represent their current state in a binary indicator matrix and form the matrix product of the indicator matrix with the transition matrix. Then choose the largest probability for each customer.
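To illustrate the batch approach, here is a NumPy sketch (not SAS/IML) that builds the binary indicator matrix from each customer's last observed state, multiplies it by P once per horizon, and collects the predicted state and its probability into two customers-by-horizons tables, which is the kind of table the original question asks for. The function name, the 3x3 matrix, and the three example customers are hypothetical.

```python
import numpy as np

def score_all(P, current_states, horizons):
    """Score every customer at once.
    current_states: 1-D array of states (1-based) in the last observed month.
    Returns (best_state, best_prob), each of shape (n_customers, horizons)."""
    n_states = P.shape[0]
    # Binary indicator matrix: row i has a single 1 in the column of customer i's state.
    X = np.eye(n_states)[current_states - 1]
    best_state = np.empty((len(current_states), horizons), dtype=int)
    best_prob = np.empty((len(current_states), horizons))
    for h in range(horizons):
        X = X @ P                                # state probabilities h+1 months ahead
        best_state[:, h] = X.argmax(axis=1) + 1  # 1-based state with largest probability
        best_prob[:, h] = X.max(axis=1)
    return best_state, best_prob

# Hypothetical 3-state transition matrix and T12 states for three customers.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.6, 0.4],
              [1.0, 0.0, 0.0]])
last_state = np.array([3, 1, 2])
best_state, best_prob = score_all(P, last_state, horizons=3)  # columns: T13, T14, T15
```

Each column of best_state holds the predicted state for one future month, and the matching column of best_prob holds its probability.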