I have a dataset that contains the sequence of states for 1000 customers. Each sequence covers 12 months (left to right, one state per month).
I have built the transition matrix (P) from the counts of actual transitions between states. The dataset holds the sequence of states for all 1000 customers over 12 months (T1 to T12, one state per month). Now, for every customer, I am trying to predict the state at T13. There are 10 different states (numbered 1 to 10), and I represent each state as an initial row vector taken from the 10x10 identity matrix (so state_1 is [1 0 0 0 0 0 0 0 0 0]). To find the state at T13 for customer 1, I look up the state at T1 (let's say state 1) and compute state_1*P 12 times. After this, I pick the maximum value among the elements of the resulting row vector, take its index (which is the state), and store it in a new matrix. My questions:
1) How do I repeat this for all 1000 customers and then collect the max values and the indices of those max values in a separate table? I also want to extend this process to T14, T15, and so on.
2) How do I read the state value for a customer at T1 and then start the process of multiplication?
I don't understand your logic. Or maybe I don't understand your assumptions. To predict the state at T13, you shouldn't start with T1; you only need the state at T12, because in a Markov chain the next state depends only on the current state.
From your 1000 x 12 data points, you can build a matrix that gives the empirical probability of transitioning from State k to any other state. (This matrix is formed by aggregating over time, which assumes that the probabilities do not change over time.) So for each customer that you want to predict, you should look at the state that they are in right now (T12). If they are in State k, then the kth row of the transition matrix is the probability vector for the next state. If you want to predict the next state, your best prediction is the state whose column in row k has the highest probability.
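As a rough illustration of the two steps above, here is a minimal sketch in Python (not SAS) that counts transitions, normalizes each row, and reads the prediction from the row of the customer's last observed state. The function names `transition_matrix` and `predict_next` and the tiny 3-state data set are hypothetical, chosen only to keep the example small; the real data would have 10 states and 1000 sequences of length 12.

```python
def transition_matrix(sequences, n_states):
    """Estimate P from observed sequences; states are numbered 1..n_states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Every adjacent pair (month t, month t+1) is one observed transition.
        for a, b in zip(seq, seq[1:]):
            counts[a - 1][b - 1] += 1
    P = []
    for row in counts:
        total = sum(row)
        # A state that is never left in the data gets a row of zeros here.
        P.append([c / total if total else 0.0 for c in row])
    return P


def predict_next(P, current_state):
    """Return (most likely next state, its probability) for one customer."""
    row = P[current_state - 1]
    best = max(range(len(row)), key=row.__getitem__)  # ties: lowest state wins
    return best + 1, row[best]


# Hypothetical toy data: 3 customers, 4 months, 3 states.
sequences = [
    [1, 1, 2, 2],
    [3, 3, 3, 1],
    [1, 2, 2, 2],
]
P = transition_matrix(sequences, n_states=3)

# Only the last observed state of each customer is needed for the prediction.
predictions = [predict_next(P, seq[-1]) for seq in sequences]
for customer, (state, prob) in enumerate(predictions, start=1):
    print(f"customer {customer}: predicted state {state} (p = {prob:.3f})")
```

Note that the first three months of each sequence matter only for estimating P; the prediction itself uses just the final state.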
If you want to predict their state at T14, T15, etc., you can use vector-matrix multiplication to iterate the process. Equivalently, you can look at the rows of P**2, P**3, etc., where P**2 = P*P is the matrix product of the transition matrix with itself, P**3 = P*P*P is the third power, and so forth. Row k of P**n is the probability distribution of the state n months after T12 for a customer who is currently in State k, so your best prediction for that month is the column in the kth row of P**n that has the highest probability.
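The iteration can be sketched as follows, again in plain Python rather than SAS. The 3-state matrix `P` and the helper names `vec_mat` and `forecast` are assumptions made up for the illustration. Starting from the indicator vector of the current state, each multiplication by `P` moves the distribution one month ahead, which is the same as reading row k of the next power of `P`.

```python
def vec_mat(v, P):
    """Row vector times matrix: returns v*P as a list."""
    return [sum(v[i] * P[i][j] for i in range(len(v))) for j in range(len(P[0]))]


def forecast(P, current_state, steps):
    """Most likely state and its probability for 1..steps months ahead."""
    dist = [0.0] * len(P)
    dist[current_state - 1] = 1.0  # indicator vector of the state at T12
    out = []
    for _ in range(steps):
        dist = vec_mat(dist, P)  # after n passes this equals row k of P**n
        best = max(range(len(dist)), key=dist.__getitem__)
        out.append((best + 1, dist[best]))
    return out


# Hypothetical 3-state transition matrix (each row sums to 1).
P = [
    [0.25, 0.75, 0.00],
    [0.00, 0.25, 0.75],
    [0.00, 0.00, 1.00],
]

# A customer in State 1 at T12: predictions for T13, T14, T15.
result = forecast(P, current_state=1, steps=3)
for month, (state, prob) in zip(("T13", "T14", "T15"), result):
    print(month, state, prob)
```

With this particular matrix the predicted states are 2, 3, 3, with probabilities 0.75, 0.5625, and 0.84375.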
To score all customers at once, represent their current state in a binary indicator matrix and form the matrix product of the indicator matrix with the transition matrix. Then choose the largest probability for each customer.
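A small sketch of that batch scoring, written in plain Python under the same caveats: the matrix `P`, the list `current_states`, and the helpers `indicator_matrix`, `mat_mul`, and `score_all` are hypothetical names for illustration, not part of any SAS procedure. Each row of the product is one customer's probability vector for the next month, and the output table keeps the most likely state and its probability.

```python
def indicator_matrix(states, n_states):
    """One row per customer, with a 1 in the column of the current state."""
    return [[1.0 if s == k else 0.0 for k in range(1, n_states + 1)]
            for s in states]


def mat_mul(A, B):
    """Plain matrix product A*B for lists of lists."""
    return [[sum(a[i] * B[i][j] for i in range(len(B))) for j in range(len(B[0]))]
            for a in A]


def score_all(current_states, P):
    """Return [(predicted state, probability), ...], one entry per customer."""
    Z = indicator_matrix(current_states, len(P))
    probs = mat_mul(Z, P)  # row i is customer i's next-month distribution
    table = []
    for row in probs:
        best = max(range(len(row)), key=row.__getitem__)
        table.append((best + 1, row[best]))
    return table


# Hypothetical 3-state transition matrix and T12 states for four customers.
P = [
    [0.25, 0.75, 0.00],
    [0.00, 0.25, 0.75],
    [0.00, 0.00, 1.00],
]
current_states = [1, 2, 3, 2]

table = score_all(current_states, P)
for customer, (state, prob) in enumerate(table, start=1):
    print(customer, state, prob)
```

To score T14 or later in the same way, the product can be multiplied by `P` again before taking the row maxima.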