Re: Building a Transition Matrix

AlexPanebianco · Posted 03-16-2023 06:16 PM

Hi all,

I have a dataset which I can reduce to three columns: Reference, Band_Start, Band_End.

An example of my data structure is as below:

Reference	Band_Start	Band_End
RefA	1 - 10	11 - 20
RefB	1 - 10	1 - 10
RefC	21 - 30	11 - 20
RefD	1 - 10	11 - 20
RefE	11 - 20	11 - 20
RefF	11 - 20	21 - 30
RefG	1 - 10	1 - 10

I would like to construct a transition matrix from this data and am unsure how to approach it in an efficient manner. The data I am working with contains approximately 600k rows and 10 distinct bands.

The matrix I am looking for will be of the form:

	1 - 10	11 - 20	21 - 30
1 - 10	x%	y%	z%
11 - 20
21 - 30

with the column on the left representing the start banding and the row above representing the end banding.

Any help on this is appreciated.

Thanks.

ballardw · Posted 03-16-2023 06:59 PM

What do your percent signs represent? Row percentage, Column percentage or table percentage?

I suspect that you may have done more work than is needed if you "reduced" data to that form.

One problem you may cause is that replacing, apparently numeric values, with strings like "1-10" is that those strings often won't sort nicely in the form that you want.

Please provide data in the form of working data steps so we have something to test code with.

data have;
  infile datalines dlm=',';
  informat Reference Band_Start Band_End $10.;
  input Reference Band_Start Band_End;
datalines;
RefA,1 - 10,11 - 20
RefB,1 - 10,1 - 10
RefC,21 - 30,11 - 20
RefD,1 - 10,11 - 20
RefE,11 - 20,11 - 20
RefF,11 - 20,21 - 30
RefG,1 - 10,1 - 10
;

proc tabulate data=have;
   class band_start band_end;
   table band_start,
         band_end*rowpctn
         /style=[pretext='Row percentage']
   ;
      table band_start,
         band_end*colpctn
         /style=[pretext='Column percentage']
   ;
      table band_start,
         band_end*pctn
         /style=[pretext='table percentage']
   ;
run;

Proc Tabulate is one way this can be done. Proc report probably, depending on which statistic you actually want.

The example shows why I asked which percent is meant. Pick the form that you want. The style= is just to place a short bit of text above each table to provide a description.

Note: if you have data from the start that Reference, Start numeric value, End numeric value then the above table could be made from that data set and a custom format or two. Which may be more flexible, depending on how your "reduced" the data if you want to see different ranges for either dimension.

Ksharp · Posted 03-17-2023 07:46 AM

@Rick_SAS wrote many blogs about this topic.
https://blogs.sas.com/content/iml/2016/07/07/markov-transition-matrices-sasiml.html
https://blogs.sas.com/content/iml/2023/02/13/pth-root-markov-transition.html
https://blogs.sas.com/content/iml/2016/07/13/absorbing-markov-chains-in-sas.html

Rick_SAS · Posted 03-17-2023 09:06 AM

To create an estimate of the Markov transition matrix from historical data, you need to create the matrix of counts for the transitions between each state. Then divide each row by the total count for the row in order to convert each row into a transition probability.

In the following, I simulate transitions between 10 categories for 1000 entities. I use PROC FREQ to tabulate the counts. I then read the counts into an IML matrix and divide each row by the row count. Notice that the output from PROC FREQ is in long form (100 observations), so you need to use the SHAPE function to convert the output to a matrix (wide form).

/* sample data */
data Have;
call streaminit(123);
do Reference = 1 to 1000;
   Band_Start = rand("Integer", 1, 10);
   Band_End   = rand("Integer", 1, 10);
   output;
end;
run;

proc freq data=Have noprint;
tables Band_Start * Band_End / out=freqOut sparse;
run;

proc iml;
use freqOut;
read all var {'Band_Start' 'Band_End' 'Count'} into M;
close;

/* verify that all pairs are in the data */
N = sqrt(nrow(M));        /* this should be an integer or else something is wrong */
names = unique(M[,1]);    /* there should be N names */
if N^= ncol(names) then print "Something is wrong! Not a square matrix.";

C = shape(M[,3], N, N);       /* raw counts */
print C;                     /* optional: print the raw counts */
M = C / C[,+];            /* divide each cell by total counts for the row */
print M[r=names c=names];

Rick_SAS · Posted 03-20-2023 06:26 AM

I wrote a blog post on this topic:

Estimate a Markov transition matrix from historical data

Building a Transition Matrix