topic Re: Using Look up Table based on Conditional Probabilities in SAS Programming

Using Look up Table based on Conditional Probabilities

michelconn — Tue, 16 Aug 2016 21:21:23 GMT

Hello All,

(SAS 9.3) I have generated a table of conditional probabilities and I am looking to select from that a specific variable and add it to a different table. Below are two example data sets for what I want to do. I am looking to add CVtypes onto the final dataset based on the probabilities from the probs dataset. An example from below if observation one has a CVtype of T and segment number of 1 then there is a 75% chance observation two is T and 25% chance observations two is C. If T is chosen seg would increase to 2. If C was chosen seg would then be 1.

Randono is included because I though that could be used to pick probabilities (ex if randno is <prob1 then ... else if rand number is >prob1 then...). I think SQL is my best option but I really don't know SQL well enought figure out how to implement it.

Thanks for any help

data probs; 
 infile datalines DSD;
 input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
 datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,0,100
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,0,100
;
run;
data final; 
 infile datalines DSD;
 input randno terrain $ cvtype $ seg;
 datalines;
.654654,roll,T,1,
.12567,roll,,,
.967946,roll,,,
.36758,roll,,,
.6567654,roll,,,
.76314687,roll,,,
.65445,level,,,
.516874,level,,,
.045648,level,,,
.989957,level,,,
.254658,level,,,
;
run;

Re: Using Look up Table based on Conditional Probabilities

ballardw — Tue, 16 Aug 2016 19:59:49 GMT

Your process is not really clear. Where does the 75% and 25% chance come from for observation 2?

I think that you may be interested in using the RAND('TABLE') function since you apparently know the probabilities of interest.

Your fifth observation may have a problem as you have 40 and 45 leaving about 15% unaccounted for.

Note the following shows use of Rand('table') with your probability values and shows a result that represents the first, second (or in the possible case of the fifth observation) or third possibility. 1 means that something with the first probability was selected, 2 the second. 3 means more than the 2nd. The sums of the elements should total exactly 1.

data probs; 
   infile datalines DSD;
   input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
   /*get decimal values*/
   prob1= prob1 *0.01;
   prob2= prob2 *0.01;
   result = rand('table',prob1,prob2);
datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,40,45
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,36,64
;

Re: Using Look up Table based on Conditional Probabilities

michelconn — Tue, 16 Aug 2016 21:46:22 GMT

Thank you for the help sorry about the 45 40 it was a typo. The RAND('TABLE') option isn't quite what I am looking for I want to reference the probs table a large number of times (in the real data 160,000 times). Both data sets are made up examples.

Let me explain my variables a little better. CVtype is what the observation currently is. So in my final table observation one is currently T. Cv1, in the prob table, is what the next observation could be with a probability of prob1. Cv2, in the prob table, is also what the next observation could be with a probability of prob2. The probs were all calculated earlier by me.

If we look at the final table the observation is currently T, with a segment of 1, and a terrain of roll. I want to then reference my probs table. In my probs table where cvtype is T, segment is 1, and terrain is roll I want to then reference prob1 and prob2 and pick either cv1 or cv2 and add it as my next observation. If T (cv1) is chosen then segment increases to 2, if C (cv2) is chosen segment goes to 1.

If T was chosen as observation 2. I would then look at my probs table were cvtype is T, segment is 2, and terrain is roll and then pick either cv1 or cv2 as observation 3. After three iterations the final data might look like example 1. I want final data to eventually look look something like example 2.

Hopefully this is a little more clear. The probs table is only a reference and I want to, one at a time, add new cvtypes to the final table.

Example 1
0.654654	roll	T	1
0.12567	roll	T	2
0.967946	roll	C	1
0.36758	roll
0.656765	roll
0.763147	roll
0.65445	level
0.516874	level
0.045648	level
0.989957	level
0.254658	level

Example 2
0.654654	roll	T	1
0.12567	roll	T	2
0.967946	roll	C	1
0.36758	roll	C	2
0.656765	roll	C	3
0.763147	roll	C	4
0.65445	level	T	1
0.516874	level	T	2
0.045648	level	T	3
0.989957	level	T	4
0.254658	level	T	5

Re: Using Look up Table based on Conditional Probabilities

ballardw — Tue, 16 Aug 2016 22:44:13 GMT

I am not really seeing how your look up table is involved. Look up implies that a value already in the data exists to match on.

What does you actual "input" data look like? It looks like you showed what you are thinking of as a look up table but I can't tell whether the actual input table consists of only a random number or something else.

It looks like you are somehow comparing a value of 0.654654 to 80??? or 20???

I suspect that your Prob2 is completely unneeded as Prob2 = 100-Prob1. Or to match you "randno" better would be .8 and then prob2 = 1 -0.8

If you don't have a CV1, CV2 or Seg for the first record how do you decide which record in the look up data to compare with?

Are there any other values than those shown in your PROBS dataset?

And I still don't see how you get " if observation one has a CVtype of T and segment number of 1 then there is a 75% chance observation two is T and 25% chance observations two is C." The second row of probs has 70 and 30. Are you now averaging things between rows??

Re: Using Look up Table based on Conditional Probabilities

Ksharp — Wed, 17 Aug 2016 03:09:29 GMT

So what is the key to query table PROBS, segment and cvtype ?




data probs; 
 infile datalines DSD;
 input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
 datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,0,100
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,0,100
;
run;
data final; 
 infile datalines DSD;
 input randno terrain $ cvtype $ segment;
 datalines;
.654654,roll,T,1,
.12567,roll,,,
.967946,roll,,,
.36758,roll,,,
.6567654,roll,,,
.76314687,roll,,,
.65445,level,,,
.516874,level,,,
.045648,level,,,
.989957,level,,,
.254658,level,,,
;
run;
data want;
if _n_=1 then do;
 if 0 then set probs;
 declare hash h(dataset:'probs');
 h.definekey('segment','cvtype');
 h.definedata('cv1','cv2','prob1','prob2');
 h.definedone();
end;
 set final;
array x{*} $ cv1 cv2;
length lag_cvtype $ 8;
retain lag_seg lag_cvtype;
 
 if h.find(key:lag_seg,key:lag_cvtype)=0 then do;
  cvtype=x{rand('table',prob1/100,prob2/100)};
  if cvtype=lag_cvtype then segment=lag_seg+1;
   else segment=1;
 end;
lag_seg=segment;lag_cvtype=cvtype; 
drop lag_seg lag_cvtype cv1 cv2 prob1 prob2;
run;