BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michelconn
Quartz | Level 8

Hello All,

 

(SAS 9.3) I have generated a table of conditional probabilities and I am looking to select from that a specific variable and add it to a different table. Below are two example data sets for what I want to do. I am looking to add CVtypes onto the final dataset based on the probabilities from the probs dataset. An example from below if observation one has a CVtype of T and segment number of 1 then there is a 75% chance observation two is T and 25% chance observations two is C. If T is chosen seg would increase to 2. If C was chosen seg would then be 1. 

 

Randono is included because I though that could be used to pick probabilities (ex if randno is <prob1 then ... else if rand number is >prob1 then...). I think SQL is my best option but I really don't know SQL well enought figure out how to implement it. 

 

Thanks for any help

 

 

data probs; 
infile datalines DSD;
input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,0,100
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,0,100
;
run;
data final;
infile datalines DSD;
input randno terrain $ cvtype $ seg;
datalines;
.654654,roll,T,1,
.12567,roll,,,
.967946,roll,,,
.36758,roll,,,
.6567654,roll,,,
.76314687,roll,,,
.65445,level,,,
.516874,level,,,
.045648,level,,,
.989957,level,,,
.254658,level,,,
;
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
So what is the key to query table PROBS, segment and cvtype ?




data probs; 
 infile datalines DSD;
 input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
 datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,0,100
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,0,100
;
run;
data final; 
 infile datalines DSD;
 input randno terrain $ cvtype $ segment;
 datalines;
.654654,roll,T,1,
.12567,roll,,,
.967946,roll,,,
.36758,roll,,,
.6567654,roll,,,
.76314687,roll,,,
.65445,level,,,
.516874,level,,,
.045648,level,,,
.989957,level,,,
.254658,level,,,
;
run;
data want;
if _n_=1 then do;
 if 0 then set probs;
 declare hash h(dataset:'probs');
 h.definekey('segment','cvtype');
 h.definedata('cv1','cv2','prob1','prob2');
 h.definedone();
end;
 set final;
array x{*} $ cv1 cv2;
length lag_cvtype $ 8;
retain lag_seg lag_cvtype;
 
 if h.find(key:lag_seg,key:lag_cvtype)=0 then do;
  cvtype=x{rand('table',prob1/100,prob2/100)};
  if cvtype=lag_cvtype then segment=lag_seg+1;
   else segment=1;
 end;
lag_seg=segment;lag_cvtype=cvtype; 
drop lag_seg lag_cvtype cv1 cv2 prob1 prob2;
run;

View solution in original post

4 REPLIES 4
ballardw
Super User

Your process is not really clear. Where does the 75% and 25% chance come from for observation 2?

 

I think that you may be interested in using the RAND('TABLE') function since you apparently know the probabilities of interest.

Your fifth observation may have a problem as you have 40 and 45 leaving about 15% unaccounted for.

Note the following shows use of Rand('table') with your probability values and shows a result that represents the first, second (or in the possible case of the fifth observation) or third possibility. 1 means that something with the first probability was selected, 2 the second. 3 means more than the 2nd. The sums of the elements should total exactly 1.

data probs; 
   infile datalines DSD;
   input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
   /*get decimal values*/
   prob1= prob1 *0.01;
   prob2= prob2 *0.01;
   result = rand('table',prob1,prob2);
datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,40,45
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,36,64
;
michelconn
Quartz | Level 8

Thank you for the help sorry about the 45 40 it was a typo. The RAND('TABLE') option isn't quite what I am looking for I want to reference the probs table a large number of times (in the real data 160,000 times). Both data sets are made up examples. 

 

Let me explain my variables a little better. CVtype is what the observation currently is. So in my final table observation one is currently T. Cv1, in the prob table, is what the next observation could be with a probability of prob1. Cv2, in the prob table, is also what the next observation could be with a probability of prob2. The probs were all calculated earlier by me. 

 

If we look at the final table the observation is currently T, with a segment of 1, and a terrain of roll. I want to then reference my probs table. In my probs table where cvtype is T, segment is 1, and terrain is roll I want to then reference prob1 and prob2 and pick either cv1 or cv2 and add it as my next observation. If T (cv1) is chosen then segment increases to 2, if C (cv2) is chosen segment goes to 1. 

 

If T was chosen as observation 2. I would then look at my probs table were cvtype is T, segment is 2, and terrain is roll and then pick either cv1 or cv2 as observation 3. After three iterations the final data might look like example 1. I want final data to eventually look look something like example 2.

 

Hopefully this is a little more clear. The probs table is only a reference and I want to, one at a time, add new cvtypes to the final table.

 

Example 1    
0.654654 roll T 1
0.12567 roll T 2
0.967946 roll C 1
0.36758 roll    
0.656765 roll    
0.763147 roll    
0.65445 level    
0.516874 level    
0.045648 level    
0.989957 level    
0.254658 level    

 

Example 2    
0.654654 roll T 1
0.12567 roll T 2
0.967946 roll C 1
0.36758 roll C 2
0.656765 roll C 3
0.763147 roll C 4
0.65445 level T 1
0.516874 level T 2
0.045648 level T 3
0.989957 level T 4
0.254658 level T 5



 

ballardw
Super User

I am not really seeing how your look up table is involved. Look up implies that a value already in the data exists to match on.

 

What does you actual "input" data look like? It looks like you showed what you are thinking of as a look up table but I can't tell whether the actual input table consists of only a random number or something else.

 

It looks like you are somehow comparing a value of 0.654654 to 80??? or 20???

I suspect that your Prob2 is completely unneeded as Prob2 = 100-Prob1. Or to match you "randno" better would be .8 and then prob2 = 1 -0.8

 

If you don't have a CV1, CV2 or Seg for the first record how do you decide which record in the look up data to compare with?

 

Are there any other values than those shown in your PROBS dataset?

 

And I still don't see how you get " if observation one has a CVtype of T and segment number of 1 then there is a 75% chance observation two is T and 25% chance observations two is C." The second row of probs has 70 and 30. Are you now averaging things between rows??

Ksharp
Super User
So what is the key to query table PROBS, segment and cvtype ?




data probs; 
 infile datalines DSD;
 input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
 datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,0,100
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,0,100
;
run;
data final; 
 infile datalines DSD;
 input randno terrain $ cvtype $ segment;
 datalines;
.654654,roll,T,1,
.12567,roll,,,
.967946,roll,,,
.36758,roll,,,
.6567654,roll,,,
.76314687,roll,,,
.65445,level,,,
.516874,level,,,
.045648,level,,,
.989957,level,,,
.254658,level,,,
;
run;
data want;
if _n_=1 then do;
 if 0 then set probs;
 declare hash h(dataset:'probs');
 h.definekey('segment','cvtype');
 h.definedata('cv1','cv2','prob1','prob2');
 h.definedone();
end;
 set final;
array x{*} $ cv1 cv2;
length lag_cvtype $ 8;
retain lag_seg lag_cvtype;
 
 if h.find(key:lag_seg,key:lag_cvtype)=0 then do;
  cvtype=x{rand('table',prob1/100,prob2/100)};
  if cvtype=lag_cvtype then segment=lag_seg+1;
   else segment=1;
 end;
lag_seg=segment;lag_cvtype=cvtype; 
drop lag_seg lag_cvtype cv1 cv2 prob1 prob2;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1576 views
  • 1 like
  • 3 in conversation