turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Using Look up Table based on Conditional Probabili...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-16-2016 03:39 PM - edited 08-16-2016 05:21 PM

Hello All,

(SAS 9.3) I have generated a table of conditional probabilities and I am looking to select from that a specific variable and add it to a different table. Below are two example data sets for what I want to do. I am looking to add CVtypes onto the **final** dataset based on the probabilities from the **probs** dataset. An example from below if observation one has a CVtype of T and segment number of 1 then there is a 75% chance observation two is T and 25% chance observations two is C. If T is chosen seg would increase to 2. If C was chosen seg would then be 1.

Randono is included because I though that could be used to pick probabilities (ex if randno is <prob1 then ... else if rand number is >prob1 then...). I think SQL is my best option but I really don't know SQL well enought figure out how to implement it.

Thanks for any help

data probs;

infile datalines DSD;

input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;

datalines;

1,C,T,roll,C,80,20

2,C,T,roll,C,70,30

3,C,T,roll,C,90,10

4,C,T,level,C,65,35

5,C,T,level,C,0,100

1,T,C,level,T,75,25

2,T,C,roll,T,60,40

3,T,C,roll,T,79,21

4,T,C,roll,T,12,88

5,T,C,roll,T,0,100

;

run;

data final;

infile datalines DSD;

input randno terrain $ cvtype $ seg;

datalines;

.654654,roll,T,1,

.12567,roll,,,

.967946,roll,,,

.36758,roll,,,

.6567654,roll,,,

.76314687,roll,,,

.65445,level,,,

.516874,level,,,

.045648,level,,,

.989957,level,,,

.254658,level,,,

;

run;

Accepted Solutions

Solution

08-18-2016
10:57 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to michelconn

08-16-2016 11:09 PM

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to michelconn

08-16-2016 03:59 PM

Your process is not really clear. Where does the 75% and 25% chance come from for observation 2?

I think that you may be interested in using the RAND('TABLE') function since you apparently know the probabilities of interest.

Your fifth observation may have a problem as you have 40 and 45 leaving about 15% unaccounted for.

Note the following shows use of Rand('table') with your probability values and shows a result that represents the first, second (or in the possible case of the fifth observation) or third possibility. 1 means that something with the first probability was selected, 2 the second. 3 means more than the 2nd. The sums of the elements should total exactly 1.

```
data probs;
infile datalines DSD;
input segment cv1 $ cv2 $ terrain $ cvtype $ prob1 prob2;
/*get decimal values*/
prob1= prob1 *0.01;
prob2= prob2 *0.01;
result = rand('table',prob1,prob2);
datalines;
1,C,T,roll,C,80,20
2,C,T,roll,C,70,30
3,C,T,roll,C,90,10
4,C,T,level,C,65,35
5,C,T,level,C,40,45
1,T,C,level,T,75,25
2,T,C,roll,T,60,40
3,T,C,roll,T,79,21
4,T,C,roll,T,12,88
5,T,C,roll,T,36,64
;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

08-16-2016 05:40 PM - edited 08-16-2016 05:46 PM

Thank you for the help sorry about the 45 40 it was a typo. The RAND('TABLE') option isn't quite what I am looking for I want to reference the probs table a large number of times (in the real data 160,000 times). Both data sets are made up examples.

Let me explain my variables a little better. CVtype is what the observation currently is. So in my final table observation one is currently T. Cv1, in the prob table, is what the next observation could be with a probability of prob1. Cv2, in the prob table, is also what the next observation could be with a probability of prob2. The probs were all calculated earlier by me.

If we look at the final table the observation is currently T, with a segment of 1, and a terrain of roll. I want to then reference my probs table. In my probs table where cvtype is T, segment is 1, and terrain is roll I want to then reference prob1 and prob2 and pick either cv1 or cv2 and add it as my next observation. If T (cv1) is chosen then segment increases to 2, if C (cv2) is chosen segment goes to 1.

If T was chosen as observation 2. I would then look at my probs table were cvtype is T, segment is 2, and terrain is roll and then pick either cv1 or cv2 as observation 3. After three iterations the final data might look like example 1. I want final data to eventually look look something like example 2.

Hopefully this is a little more clear. The probs table is only a reference and I want to, one at a time, add new cvtypes to the final table.

Example 1 | |||

0.654654 | roll | T | 1 |

0.12567 | roll | T | 2 |

0.967946 | roll | C | 1 |

0.36758 | roll | ||

0.656765 | roll | ||

0.763147 | roll | ||

0.65445 | level | ||

0.516874 | level | ||

0.045648 | level | ||

0.989957 | level | ||

0.254658 | level |

Example 2 | |||

0.654654 | roll | T | 1 |

0.12567 | roll | T | 2 |

0.967946 | roll | C | 1 |

0.36758 | roll | C | 2 |

0.656765 | roll | C | 3 |

0.763147 | roll | C | 4 |

0.65445 | level | T | 1 |

0.516874 | level | T | 2 |

0.045648 | level | T | 3 |

0.989957 | level | T | 4 |

0.254658 | level | T | 5 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to michelconn

08-16-2016 06:44 PM

I am not really seeing how your look up table is involved. Look up implies that a value already in the data exists to match on.

What does you actual "input" data look like? It looks like you showed what you are thinking of as a look up table but I can't tell whether the actual input table consists of only a random number or something else.

It looks like you are somehow comparing a value of 0.654654 to 80??? or 20???

I suspect that your Prob2 is completely unneeded as Prob2 = 100-Prob1. Or to match you "randno" better would be .8 and then prob2 = 1 -0.8

If you don't have a CV1, CV2 or Seg for the first record how do you decide which record in the look up data to compare with?

Are there any other values than those shown in your PROBS dataset?

And I still don't see how you get " if observation one has a CVtype of T and segment number of 1 then there is a 75% chance observation two is T and 25% chance observations two is C." The second row of probs has 70 and 30. Are you now averaging things between rows??

Solution

08-18-2016
10:57 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to michelconn

08-16-2016 11:09 PM