03-06-2012 09:30 AM
I need help coding the scenario below. I have to select only 30% of the population when col1 is in (0,1,2) and 70% of the population when col1 is in (3,4,5,6,7).
IF (col1 in (0,1,2) and ranuni(111) < 0.7 )and (col1 in (3,4,5,6,7) and ranuni(111) < 0.3)
then seg = 'test';
I think I am missing something, because I am not getting the results I am looking for. If someone can help me, it would be awesome.
03-06-2012 09:54 AM
Not sure if it's a typo or you just need a set of fresh eyes to look at this ...
The second AND should be OR.
Also note, random number streams can be tricky. I forget all the rules, but the two ranuni(111) calls will generally return different numbers, since each call advances the stream. If you want to make sure you are comparing to the same random number, generate it first:
newvar = ranuni(111);
Then refer to newvar in the IF statement.
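Putting those fixes together, a minimal sketch of the corrected data step might look like the following. The data set names have and want and the variable u are hypothetical placeholders. This version keeps < and swaps the thresholds instead; comparing with > against the original 0.7 and 0.3 works equally well.

```sas
/* Hypothetical names: input data set HAVE, output data set WANT */
data want;
  set have;
  length seg $4;
  /* Draw ONE uniform number per observation and reuse it in both tests */
  u = ranuni(111);
  /* OR (not AND) between the two groups; u < 0.3 selects about 30% */
  if (col1 in (0,1,2) and u < 0.3) or
     (col1 in (3,4,5,6,7) and u < 0.7) then seg = 'test';
run;
```

Each observation is selected independently here, so the achieved shares will be close to 30% and 70% rather than exact; a procedure that draws a fixed sample size per stratum is needed if exact counts matter.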
Finally, it looks like you have used < when you meant to use > (equivalently, the 0.7 and 0.3 thresholds are swapped between the two groups).
03-06-2012 10:14 AM
Here's a small test program I wrote to illustrate the issue with random numbers:
data test;
do i=1 to 1000;
if ranuni(111) < 0.5 or ranuni(111) >= 0.5 then output;
end; run;
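You won't get 1000 records in the output data set, because each ranuni call returns the next number from the stream, so the two calls compare different numbers. An iteration is skipped whenever the first draw is >= 0.5 and the second is < 0.5, which happens roughly a quarter of the time.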
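To see the effect directly, here is a small sketch (the data set draws and the variables a and b are hypothetical names) that stores two consecutive ranuni(111) calls side by side. Printing it should show a and b differing on every row, because the second call takes the next number from the same stream.

```sas
data draws;
  do i = 1 to 5;
    a = ranuni(111);   /* first call in this iteration */
    b = ranuni(111);   /* second call: the next number in the stream */
    output;
  end;
run;

proc print data=draws;
run;
```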
03-06-2012 11:37 AM
You could avoid the intricacies of random number generation by using the sample selection procedure, PROC SURVEYSELECT:
/* Specify the sampling rate for each stratum */
data rates;
input col1 _rate_ @@;
datalines;
0 .3 1 .3 2 .3 3 .7 4 .7 5 .7 6 .7 7 .7
;
/* Sort the data by strata */
proc sort data=temp; by col1; run;
/* Extract the sample (variable Selected=1 for selected observations).
Remove option outall to retain only selected observations. */
proc surveyselect data=temp rate=rates method=srs out=temp1 outall;
strata col1;
run;
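Because outall keeps every observation and flags the sampled ones with the Selected variable, a quick cross-tabulation can confirm the achieved rate in each stratum. This is a sketch that assumes the temp1 output data set from the step above.

```sas
proc freq data=temp1;
  /* Row percentages give the share selected within each col1 value */
  tables col1*selected / nocol nopercent;
run;
```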