## Random

Regular Contributor
Posts: 233

I need help coding the scenario below. I have to select only 30% of the population when col1 is in (0,1,2) and 70% of the population when col1 is in (3,4,5,6,7).

DATA TEMP1;
   SET TEMP;
   IF (col1 in (0,1,2) and ranuni(111) < 0.7 )
   and (col1 in (3,4,5,6,7) and ranuni(111) < 0.3)
   then seg = 'test';
run;

I think I am missing something, because I am not getting the results I am looking for. If someone can help me, it would be awesome.

Super User
Posts: 6,771

## Re: Random

Hima,

Not sure if it's a typo or you just need a set of fresh eyes to look at this ...

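The second AND should be OR. No observation can have col1 in (0,1,2) and in (3,4,5,6,7) at the same time, so with AND the condition is never true.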

Also note that random number streams can be tricky. I forget all the rules, but the two ranuni(111) calls are almost certainly generating different numbers: each call returns the next number in the stream. If you want to make sure you are comparing to the same random number, generate it first:

newvar = ranuni(111);

Then refer to newvar in the IF statement.

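Finally, it looks like you have used < when you meant to use >. As written, ranuni(111) < 0.7 keeps about 70% of the col1 in (0,1,2) group, the reverse of what you want (and likewise about 30% instead of 70% for the other group).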
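Putting those suggestions together, a minimal sketch of the corrected step might look like this. The TEMP data set here is hypothetical (10,000 rows with col1 cycling through 0 to 7) and exists only so the example runs on its own; in practice you would use your own TEMP. The random number is drawn once per row into newvar, the two groups are joined with OR, and > is used so that newvar > 0.7 keeps about 30% and newvar > 0.3 keeps about 70%.

```sas
/* Hypothetical test data: col1 takes each value 0-7 equally often */
data temp;
   do id = 1 to 10000;
      col1 = mod(id, 8);
      output;
   end;
run;

data temp1;
   set temp;
   length seg $4;
   newvar = ranuni(111);   /* one draw per observation, reused in both tests */
   if (col1 in (0,1,2) and newvar > 0.7)          /* keeps about 30 percent */
      or (col1 in (3,4,5,6,7) and newvar > 0.3)   /* keeps about 70 percent */
      then seg = 'test';
   drop newvar;
run;

/* Row percentages show the share of each col1 value flagged as 'test' */
proc freq data=temp1;
   tables col1*seg / missing nocol nopercent;
run;
```

Each row is kept or dropped independently, so the realized shares will vary slightly around 30% and 70% rather than hitting them exactly.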

Good luck.

Super User
Posts: 6,771

## Re: Random

Here's a small test program I wrote to illustrate the issue with random numbers:

data test;
   do i=1 to 1000;
      if ranuni(111) < 0.5 or ranuni(111) >= 0.5 then output;
   end;
run;

You won't get 1000 records in the output data set (expect roughly 750), because each ranuni call returns a different number from the stream, so the two comparisons are not testing the same value.
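For comparison, here is the same loop with the number drawn once per iteration, as suggested in the earlier reply. Because both comparisons now test the same value, the condition is always true and the output should contain all 1000 rows. The data set name TEST2 and the variable u are chosen here only so the sketch doesn't overwrite TEST.

```sas
data test2;
   do i = 1 to 1000;
      u = ranuni(111);                      /* draw once per iteration */
      if u < 0.5 or u >= 0.5 then output;   /* always true for a single u */
   end;
run;
```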

Posts: 5,526

## Re: Random

You could avoid the intricacies of random number generation by using the sample selection procedure, PROC SURVEYSELECT:

/* Specify the sampling rate for each stratum */
data rates;
input col1 _rate_ @@;
datalines;
0 .3  1 .3  2 .3  3 .7  4 .7  5 .7  6 .7  7 .7
;

/* Sort the data by stratum (col1) */
proc sort data=temp; by col1; run;

/* Extract the sample (variable Selected=1 for selected observations).
   Remove option outall to keep only the selected observations. */
proc surveyselect data=temp rate=rates method=srs out=temp1 outall;
strata col1;
run;

PG
