generate a randum number (0,1) according to a predetermined % sample

Accepted Solution Solved
Reply
Contributor
Posts: 27
Accepted Solution

generate a randum number (0,1) according to a predetermined % sample

hi everyone.

i received a portfolio database of clients who received a loan. i know for a given month the total % of default (eg 5.0%), but i do not have the detail for every client.

i would like to generate for every client (row) a random number (0 or 1), with the "1" meaning a default situation, but i want also the total of "1" being the 5% of the total of rows.

thanks in advance for every suggestions.


Accepted Solutions
Solution
‎07-19-2013 09:16 AM
SAS Super FREQ
Posts: 3,475

Re: generate a randum number (0,1) according to a predetermined % sample

There are two ways to interpret the "5%" criterion. The first way is to compute R=int(0.05*N) where N is the number of observations in your data set. Data_Null_ provides code to generate an indicator variable that has R selected rows. Notice that if you run PROC FREQ like this

proc freq data=samp;

tables selected;

run;

you will always get the same number of selected rows (R = int(0.05*428) = 22 in _NULL_'s example).

An alternative approach is to say that each observation has a 5% chance of being selected. This means that the number of selected observations is a random value with expected value 0.05*N.  If you take this approach, every time you run the following program you get a different number of selected rows. The number of rows is binomially distributed. It's equivalent to using selected=rand("Binomial",0.05) in a DATA step.

proc surveyselect data=sashelp.cars

   method=bernoulli rate=.05 out=samp outall;

run;

proc freq data=samp;

tables selected;

run;

If your eventual goal is to do some kind of bootstrap analysis, know that PROC SURVEYSELECT supports the REPS= option, which repeats this selecting process a specified number of times.

View solution in original post


All Replies
Respected Advisor
Posts: 3,777

Re: generate a randum number (0,1) according to a predetermined % sample

The 0,1 will be called selected.
proc surveyselect data=sashelp.cars rate=.05 out=samp outall;
  
run;

Solution
‎07-19-2013 09:16 AM
SAS Super FREQ
Posts: 3,475

Re: generate a randum number (0,1) according to a predetermined % sample

There are two ways to interpret the "5%" criterion. The first way is to compute R=int(0.05*N) where N is the number of observations in your data set. Data_Null_ provides code to generate an indicator variable that has R selected rows. Notice that if you run PROC FREQ like this

proc freq data=samp;

tables selected;

run;

you will always get the same number of selected rows (R = int(0.05*428) = 22 in _NULL_'s example).

An alternative approach is to say that each observation has a 5% chance of being selected. This means that the number of selected observations is a random value with expected value 0.05*N.  If you take this approach, every time you run the following program you get a different number of selected rows. The number of rows is binomially distributed. It's equivalent to using selected=rand("Binomial",0.05) in a DATA step.

proc surveyselect data=sashelp.cars

   method=bernoulli rate=.05 out=samp outall;

run;

proc freq data=samp;

tables selected;

run;

If your eventual goal is to do some kind of bootstrap analysis, know that PROC SURVEYSELECT supports the REPS= option, which repeats this selecting process a specified number of times.

Contributor
Posts: 27

Re: generate a randum number (0,1) according to a predetermined % sample

dear rick, the advice that definitely match my goals is the second! one last doubt, since i saw that the sample size slightly change, this change is based on a confidence interval? thanks a lot!

SAS Super FREQ
Posts: 3,475

Re: generate a randum number (0,1) according to a predetermined % sample

The number of selected rows will be binomially distributed. The expected number of selected rows (the mean) will be N*p.

The standard deviation will be sqrt(N*p*(1-p)). When N is large, most of the sample sizes will be within three standard deviations of the mean.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 319 views
  • 6 likes
  • 3 in conversation