Solved: generate a random number (0 or 1) according to a predetermined % sample

progster · Posted 07-19-2013 07:29 AM

Hi everyone,

I received a portfolio database of clients who were each granted a loan. For a given month I know the overall default rate (e.g., 5.0%), but I do not have the default status of each individual client.

I would like to generate, for every client (row), a random value of 0 or 1, where 1 means a default, and I also want the number of 1s to be 5% of the total number of rows.

Thanks in advance for any suggestions.

data_null__ · Posted 07-19-2013 07:47 AM

The 0/1 indicator variable in the output data set will be called Selected.
proc surveyselect data=sashelp.cars rate=.05 out=samp outall;
run;

Rick_SAS · Posted 07-19-2013 09:16 AM

There are two ways to interpret the "5%" criterion. The first way is to select a fixed number of rows, R = ceil(0.05*N), where N is the number of observations in your data set (PROC SURVEYSELECT rounds the computed sample size up). data_null__ provides code to generate an indicator variable that has exactly R selected rows. Notice that if you run PROC FREQ like this

proc freq data=samp;
   tables selected;
run;

you will always get the same number of selected rows (R = ceil(0.05*428) = 22 in data_null__'s example).

An alternative approach is to say that each observation independently has a 5% chance of being selected. This means that the number of selected observations is a random value with expected value 0.05*N. If you take this approach, every time you run the following program you get a different number of selected rows. The number of selected rows is binomially distributed. It's equivalent to using selected=rand("Bernoulli",0.05) in a DATA step.

proc surveyselect data=sashelp.cars
   method=bernoulli rate=.05 out=samp outall;
run;

proc freq data=samp;
   tables selected;
run;
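
The DATA step equivalent mentioned above could look roughly like the sketch below. It draws one Bernoulli(0.05) value per row with the RAND function. The seed value 12345 and the output data set name samp are arbitrary choices for illustration, not something from the original posts.

```sas
/* Sketch: Bernoulli selection in a DATA step.                  */
/* Assumptions: seed 12345 and data set name SAMP are arbitrary */
data samp;
   set sashelp.cars;
   /* set the random number seed once so the run is reproducible */
   if _n_ = 1 then call streaminit(12345);
   /* 1 with probability 0.05, otherwise 0 */
   selected = rand("Bernoulli", 0.05);
run;

/* count how many rows were flagged; the count varies with the seed */
proc freq data=samp;
   tables selected;
run;
```

Changing or removing the seed gives a different count of selected rows on each run, just as with METHOD=BERNOULLI.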

If your eventual goal is to do some kind of bootstrap analysis, know that PROC SURVEYSELECT supports the REPS= option, which repeats the selection process a specified number of times.
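
As a rough illustration of REPS=, the sketch below repeats the Bernoulli selection 100 times and then summarizes how many rows were selected in each replicate. The replicate count, the seed, and the data set names reps and counts are assumptions made for this example. The output data set identifies each repetition in the Replicate variable.

```sas
/* Sketch: repeat the Bernoulli selection 100 times.            */
/* Assumptions: REPS=100, SEED=12345, and the data set names    */
/* REPS and COUNTS are arbitrary choices for illustration.      */
proc surveyselect data=sashelp.cars method=bernoulli rate=.05
                  reps=100 seed=12345 out=reps outall noprint;
run;

/* number of selected rows in each replicate */
proc means data=reps noprint;
   by replicate;
   var selected;
   output out=counts sum=n_selected;
run;

/* how much the sample size varies across replicates */
proc means data=counts mean std min max;
   var n_selected;
run;
```

The last step gives an empirical picture of how much the sample size moves from one run to the next.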

progster · Posted 07-19-2013 11:25 AM

Dear Rick, the second approach is definitely the one that matches my goal! One last question: I saw that the sample size changes slightly from run to run. Is this variation based on a confidence interval? Thanks a lot!

Rick_SAS · Posted 07-19-2013 11:40 AM

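The variation does not come from a confidence interval; it is ordinary sampling variability. The number of selected rows will be binomially distributed with parameters N and p. The expected number of selected rows (the mean) will be N*p.

The standard deviation will be sqrt(N*p*(1-p)). When N is large, the distribution is approximately normal, so almost all of the sample sizes will be within three standard deviations of the mean.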
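
To make the formulas concrete, here is a small sketch that plugs in the numbers from the sashelp.cars example used earlier in the thread (N = 428, p = 0.05). The mean is 21.4 and the standard deviation is about 4.51. The variable names are chosen for illustration only. The last line uses the CDF function to compute the exact binomial probability that the count falls inside the three-standard-deviation band.

```sas
/* Sketch: mean, standard deviation, and 3-sd band for the      */
/* number of selected rows. N=428 and p=0.05 come from the      */
/* sashelp.cars example; variable names are illustrative.       */
data _null_;
   N = 428;
   p = 0.05;
   mean  = N*p;                   /* expected number of selected rows */
   sd    = sqrt(N*p*(1-p));       /* binomial standard deviation      */
   lower = mean - 3*sd;
   upper = mean + 3*sd;
   /* exact P(lower <= count <= upper) under Binomial(N, p) */
   prob  = cdf("Binomial", floor(upper), p, N)
         - cdf("Binomial", ceil(lower) - 1, p, N);
   put mean= sd= lower= upper= prob=;
run;
```

The values are written to the SAS log.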
