progster
Fluorite | Level 6

Hi everyone.

I received a portfolio database of clients who received a loan. For a given month I know the overall default rate (e.g., 5.0%), but I do not have the detail for each client.

For every client (row), I would like to generate a random 0/1 indicator, where 1 means default, such that the number of 1s equals 5% of the total number of rows.

Thanks in advance for any suggestions.

Accepted Solutions
Rick_SAS
SAS Super FREQ

There are two ways to interpret the "5%" criterion. The first way is to select a fixed number of rows, R = ceil(0.05*N), where N is the number of observations in your data set (PROC SURVEYSELECT rounds the sample size up by default). data_null__'s reply provides code to generate an indicator variable that has R selected rows. Notice that if you run PROC FREQ like this

proc freq data=samp;
   tables selected;
run;

you will always get the same number of selected rows (R = ceil(0.05*428) = ceil(21.4) = 22 in data_null__'s example).
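
If you also want the same rows to be selected on every run, a minimal sketch is to add a seed. The seed value 12345 and the output name samp_fixed are arbitrary choices for illustration, not part of the original replies.

```sas
/* Fixed-size selection: simple random sampling at a 5% rate.       */
/* SEED= makes the selection reproducible; 12345 is an arbitrary    */
/* value chosen for this sketch.                                    */
proc surveyselect data=sashelp.cars method=srs rate=.05
                  seed=12345 out=samp_fixed outall;
run;

/* The count of Selected=1 stays the same from run to run */
proc freq data=samp_fixed;
   tables selected;
run;
```

With OUTALL, the output keeps every input row and adds the Selected indicator, which is the 0/1 variable asked for in the question.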

An alternative approach is to say that each observation has a 5% chance of being selected. This means that the number of selected observations is a random value with expected value 0.05*N. If you take this approach, every time you run the following program you get a different number of selected rows. The number of selected rows is binomially distributed. It's equivalent to using selected=rand("Bernoulli",0.05) in a DATA step.

proc surveyselect data=sashelp.cars
   method=bernoulli rate=.05 out=samp outall;
run;

proc freq data=samp;
   tables selected;
run;
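
The DATA step equivalent mentioned above can be sketched as follows. The data set name samp2 and the seed 12345 are assumptions made for this example; drop the CALL STREAMINIT line if you want a different result on each run.

```sas
data samp2;
   /* Optional: fix the random number stream so results are reproducible. */
   /* 12345 is an arbitrary seed chosen for this sketch.                  */
   call streaminit(12345);
   set sashelp.cars;
   /* Each row independently gets 1 with probability 0.05, otherwise 0 */
   selected = rand("Bernoulli", 0.05);
run;

proc freq data=samp2;
   tables selected;
run;
```

The number of rows with selected=1 varies with the seed, as it does with METHOD=BERNOULLI.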

If your eventual goal is to do some kind of bootstrap analysis, know that PROC SURVEYSELECT supports the REPS= option, which repeats the selection process a specified number of times.
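
As a rough sketch of the REPS= option, the following draws 100 independent Bernoulli selections and counts the selected rows in each one. The replicate count, the seed, and the data set names samp_reps and rep_counts are illustrative assumptions.

```sas
/* 100 independent replicates; each row has a 5% chance per replicate. */
/* The output contains a Replicate variable identifying each draw.    */
proc surveyselect data=sashelp.cars method=bernoulli rate=.05
                  reps=100 seed=12345 out=samp_reps outall;
run;

/* Sum the 0/1 Selected indicator within each replicate */
proc means data=samp_reps noprint;
   by Replicate;
   var Selected;
   output out=rep_counts sum=n_selected;
run;
```

The n_selected column in rep_counts then shows how much the number of selected rows varies from one replicate to the next.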

data_null__
Jade | Level 19

The 0/1 indicator variable will be named Selected.
proc surveyselect data=sashelp.cars rate=.05 out=samp outall;
run;

progster
Fluorite | Level 6

Dear Rick, the second suggestion is the one that matches my goal! One last question: I saw that the sample size changes slightly from run to run. Is this change based on a confidence interval? Thanks a lot!

Rick_SAS
SAS Super FREQ

The number of selected rows will be binomially distributed. The expected number of selected rows (the mean) will be N*p.

The standard deviation will be sqrt(N*p*(1-p)). When N is large, most of the sample sizes will be within three standard deviations of the mean.
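
As a worked example, plugging in the values used earlier in this thread (N = 428 rows in sashelp.cars and p = 0.05) gives approximately:

```latex
\begin{align*}
\mu &= Np = 428 \times 0.05 = 21.4 \\
\sigma &= \sqrt{Np(1-p)} = \sqrt{428 \times 0.05 \times 0.95} = \sqrt{20.33} \approx 4.51 \\
\mu \pm 3\sigma &\approx 21.4 \pm 13.5
\end{align*}
```

So for that data set, most runs of the Bernoulli selection should give somewhere between about 8 and 34 selected rows.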
