progster
Fluorite | Level 6

Hi everyone.

I received a portfolio database of clients who received a loan. For a given month I know the overall default rate (e.g., 5.0%), but I do not have the detail for each client.

I would like to generate a random number (0 or 1) for every client (row), with "1" meaning a default, such that the 1s make up 5% of the total number of rows.

Thanks in advance for any suggestions.

Accepted Solutions
Rick_SAS
SAS Super FREQ

There are two ways to interpret the "5%" criterion. The first is to select a fixed number of rows, R = ceil(0.05*N), where N is the number of observations in your data set (by default, PROC SURVEYSELECT rounds N*rate up to the next integer). data_null__ provides code that generates an indicator variable with exactly R selected rows. Notice that if you run PROC FREQ like this

proc freq data=samp;
   tables selected;
run;

you will always get the same number of selected rows (R = ceil(0.05*428) = ceil(21.4) = 22 in data_null__'s example).

An alternative approach is to say that each observation independently has a 5% chance of being selected. The number of selected observations is then a random value with expected value 0.05*N, so every time you run the following program you get a different number of selected rows. That number follows a Binomial(N, 0.05) distribution. It's equivalent to using selected=rand("Bernoulli",0.05) in a DATA step.

proc surveyselect data=sashelp.cars
   method=bernoulli rate=.05 out=samp outall;
run;

proc freq data=samp;
   tables selected;
run;
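
For comparison, here is a minimal sketch of the DATA step version of the same idea. The output data set name (samp2) and the seed value are arbitrary choices for illustration, not part of the original program.

```sas
/* Sketch: give every row an independent 5% chance of being flagged. */
data samp2;
   call streaminit(12345);               /* arbitrary seed so the run is repeatable */
   set sashelp.cars;
   selected = rand("Bernoulli", 0.05);   /* 1 with probability 0.05, otherwise 0 */
run;

/* Count the flagged rows; the count changes if the seed changes. */
proc freq data=samp2;
   tables selected;
run;
```

Changing or removing the seed should give a different count of 1s on each run, in line with the binomial behavior described above.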

If your eventual goal is to do some kind of bootstrap analysis, know that PROC SURVEYSELECT supports the REPS= option, which repeats the selection process a specified number of times.
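
As a rough sketch of how REPS= might be used here, the following draws 100 independent Bernoulli selections and summarizes how many rows were selected in each one. The number of replicates, the seed, and the names boot, counts, and n_selected are assumptions chosen for illustration.

```sas
/* Sketch: repeat the 5% Bernoulli selection 100 times.            */
/* The output data set gets a Replicate variable identifying each. */
proc surveyselect data=sashelp.cars
   method=bernoulli rate=.05 reps=100 seed=12345
   out=boot outall;
run;

/* Number of selected rows within each replicate. */
proc means data=boot noprint nway;
   class replicate;
   var selected;
   output out=counts sum=n_selected;
run;

/* Distribution of those counts across the replicates. */
proc means data=counts n mean std min max;
   var n_selected;
run;
```

The mean of n_selected should land near 0.05*N, with the replicate-to-replicate spread reflecting the binomial variation.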

data_null__
Jade | Level 19

The 0/1 indicator variable will be named Selected.

proc surveyselect data=sashelp.cars rate=.05 out=samp outall;
run;

progster
Fluorite | Level 6

Dear Rick, the second approach is the one that matches my goal! One last question: I saw that the sample size changes slightly from run to run. Is this variation based on a confidence interval? Thanks a lot!

Rick_SAS
SAS Super FREQ

The number of selected rows will be binomially distributed. The expected number of selected rows (the mean) will be N*p.

The standard deviation will be sqrt(N*p*(1-p)). When N is large, almost all of the sample sizes will be within three standard deviations of the mean.
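
To make those formulas concrete, here is a small sketch that evaluates them for the sashelp.cars example (N = 428, p = 0.05). The variable names are illustrative. The mean works out to 21.4 and the standard deviation to about 4.5, so the three-standard-deviation band runs from roughly 8 to 35 selected rows.

```sas
/* Sketch: mean, standard deviation, and 3-sd band for the count of selected rows. */
data _null_;
   N  = 428;                     /* rows in sashelp.cars */
   p  = 0.05;                    /* selection probability */
   mu = N*p;                     /* expected number of selected rows */
   sd = sqrt(N*p*(1-p));         /* binomial standard deviation */
   lower = max(0, mu - 3*sd);    /* a count cannot be negative */
   upper = mu + 3*sd;
   put mu= sd= lower= upper=;    /* write the values to the log */
run;
```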
