dhana
Fluorite | Level 6

Hi All -

I have a dataset that contains account number, balance, limit and apr. I have to separate out 10% of the population from this dataset. This 10% should be a (stratified) random sample, and the distributions of balance, limit and apr should be approximately equal between the 10% sample and the remaining 90%.

I have used PROC SURVEYSELECT to sample the dataset, stratified on one variable:

proc surveyselect data=dataset out=new_dsn samprate=.1 outall;
   strata cust_flag;
run;

Can someone help me do the same thing for several variables?

Thanks

Dhana

4 REPLIES
Reeza
Super User

Why can't you add more variables to the strata statement?

strata balance limit apr;

dhana
Fluorite | Level 6

I tried that, but instead of 10% of the population I got 19%. After seeing that, I am a little confused about how this proc works.

Astounding
PROC Star

I don't know how it works, but I do have a suspicion. Perhaps the procedure requires every combination of strata variables to be represented in the sample. If only 5 observations fell into a particular combination of strata values, 10% of them would be half an observation, so the software would still have to select one of them into the sample. If that applied to every combination, you would end up with a 20% sample. You could check the strata sizes with this sort of program:

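/* Replace three*strata*variables with your own strata variables, e.g. balance*limit*apr. */
/* Step 1: count the observations in each strata combination. */
proc freq data=have noprint;
   tables three*strata*variables / out=counts (keep=count rename=(count=n_observations));
run;

/* Step 2: tabulate how many strata combinations have each size. */
proc freq data=counts;
   tables n_observations;
run;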

The final table would tell you how many strata combinations have 1 observation in the original data set, how many have 2 observations, etc.
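To put a number on that suspicion, here is a minimal sketch that estimates the overall sampling rate you would get if each stratum's sample size is 10% of its count, rounded up to a whole observation. The round-up rule is an assumption worth checking against the PROC SURVEYSELECT documentation for your release. The data set name have comes from the program above, and balance, limit and apr are the strata variables suggested earlier in this thread.

```sas
/* Count the observations in every combination of the strata variables. */
proc freq data=have noprint;
   tables balance*limit*apr / out=counts (keep=count);
run;

/* Assumption: each stratum contributes ceil(0.1 * count) observations. */
data _null_;
   set counts end=last;
   total + count;                    /* running total of observations    */
   expected + ceil(0.1 * count);     /* running total of expected sample */
   if last then do;
      implied_rate = expected / total;
      put 'Implied overall sampling rate: ' implied_rate percent8.1;
   end;
run;
```

If most combinations hold only a handful of accounts, the rate written to the log should land well above 10%, which would be consistent with the 19% reported above.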

Good luck.

ballardw
Super User

Could you post the code that generated the 19% sample? I did some experimenting and I get 10% within each combination of strata variables but my trial data is probably too nice.
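If tiny strata do turn out to be the cause, one common workaround for continuous strata variables is to bin them first and stratify on the bins instead of the raw values. This is only a sketch of that idea, not something tested on the original data. It assumes the input data set is called have; the group count of 4, the seed, and the names binned, flagged, balance_grp, limit_grp and apr_grp are made up for illustration.

```sas
/* Bin each continuous variable into quartile groups (values 0-3). */
proc rank data=have out=binned groups=4;
   var balance limit apr;
   ranks balance_grp limit_grp apr_grp;
run;

/* The STRATA statement expects the input sorted by the strata variables. */
proc sort data=binned;
   by balance_grp limit_grp apr_grp;
run;

/* Draw 10% within each of the (at most) 64 bin combinations.
   OUTALL keeps every row and adds Selected = 1 (sample) or 0 (remainder). */
proc surveyselect data=binned out=flagged samprate=0.1 seed=12345 outall;
   strata balance_grp limit_grp apr_grp;
run;

/* Compare the distributions of the sample and the remainder. */
proc means data=flagged n mean std min median max;
   class Selected;
   var balance limit apr;
run;
```

With only a few dozen strata, any per-stratum rounding adds at most a few dozen observations in total, so the overall rate should stay close to 10% on a data set of reasonable size. The PROC MEANS output gives a quick side-by-side check that balance, limit and apr look similar in the two groups.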
