
Stratified Sampling based on multiple variables

Frequent Contributor
Posts: 75

Stratified Sampling based on multiple variables

Hi All -

I have a dataset that contains account number, balance, limit and apr. I need to set aside 10% of the population from this dataset. This 10% should be a (stratified) random sample, and the distributions of balance, limit and apr should be approximately equal between the 10% sample and the remaining 90%.

I have used PROC SURVEYSELECT to sample the dataset, stratified on one variable:

proc surveyselect data = dataset out=new_dsn samprate=.1 outall;

strata cust_flag;

run;

Can someone help me do the same thing for several variables?

Thanks

Dhana

Grand Advisor
Posts: 16,875

Re: Stratified Sampling based on multiple variables

Why can't you add more variables to the strata statement?

strata balance limit apr;
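For what it's worth, a complete call might look like the sketch below. It reuses the data set names from the original post (dataset, new_dsn). The sort step is there because PROC SURVEYSELECT expects the input to be sorted by the STRATA variables, and the seed value is an arbitrary assumption, added only to make the draw repeatable.

```sas
/* Sketch only: data set names come from the original post. */
/* The input must be sorted by the STRATA variables first.  */
proc sort data=dataset;
   by balance limit apr;
run;

/* seed=12345 is an arbitrary value so the sample can be reproduced */
proc surveyselect data=dataset out=new_dsn samprate=0.1 outall seed=12345;
   strata balance limit apr;
run;
```

With OUTALL, new_dsn keeps every observation and adds a Selected flag (1 for the sample, 0 for the rest).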

Frequent Contributor
Posts: 75

Re: Stratified Sampling based on multiple variables

I tried that, but instead of 10% of the population I got 19%. After seeing that, I am a little confused about how this proc works.

Respected Advisor
Posts: 4,757

Re: Stratified Sampling based on multiple variables

I don't know how it works, but I do have a suspicion.  Perhaps the procedure requires every combination of strata variables to be represented in the sample.  If the number of observations fitting into a particular strata combination were 5, the software would still have to select one of them into the sample.  If that applied to every strata combination, you would end up with a 20% sample.  You could check the strata sizes with this sort of program:

proc freq data=have noprint;

   tables three*strata*variables / out=counts (keep=count rename=(count=n_observations));

run;

proc freq data=counts;

  tables n_observations;

run;

The final table would tell you how many strata combinations have 1 observation in the original data set, how many have 2 observations, etc.
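To put a number on that suspicion, here is a rough sketch that reuses the counts data set from the program above. It assumes a sampling rate of 0.1 and that the procedure rounds each stratum's sample size up to the next whole number (I believe that is the default with SAMPRATE=, but check the documentation for your release). The names total_n, expected_n and implied_rate are made up for illustration.

```sas
/* Sketch: estimate the overall sampling rate implied by rounding */
/* each stratum's sample size up. Assumes rate = 0.1.             */
data _null_;
   set counts end=last;
   total_n + n_observations;                 /* running population total  */
   expected_n + ceil(0.1 * n_observations);  /* assumed per-stratum size  */
   if last then do;
      implied_rate = expected_n / total_n;
      put total_n= expected_n= implied_rate= percent8.1;
   end;
run;
```

If the rate written to the log is close to the 19% actually obtained, small strata are the likely cause. For example, a stratum of 5 observations contributes ceil(0.5) = 1 selected observation, which is 20% of that stratum.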

Good luck.

Grand Advisor
Posts: 10,043

Re: Stratified Sampling based on multiple variables

Could you post the code that generated the 19% sample? I did some experimenting and I get 10% within each combination of strata variables but my trial data is probably too nice.
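If the real data turns out to be less nice because balance, limit and apr are continuous (so nearly every combination of values is its own tiny stratum), one possible workaround is to bin each variable first and stratify on the bins. This is only a sketch: the choice of 4 groups per variable, the *_grp variable names, the binned data set name and the seed are all assumptions, not anything from the original post.

```sas
/* Sketch: bin each continuous variable into quartile groups (0-3) */
proc rank data=dataset out=binned groups=4;
   var balance limit apr;
   ranks balance_grp limit_grp apr_grp;   /* hypothetical names */
run;

/* Sort by the binned variables, as the STRATA statement requires */
proc sort data=binned;
   by balance_grp limit_grp apr_grp;
run;

/* At most 4*4*4 = 64 strata, so rounding has far less effect */
proc surveyselect data=binned out=new_dsn samprate=0.1 outall seed=12345;
   strata balance_grp limit_grp apr_grp;
run;
```

Fewer, larger strata should keep the overall rate closer to 10%, at the cost of a coarser match on each variable.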
