Advanced problem for Strata Sampling issue


01-18-2015 01:14 PM

Hi All,

My earlier problem was resolved by adding the SELECTALL option to the stratified sampling step, so that all obs are included when a stratum has fewer obs than the requested sample size.

The code is:

proc sort data=filein;

by var1;

proc surveyselect data=filein method=srs sampsize=20

seed=12345 out=fileout SELECTALL;

strata var1;

run;

My real problem is that:

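| Strata (var1 value) | Ideal sample size | Situation 1: Real | Situation 1: Want | Situation 2: Real | Situation 2: Want |
|---|---|---|---|---|---|
| A | 20 | 10 | 10 | 10 | 10 |
| B | 20 | 30 | 25 | 15 | 15 |
| C | 20 | 50 | 25 | 50 | 35 |
| Total | 60 | | 60 | | 60 |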

I want the final sample size (across A, B and C) to be 60.

How do we code that logic?

Thanks for your support information.

Regards.

William

Accepted Solutions

Posted in reply to wtien196838

01-18-2015 04:13 PM

To get exactly what you want you will need to do some of the work yourself, as in this example based on sashelp.class :

/* Get the strata sizes, ordered by increasing size */

proc sql;

create table strata as

select age, count(*) as real

from sashelp.class

group by age

order by real;

quit;

/* Set the total sample size */

%let totalSample=12;

/* Calculate effective sample sizes by iteratively allocating samples equally among leftover strata, starting with the smallest stratum */

data sizes;

retain sampleLeft (&totalSample);

set strata nobs=nStrata;

SampleSize = min(real, round(sampleLeft/(nStrata-_n_+1)));

sampleLeft + (-SampleSize);

drop sampleLeft;

run;

/* Give the same stratum order to data and strata sizes, as required by proc surveyselect */

proc sort data=sashelp.class out=class; by age; run;

proc sort data=sizes; by age; run;

/* Call surveyselect to do the random sampling with the calculated strata sizes */

proc surveyselect data=class out=mySample seed=85687 selectall sampsize=sizes;

strata age;

run;

PG

Message was edited by: PG Added comments.

All Replies


Posted in reply to wtien196838

01-18-2015 02:29 PM

I'm not 100% sure what you're after but look at the SAMPSIZE= option for your version of proc surveyselect. You can either specify it in another dataset or list it out. The order needs to match the stratum order.

proc surveyselect data=filein method=srs sampsize=(10 25 25)

seed=12345 out=fileout SELECTALL;

strata var1;

run;

OR

proc surveyselect data=filein method=srs sampsize=(20 20 20)

seed=12345 out=fileout SELECTALL;

strata var1;

run;
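The reply above mentions that SAMPSIZE= can also point to a data set instead of a list. A minimal sketch of that form follows, assuming var1 is a character variable with the values A, B and C and that filein is already sorted by var1. The data set name `sizes` and the counts (taken from Situation 1 of the question) are illustrative only. The size variable has to be named `_NSIZE_` or `SampleSize`, and the data set must contain the strata variable in the same order as the input data.

```sas
/* Hypothetical per-stratum sample sizes (Situation 1 from the question). */
/* Assumes var1 is character; adjust the INPUT statement if it is numeric. */
data sizes;
  input var1 $ _NSIZE_;
  datalines;
A 10
B 25
C 25
;
run;

/* filein is assumed to be sorted by var1, matching the order in sizes */
proc surveyselect data=filein method=srs sampsize=sizes
                  seed=12345 out=fileout selectall;
  strata var1;
run;
```

This form is convenient when the sizes are computed by an earlier step rather than typed in by hand.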


Posted in reply to wtien196838

01-18-2015 09:56 PM

Let me explain my sampling method.

If all strata (A, B, C) have more obs than the sample size of 20, then it is easy to use proc surveyselect with sampsize=(20 20 20).

If A has only 10 obs, then the missing 10 obs are distributed evenly between B and C, so the sample sizes should be (10 25 25).

If A has 10 obs and B has 15 obs, then the remaining sample comes from C, so the sample sizes should be (10 15 35).

I need to draw 75 samples (60 obs each) in a short period of time, so I do not have time to check the number of obs in each stratum by hand.

I need to use SAS code to do this task.

I hope this explanation clarifies my question.

Thanks for your suggestions.

William
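As a check on the rule described in this post, the allocation data step from PG's reply can be run on the Situation 2 counts. This is only a sketch: the `strata` table is typed in by hand here (hypothetical input, with var1 assumed to be character), whereas in practice it would come from a count over the real data.

```sas
/* Stratum counts from Situation 2 of the question (hand-entered for illustration) */
data strata;
  input var1 $ real;
  datalines;
A 10
B 15
C 50
;
run;

/* The allocation must visit the smallest stratum first */
proc sort data=strata; by real; run;

%let totalSample=60;

/* Same logic as PG's data step: share what is left equally among the */
/* remaining strata, capped at the number of obs actually available   */
data sizes;
  retain sampleLeft (&totalSample);
  set strata nobs=nStrata;
  SampleSize = min(real, round(sampleLeft/(nStrata-_n_+1)));
  sampleLeft + (-SampleSize);
  drop sampleLeft;
run;

proc print data=sizes noobs; run;
```

Working through the arithmetic by hand: A gets min(10, 60/3) = 10, B gets min(15, 50/2) = 15, and C gets min(50, 35/1) = 35, which matches the wanted (10 15 35). With the Situation 1 counts (10, 30, 50) the same steps give (10 25 25).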


Posted in reply to wtien196838

01-18-2015 11:45 PM

Is the sample required always 60? Does @pgstats solution work?


Posted in reply to wtien196838

01-19-2015 10:56 AM

I added comments to my code to explain how it implements your sampling method, exactly, for the **sashelp.class** dataset stratified by **age** with a total sample size of 12. - PG


Posted in reply to wtien196838

01-20-2015 11:16 AM

It works. Thanks so much PGStats. Your code is genius. It took me more time to understand it and apply it. This is the code I want.

Thanks for all contributors again.

Regards,

William