fedeava
Calcite | Level 5

Hello everyone,

I need to create a stratified sample from a larger population. The constraint is that I can select observations only from a subset of that population. Nevertheless, in the final sample the stratification variables should have the same proportions as in the whole population, not as in the subset. I hope that is clear.

I have tried PROC SURVEYSELECT with a WHERE condition, but the resulting proportions don't reflect the whole population.

 

proc surveyselect data=pop_new_14_17_stima (where=(FLAG_INVIO_BFD=1)) method=sys rate=0.9
seed=1953 out=camp_new rep=1 ;
strata DAT_FINE_PERIO tipo_cli_ps DEFAULT;
run;

 

Thanks!

FreelanceReinh
Jade | Level 19

Hello @fedeava and welcome to the SAS Support Communities!

 

I think the ALLOC= option of the STRATA statement is suitable for this purpose.

 

Here's an example:

Let's say we want to draw an 80% sample (without replacement) from SASHELP.HEART, restricted to non-smokers (n=2501, selection criterion: smoking_status=:'N'), but strive for a distribution of blood pressure status (BP_Status) as in the unrestricted dataset, i.e. (PROC FREQ output):

                    Blood Pressure Status

                                      Cumulative    Cumulative
BP_Status    Frequency     Percent     Frequency      Percent
--------------------------------------------------------------
High             2267       43.52          2267        43.52
Normal           2143       41.14          4410        84.66
Optimal           799       15.34          5209       100.00

/* Store stratum allocation proportions in a dataset STRATA */

proc freq data=sashelp.heart noprint;
tables bp_status / out=strata(drop=count rename=(percent=_alloc_));
run;

/* Select the subset (input dataset for PROC SURVEYSELECT) */

proc sort data=sashelp.heart out=restrpop;
where smoking_status=:'N';
by bp_status;
run;

/* Draw the random sample */

proc surveyselect data=restrpop
method=srs rate=0.8
seed=2718 out=want;
strata bp_status / alloc=strata;
run;

/* Check distribution of BP_Status */

proc freq data=want;
tables bp_status;
run;

Result:

                    Blood Pressure Status

                                      Cumulative    Cumulative
BP_Status    Frequency     Percent     Frequency      Percent
--------------------------------------------------------------
High              871       43.53           871        43.53
Normal            823       41.13          1694        84.66
Optimal           307       15.34          2001       100.00

Note that with, e.g., rate=0.9 the sample size for stratum BP_Status='Optimal' would necessarily be capped at the stratum total, so the allocation would no longer be strictly proportional (see the notes in the SAS log in that case).
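
To see in advance whether a stratum would be capped, one could compare the requested stratum sizes with what the restricted population actually contains. The following is only a rough sketch built on the STRATA and RESTRPOP datasets from the example above. The names AVAIL, CHECK, N_AVAIL, N_WANTED and CAPPED and the macro variables RATE and N_RESTR are made up for illustration, and the rounding is an assumption: PROC SURVEYSELECT's own rounding may differ slightly.

```sas
/* Sketch only: preview the allocation before sampling.              */
/* AVAIL, CHECK, N_AVAIL, N_WANTED, CAPPED are hypothetical names.   */

%let rate = 0.8;   /* sampling rate, as in the example above */

/* Stratum sizes available in the restricted population */
proc freq data=restrpop noprint;
tables bp_status / out=avail(keep=bp_status count rename=(count=n_avail));
run;

/* Total number of observations in the restricted population */
proc sql noprint;
select count(*) into :n_restr from restrpop;
quit;

/* Approximate stratum sample sizes implied by the allocation        */
/* percentages (_ALLOC_ holds percentages, hence the division by 100) */
data check;
merge strata avail;
by bp_status;
n_wanted = round(&rate * &n_restr * _alloc_ / 100);
capped = (n_wanted > n_avail);   /* 1 = request exceeds the stratum total */
run;

proc print data=check noobs;
run;
```

With rate=0.8 this arithmetic gives 871, 823 and 307, which agrees with the result shown above. A value of CAPPED=1 for some stratum would indicate that the sample cannot be strictly proportional at the chosen rate.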


fedeava
Calcite | Level 5

Wonderful!

This was exactly the help I was looking for. I was able to create a stratified sample on 3 drivers. I'll probably have to add further drivers, and I hope it continues to perform well.

 

Thanks again!
