Lmv323
Calcite | Level 5
I'm trying to draw a random sample of employee-level data for a survey. I want to ensure a minimum of 75 people in each subgroup, such as gender, line of business, age band, etc.
Accepted Solutions
ballardw
Super User

What is your population size?

 

Generally, if you want something representative of the population as a whole, a sufficiently large simple random sample without constraints will work. But if the sizes you want for specific characteristics require disproportionate sampling, you may have to stratify on one or more of those characteristics.

If this were my project, I would likely start with a random sample of around 200 and examine the characteristics of those selected (PROC FREQ, anyone?). If that works, we're golden. If I'm close to the desired sizes in the characteristics, then I'd increase the sample size a bit.

If one of the characteristics doesn't get close at all, then stratify on that variable; with multiple constraints, the strata sizes would need to be larger.
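
A minimal sketch of that first pass, for anyone who wants to try it. The employee table isn't shown in this thread, so SASHELP.HEART stands in as the population; the dataset, the seed value, and the variables Sex, BP_Status, and Weight_Status are assumptions for illustration only.

```sas
/* Stand-in population: SASHELP.HEART (ships with SAS).       */
/* Replace with your own employee table and grouping columns. */
proc surveyselect data=sashelp.heart out=work.srs_sample
                  method=srs sampsize=200 seed=20240101;
run;

/* One-way tables: compare each subgroup count with the minimum you need */
proc freq data=work.srs_sample;
  tables sex bp_status weight_status / nocum;
run;
```

If every subgroup clears the minimum, the simple random sample is enough; if one falls well short, that variable is the candidate for a STRATA statement.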

 

 


11 REPLIES
Reeza
Super User

In each combination of groups, or in each group individually?

i.e., 75 females age 20-30,

or 75 females, 75 males, and 75 age 20-30?

Either way, take a look at PROC SURVEYSELECT.

Lmv323
Calcite | Level 5
75 females and 75 males; 75 age 20-30 and 75 age 40-50.
Not crossed with each other.
Lmv323
Calcite | Level 5
That's a minimum.
Ksharp
Super User
Is this what you are looking for?



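/* PROC SURVEYSELECT expects the input sorted by the STRATA variable */
proc sort data=sashelp.class out=class;
  by sex;
run;

/* Sample 10% of each stratum, but never fewer than NMIN=8 rows per stratum */
proc surveyselect data=class nmin=8 samprate=.1 out=want;
  strata sex;
run;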
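
Scaled up to the 75-per-group minimum from the question, the same idea might look like the sketch below. SASHELP.HEART and BP_Status are stand-ins for the employee table and one of its grouping variables, and the 5% rate and the seed are arbitrary assumptions.

```sas
/* Stand-in population; replace with the employee table */
proc sort data=sashelp.heart out=work.heart_sorted;
  by bp_status;
run;

/* Proportional 5% draw, with small strata topped up to 75 rows */
proc surveyselect data=work.heart_sorted out=work.want
                  method=srs samprate=0.05 nmin=75 seed=20240101;
  strata bp_status;
run;

/* Confirm the per-stratum counts */
proc freq data=work.want;
  tables bp_status / nocum;
run;
```

Note that NMIN= only guarantees the minimum for the variable named in STRATA; the other characteristics would still need to be checked separately.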

PGStats
Opal | Level 21

Can the same employee be selected in both a gender sample group and an age band sample group?

PG
PGStats
Opal | Level 21

Assuming the answer is yes, here is a method for choosing at least 2 students for each sex and for each age group:

 

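%macro select(dsn, id, crit, nbSel);
/* PROC SURVEYSELECT needs the input sorted by the STRATA variable */
proc sort data=&dsn; by &crit; run;

/* Per stratum: how many more rows are needed to reach &nbSel selected */
proc sql;
create table strata_&crit as
select
    &crit,
    max(0, &nbSel - sum(selected)) as SampleSize
from &dsn
group by &crit;
quit;

/* Draw the shortfall from rows not yet selected.                    */
/* SELECTALL takes the whole stratum if it has fewer rows than asked */
proc surveyselect data=&dsn out=sample_&crit sampsize=strata_&crit selectall;
where not selected;
strata &crit;
run;

/* Flag the newly drawn rows in the master table */
proc sql;
update &dsn
set selected = 1
where &id in (select &id from sample_&crit);
quit;
%mend;

data class;
set sashelp.class;
selected = 0; /* Add this flag variable to the dataset */
run;

/* Top up to at least 2 per sex, then at least 2 per age */
%select(class, name, sex, 2);
%select(class, name, age, 2);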
PG
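
To confirm the minimums after both macro calls, a quick check along these lines should do. It assumes the WORK.CLASS table and the selected flag created in the code above.

```sas
/* Count selected rows per sex and per age (one-way, not crossed) */
proc freq data=class;
  where selected = 1;
  tables sex age / nocum;
run;
```

Any level that still shows fewer rows than requested simply does not have that many members in the source data.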
Lmv323
Calcite | Level 5
I do not want to limit myself to a fixed number from each group. I want a sample that is representative of the employee base through the lens of line of business, gender, age, tenure, etc. I'm just looking to ensure I have at least a minimum number in each group, so that I can confidently say, for example, that women prefer this more than men do, or that age band 20-30 is more likely to use the offering we are surveying on.
Reeza
Super User

Is 75 from a power analysis?

Lmv323
Calcite | Level 5
It's based on a response rate assumption. I want to be left with enough responses for differences between my groups in Likert responses to be significant.
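
For readers following along, the arithmetic behind a response-rate-based minimum can be sketched as below. The target of 30 completed surveys and the 40% response rate are made-up numbers for illustration, not figures from this thread.

```sas
/* Hypothetical inputs: adjust to your own assumptions */
data _null_;
  target_completes = 30;    /* completed surveys wanted per group (assumed) */
  response_rate    = 0.40;  /* expected share of invitees who respond (assumed) */

  /* Invitations needed per group so the expected completes reach the target */
  invites_needed = ceil(target_completes / response_rate);

  put invites_needed=;
run;
```

A power analysis would set the target number of completes; the response rate then only scales it up to the number of people to invite.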
Lmv323
Calcite | Level 5
I ultimately needed to use a combination of solutions! Thank you everyone for your advice!
