_MooMoo
Obsidian | Level 7

Hello SAS communities!

 

I am unsure whether to use PROC FREQ with a WEIGHT statement or PROC SURVEYFREQ.

The analytic data set I am working with is a subset of the master data set. We randomly selected a sample within each age category from the master data set and calculated weights from the sampling fraction. For example, suppose the master data set has 1000 participants, of whom 500 are below 50 years old and 500 are above 50. We sampled 100 participants from the below-50 group and 50 participants from the above-50 group, so we assigned a weight of 5 (500/100) to the below-50 group and a weight of 10 (500/50) to the above-50 group. In this case, would we use PROC FREQ with a WEIGHT statement, or PROC SURVEYFREQ with a WEIGHT statement and age group specified as the strata?

Any advice is greatly appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
SAS_Rob
SAS Employee

In general you would not want to place the strata variable on the TABLES statement, which is why the procedure issues the ERROR message. The stratum sample sizes are presumably fixed, so you would not use the strata as domains (which is essentially what adding a variable to the TABLES statement does). If you are simply trying to get summary information, creating a new variable identical to the strata variable is the only way, but again I am not sure you would want Chi-Square tests.


5 REPLIES
ballardw
Super User

What is the purpose of subsetting the data to begin with?

Are you attempting to project the results from the sample to your "master" data only, or to a larger population that your master data could be intended to represent?

 

How were those specific weights chosen for the age groups? (Hint: Proc surveyselect might save you some time)
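As a rough sketch of the hint above, PROC SURVEYSELECT can draw the stratified sample and compute the weights in one step. The data set names `master` and `sample`, the seed, and the assumption that `age_grp` sorts with the below-50 group first are all hypothetical, not taken from the original post.

```sas
/* Assumption: master has one row per participant and an age_grp   */
/* variable whose sorted order is below-50 first, then above-50.   */
proc sort data=master;
   by age_grp;
run;

/* Simple random sample within each stratum: 100 from the first    */
/* stratum, 50 from the second. STATS requests SelectionProb and   */
/* SamplingWeight in the output data set.                          */
proc surveyselect data=master out=sample method=srs
                  sampsize=(100 50) seed=12345 stats;
   strata age_grp;
run;
```

With 500 participants per stratum, the SamplingWeight variable in `sample` should match the weights of 5 and 10 described in the question.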

 

One thing to consider with the size of your master data set is a possible finite population correction. Many statistical approaches assume that the population your sample comes from is infinite or, for practical purposes, very large. A population of 1000 is not very large, and your sample of 150 represents a significant proportion of it. So your sample actually tells you more about the master data than it would if you were pulling from 100,000 records.

 

The survey procedures, if you provide the sampling information or population total properly, will do a finite population adjustment.
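One way this might look in PROC SURVEYFREQ is to pass the stratum population totals through the TOTAL= option. This is only a sketch: the data set `strata_totals` is hypothetical, and it assumes `age_grp` is a numeric variable coded 1 (below 50) and 2 (above 50) with the same type and length as in the analysis data set `x`.

```sas
/* Hypothetical lookup table: one row per stratum, with the stratum */
/* population size stored in a variable named _TOTAL_.              */
data strata_totals;
   input age_grp _total_;
   datalines;
1 500
2 500
;
run;

/* With TOTAL= specified, the variance estimates include a finite   */
/* population correction based on each stratum's sampling fraction. */
proc surveyfreq data=x total=strata_totals;
   strata age_grp;
   tables sex;
   weight wt;
run;
```

The RATE= option is an alternative when you would rather supply stratum sampling rates (here 0.2 and 0.1) than population totals.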

 

_MooMoo
Obsidian | Level 7

@ballardw Thank you for your response! 

We subset the data because of limited resources: the cost of testing every sample in the master data is too high, so we selected a portion of the master data and tested only those. In our results, we want to account for the fact that our analytic data was sampled from the master data.

 

SAS_Rob
SAS Employee

If you have a stratified random sample with sampling weights, then you should use PROC SURVEYFREQ with STRATA and WEIGHT statements.
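To make the difference concrete, here is a side-by-side sketch using the variable names from this thread (`x`, `age_grp`, `sex`, `wt`). PROC FREQ treats the WEIGHT variable as a frequency count, so its standard errors and tests behave as if there were 1000 independent observations. PROC SURVEYFREQ uses the weights together with the strata for design-based variance estimation.

```sas
/* Weighted counts and percentages only: wt is treated as a         */
/* frequency count, so the design is ignored in any inference.      */
proc freq data=x;
   tables sex;
   weight wt;
run;

/* Design-based estimates: strata and weights both enter the        */
/* variance estimation.                                             */
proc surveyfreq data=x;
   strata age_grp;
   tables sex;
   weight wt;
run;
```

The weighted frequencies from the two procedures should agree. The standard errors and any tests are where they differ.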

_MooMoo
Obsidian | Level 7

This may be a stupid question, but I would like to generate a two-way frequency table, sex by age group, where age group is the first-stage stratum.

I coded:

proc surveyfreq data=x;
   strata age_grp;
   tables age_grp * sex;
   weight wt;
run;

And I got a warning saying that age_grp cannot appear in both the STRATA and TABLES statements.

proc surveyfreq data=x;
   tables age_grp * sex;
   weight wt;
run;

If I just use the code above, wouldn't SAS assume there is no stratification at the first stage?

Thank you!

SAS_Rob
SAS Employee

In general you would not want to place the strata variable on the TABLES statement, which is why the procedure issues the ERROR message. The stratum sample sizes are presumably fixed, so you would not use the strata as domains (which is essentially what adding a variable to the TABLES statement does). If you are simply trying to get summary information, creating a new variable identical to the strata variable is the only way, but again I am not sure you would want Chi-Square tests.
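A minimal sketch of that workaround, assuming the data set and variable names used earlier in the thread. The names `x2` and `age_grp_tab` are hypothetical.

```sas
/* Make a copy of the strata variable for use in the TABLES         */
/* statement, leaving age_grp itself for the STRATA statement.      */
data x2;
   set x;
   age_grp_tab = age_grp;
run;

proc surveyfreq data=x2;
   strata age_grp;
   tables age_grp_tab * sex;   /* summary table only */
   weight wt;
run;
```

Because the stratum sample sizes are fixed by design, this table is best read as descriptive output. Per the caution above, no Chi-Square option is requested here.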
