I'm analyzing data from the National Survey of College Graduates (NSF, 2019). I am only interested in reporting descriptive statistics, specifically the proportion of the population that falls into certain groups (e.g. X% of the population has a PhD and is female and Hispanic), not confidence intervals or p-values. The file includes a "survey specific weight" variable, which I would apply in the analysis.
Question: The design of the survey is described as a "stratified sampling design" with stratification cells (exact description below from NSF site). If I am only interested in the proportions, do I need to account for stratification? Or is just applying the weight variable sufficient?
Full description: "Sample design. The NSCG uses a stratified sampling design to select its sample from the eligible sampling frame. Within the sampling strata, the NSCG uses probability proportional to size or systematic random sampling techniques to select the NSCG sample. The stratification cells are defined by the following variables:
• Demographic group
• Highest degree type
• Occupation field and bachelor’s degree field
As has been the case since the 2013 NSCG, the 2019 NSCG includes an oversample of young graduates to improve the precision of estimates for this important population." - from NSF website
The weighted proportions themselves do not depend on the stratification variable, so you can safely report them without including the stratification. It is the standard errors and confidence intervals that will change.
Of course, assuming you have the stratification information and given the ease of simply using the STRATA statement in Proc SURVEYFREQ, it is probably a good idea to include it anyway in case you want other measures.
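To see why the point estimate does not move, here is a minimal sketch in Python (not SAS) using made-up toy records; the stratum labels, weights, and group indicator are all hypothetical and are not NSCG data. The weighted proportion is sum(w * y) / sum(w), and accumulating the same totals stratum by stratum gives the same number, because the stratum labels never enter the arithmetic.

```python
# Hypothetical toy records: (stratum, survey weight, 1 if in the group of interest else 0).
records = [
    ("A", 10.0, 1), ("A", 10.0, 1), ("A", 10.0, 1), ("A", 10.0, 0),
    ("B", 30.0, 0), ("B", 30.0, 0), ("B", 30.0, 0), ("B", 30.0, 1),
]


def weighted_proportion(rows):
    """Weighted share of the group: sum(w * y) / sum(w). Strata are ignored."""
    total_weight = sum(w for _, w, _ in rows)
    group_weight = sum(w * y for _, w, y in rows)
    return group_weight / total_weight


def weighted_proportion_by_stratum(rows):
    """Same estimate, built by accumulating weighted totals within each stratum."""
    totals = {}
    for stratum, w, y in rows:
        num, den = totals.get(stratum, (0.0, 0.0))
        totals[stratum] = (num + w * y, den + w)
    numerator = sum(num for num, _ in totals.values())
    denominator = sum(den for _, den in totals.values())
    return numerator / denominator


pooled = weighted_proportion(records)
by_stratum = weighted_proportion_by_stratum(records)
print(pooled, by_stratum)
```

Both calls return the same proportion for these toy records. In SAS terms, this is why adding or dropping the STRATA statement leaves the reported percentages unchanged as long as the same weight variable is used.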
Learning suggestion: Use the data from that source.
Then run the analysis with and without the strata and sample design information, and see what it does to the proportions.
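The with/without comparison suggested above can be sketched outside SAS as well. The following Python snippet is only an illustration on hypothetical toy records (not NSCG data): it computes the weighted proportion plus a Taylor-linearized standard error, once treating the sample as a single stratum and once using the stratum labels. It assumes with-replacement sampling of individual respondents, no finite population correction, and at least two records per stratum; it is not meant to reproduce PROC SURVEYFREQ output.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (stratum, survey weight, 1 if in the group of interest else 0).
records = [
    ("A", 10.0, 1), ("A", 10.0, 1), ("A", 10.0, 1), ("A", 10.0, 0),
    ("B", 30.0, 0), ("B", 30.0, 0), ("B", 30.0, 0), ("B", 30.0, 1),
]


def weighted_proportion(rows):
    """Weighted share of the group: sum(w * y) / sum(w)."""
    total_weight = sum(w for _, w, _ in rows)
    return sum(w * y for _, w, y in rows) / total_weight


def linearized_se(rows, use_strata):
    """Taylor-linearized SE of the weighted proportion (assumed: no fpc, n >= 2 per stratum)."""
    p = weighted_proportion(rows)
    total_weight = sum(w for _, w, _ in rows)
    # Linearized values z_i = w_i * (y_i - p) / sum(w), grouped by stratum
    # (or lumped into one group when the strata are ignored).
    groups = defaultdict(list)
    for stratum, w, y in rows:
        key = stratum if use_strata else "all"
        groups[key].append(w * (y - p) / total_weight)
    variance = 0.0
    for z in groups.values():
        n = len(z)
        mean_z = sum(z) / n
        variance += n / (n - 1) * sum((zi - mean_z) ** 2 for zi in z)
    return math.sqrt(variance)


p = weighted_proportion(records)
se_without = linearized_se(records, use_strata=False)
se_with = linearized_se(records, use_strata=True)
print(f"proportion={p:.4f}  SE without strata={se_without:.4f}  SE with strata={se_with:.4f}")
```

The proportion is a single number that never looks at the stratum labels, while the two standard errors come out different. That matches the point made in the accepted answer: strata matter for precision measures, not for the weighted percentages themselves.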
Very helpful, thank you.