Hi,
I am using Proc SurveyFreq (SAS 9.4) to analyze a dataset of school-based screening results. The study design was a systematic stratified sample, stratified on three variables: student race (white/non-white), school district size (<=2,500 vs. >2,500 students) and families above poverty level (<=40% vs. >40% above poverty level). The code below gives the screening percentages ("passed / failed") by student sex.
PROC SURVEYFREQ data=seven;
title2 'PROC SurveyFreq results - by sex';
STRATA white size pov; *Stratification variables for sampling frame;
CLUSTER school; *PSU;
WEIGHT weight2;
TABLES sex*result /row cl wchisq or;
RUN;
My question is how to get estimates for *just* the stratum variables: yes/no white, etc. Just using Proc Freq with the weights doesn't seem right, as it wouldn't take the sampling procedure into account. Can I just take (e.g.) "white" out of the strata statement and say TABLES white*result /row....? I'm probably overthinking this, but I can't find anything online that seems definitive.
The variables would go on the tables statement.
However, a variable that is used on a strata statement can't also be used on a tables statement. So you would need to have duplicate variables in the data to place on a Tables statement (or, in Proc Surveymeans, a Domain statement). At least the last time I tried to put a strata variable on a tables statement, it generated an error.
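A minimal sketch of the duplicate-variable approach, reusing the data set and variable names from the question. The names seven_dup and white_tab are made up for illustration; any unused names would do.

```sas
/* Copy the stratification variable so the copy can go on TABLES. */
/* seven_dup and white_tab are hypothetical names.                */
data seven_dup;
   set seven;
   white_tab = white;
run;

proc surveyfreq data=seven_dup;
   strata white size pov;          /* design statements unchanged */
   cluster school;
   weight weight2;
   tables white_tab*result / row cl;   /* domain analysis by race */
run;
```

The STRATA, CLUSTER and WEIGHT statements stay exactly as in the original call, so the variance estimation still reflects the full design.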
Thanks for answering! Such a simple idea. The overall percentage and its standard error from the one-way model (just "result" in the tables statement) are identical to the overall percentage and se in the 2-way table (e.g., "white * result"). So I'm going to consider that a success.
And yes, listing it on both the strata and tables statements generates an error.
Thanks so much!
If the stratum sample sizes are fixed, you can specify the stratification variables in a BY statement (instead of a STRATA statement) in PROC SURVEYFREQ to produce stratum-level estimates. (See last paragraph in doc section here.)
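A rough sketch of what that could look like for the "white" variable, assuming the other two stratification variables stay on the STRATA statement. The output data set name seven_sorted is hypothetical.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
/* seven_sorted is a hypothetical name.                          */
proc sort data=seven out=seven_sorted;
   by white;
run;

proc surveyfreq data=seven_sorted;
   by white;                 /* moved here from the STRATA statement */
   strata size pov;          /* assumption: remaining strata kept    */
   cluster school;
   weight weight2;
   tables result / cl;
run;
```

Each BY group is analyzed completely separately, so this is only appropriate under the fixed stratum sample size condition mentioned above.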
Interestingly, sorting and adding the By statement yields almost identical results as putting the duplicate variable in the tables statement. The percent and standard errors are the same, but the confidence intervals are ever so slightly wider (2.5777, 8.2148) vs. (2.6194, 8.1732).
Thanks for answering!
@DebbiBJ wrote:
Interestingly, sorting and adding the By statement yields almost identical results as putting the duplicate variable in the tables statement. The percent and standard errors are the same, but the confidence intervals are ever so slightly wider (2.5777, 8.2148) vs. (2.6194, 8.1732).
Thanks for answering!
From the Surveyfreq online documentation in the Details section for Domain Analysis (with some emphasis added by me):
Including domain variables in a TABLES statement request provides a different analysis from the analysis that you obtain by using a BY statement; a BY statement provides completely separate analyses of the BY groups. You can use a BY statement to analyze the data set by subgroups, but it is critical to note that this does not produce a valid domain analysis; the BY statement is appropriate only when the number of units in each subgroup is known with certainty. For example, you can use a BY statement to obtain stratum level estimates when the stratum sample sizes are fixed. But when the subgroup sample sizes are not fixed, you should perform domain analysis by including the domain variables in your TABLES statement request.
So, that part about knowing the subgroup sample sizes always confused me, which is why I didn't try the By statement route in the first place. But I guess it's just referring to modeling where you run multiple random subsamples of the population. So since I have a static sample size, the part "you can use a BY statement to obtain stratum level estimates when the stratum sample sizes are fixed" implies that this is a better method than the duplicate stratum variable? I liked the duplicate method because the confidence intervals were tighter, but I suppose that's the definition of temptation...