SAS93
Quartz | Level 8

I've run into conflicting methods of limiting analysis. I work with complex survey data, which naturally deals with descriptive analysis and general measures of association. Since starting this particular job a year ago, I've been under the impression that it's wrong to "limit" or subset datasets like NHANES, BRFSS, YRBSS, etc. because it messes up the way the dataset represents the entire population respondents are sampled from. Instead, I and others I've worked with have usually created dichotomous "limiting" variables that separate the "excluded" and "included" groups, and then we only look at the "included" output. 

 

A new co-worker who recently started did an analysis and, instead of creating the limiting variable, limited the dataset itself. The prevalence and all other estimates were the same; the only things that differed were the p-values of the chi-square tests. 

 

What exactly about the data (or how SAS works) would make the p-values different? Or could this have been a random fluke in the program at the time? We've had some similar instances where every single measurement was the same except that the 95% CIs were slightly different between 2 or 3 people coding the same analysis. 

 

Anyway... what are your thoughts? Is there anything I seem to be misled on here?

Accepted Solutions
ballardw
Super User

P-values will change for almost any analysis that uses a subset of data.

Consider:

proc freq data=sashelp.class;
   tables age*sex/chisq;
run;

proc freq data=sashelp.class (obs=18);
   tables age*sex/chisq;
run;

Removing 1 observation in this set changes the chi-square statistic from 1.4848 to 1.8667 (both tables have 5 degrees of freedom), so the p-value of the test changes as well.

 

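The main concern with limiting survey data is that anything that actually uses the variance in its calculations may be off considerably. The variance is estimated from the sample design (strata and PSUs), so when you subset, any stratum or PSU with no one left in your group drops out of the calculation; the standard errors, the design degrees of freedom, and therefore the confidence intervals and chi-square p-values can change even though the weighted point estimates stay the same. The secondary concern is if you want to project your results back to the population, such as saying something like "Approximately 68,000 individuals have condition X". A subset of the data means that estimate would likely be way off because the sums of the weights no longer represent the original population.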
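A minimal sketch of the difference, assuming SASHELP.CLASS stands in for a survey file. The INCLUDED flag and the age cutoff are hypothetical, and because no STRATA, CLUSTER, or WEIGHT statements are given, PROC SURVEYMEANS treats the data as a simple random sample. The first step limits the data with WHERE; the second keeps every row and uses a DOMAIN statement. The mean for INCLUDED=1 should be the same in both, but the standard error and the degrees of freedom behind the confidence limits can differ, because the second step still counts the excluded rows as part of the sample.

```sas
/* Hypothetical limiting flag on SASHELP.CLASS (stand-in for a survey file) */
data work.class_flag;
   set sashelp.class;
   included = (age >= 12);   /* 1 = included group, 0 = excluded group */
run;

/* Approach 1: limit the data set, so excluded rows never reach the variance */
proc surveymeans data=work.class_flag mean stderr clm;
   where included = 1;
   var height;
run;

/* Approach 2: keep every row and request a domain analysis instead;
   read only the INCLUDED=1 rows of the domain table */
proc surveymeans data=work.class_flag mean stderr clm;
   var height;
   domain included;
run;
```

With real NHANES, BRFSS, or YRBSS data you would add the file's own STRATA, CLUSTER, and WEIGHT statements to both steps; differences between the two approaches then also show up whenever the subset leaves a stratum or PSU with no included respondents.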

 

The approach is to use DOMAIN analysis for the procs that support it directly. That will compute "all" the values, using the full sample for the variance as needed. Then you only look at the bits you are concerned with.
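PROC SURVEYFREQ has no DOMAIN statement; domain analysis is requested by putting the domain variable first in a multiway TABLES request, and the CHISQ option then gives a Rao-Scott chi-square test for each level of that variable. A minimal sketch on SASHELP.CLASS, where INCLUDED and TALL are hypothetical variables and the design statements are only commented placeholders with hypothetical names:

```sas
data work.class_flag;
   set sashelp.class;
   included = (age >= 12);     /* hypothetical limiting (domain) flag */
   tall     = (height >= 60);  /* hypothetical binary outcome */
run;

proc surveyfreq data=work.class_flag;
   /* With a real survey file, add its design statements here, e.g.
      (hypothetical variable names):
        strata  my_stratum;
        cluster my_psu;
        weight  my_weight;                                           */
   /* Domain variable first: one SEX*TALL table and one Rao-Scott
      chi-square test per level of INCLUDED, all using the full sample */
   tables included*sex*tall / row chisq;
run;
```

You would then read only the INCLUDED=1 layer, which is the same "look only at the included output" habit described in the question, but with the variance computed from the whole design.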

 

 

SAS93
Quartz | Level 8

Thank you! That clears up my confusion. 
