I've run into conflicting methods for limiting an analysis to a subgroup. I work with complex survey data, mostly for descriptive analyses and general measures of association. Since starting this particular job a year ago, I've been under the impression that it's wrong to "limit" or subset datasets like NHANES, BRFSS, YRBSS, etc., because doing so distorts how the dataset represents the population that respondents were sampled from. Instead, I and others I've worked with have usually created a dichotomous "limiting" variable that separates the "excluded" and "included" groups, and then we look only at the output for the "included" group.
A new co-worker who recently started ran an analysis and, instead of creating the limiting variable, limited the dataset itself. The prevalence and all other estimates were the same; the only things that differed were the p-values of the chi-square tests.
What exactly about the data (or how SAS works) would make the p-values different? Or could this have been a random fluke in the program at the time? We've had similar instances where every single estimate was the same, except that the 95% CIs were slightly different between 2 or 3 people coding the same analysis.
Anyway, what are your thoughts? Is there anything I seem to be misled about here?