05-26-2016 04:45 PM
I have data for two survey years. After applying weights, each year separately represents the entire US population for that year. Since they both represent the entire US population, I don't think it is appropriate to stack them and treat them as independent samples for the purpose of comparing years, for example in a chi-square test with year as one of the variables or a regression with year as a predictor variable. Please let me know what you think.
05-26-2016 05:21 PM
Assuming the samples are independent (the folks in the second year were selected due to something from the previous year), it is quite common to treat year as an independent variable or category. The main concern, if the sample is a complex sample (as many of the national surveys are), is that the appropriate design information (strata, clusters, weights) is provided to the analysis procedure. Hint: Proc Freq is most likely not appropriate. Look at Procs SurveyFreq, SurveyMeans, SurveyLogistic, and SurveyReg.
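As a rough sketch of what passing the design information to a survey procedure can look like, here is a Proc SurveyFreq call on a stacked two-year file. All dataset and variable names (stacked, year, stratum, psu, wt, outcome) are hypothetical; the real names come from the survey's documentation. The sketch also assumes stratum codes are reused across years, which is why year is listed on the STRATA statement.

```sas
/* Sketch only: all names below are assumed, not taken from any real survey file. */
proc surveyfreq data=stacked;      /* stacked = both survey years in one dataset */
   strata year stratum;            /* year plus stratum keeps each year's strata distinct */
   cluster psu;                    /* primary sampling unit identifier */
   weight wt;                      /* final analysis weight */
   tables year*outcome / row chisq; /* row percents and a Rao-Scott chi-square test */
run;
```

With the CHISQ option, Proc SurveyFreq reports a design-adjusted (Rao-Scott) chi-square rather than the ordinary Pearson test that Proc Freq would produce.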
05-31-2016 06:53 PM
I am using the Survey Procs. I end up with an estimated population size that is double the US population.
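A minimal sketch of where the doubling shows up, assuming the same hypothetical names as a stacked file would typically have (stacked, year, stratum, psu, wt). Each year's weights sum to roughly that year's population, so the weighted frequency for each level of year is about one US population and the Total row is about two.

```sas
/* Sketch only: dataset and variable names are assumed for illustration. */
proc surveyfreq data=stacked;
   strata year stratum;   /* assumed design variables */
   cluster psu;
   weight wt;
   tables year;           /* weighted frequency per year; the Total row sums both years */
run;
```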
I am not sure what you mean by this “(the folks in the second year were selected due to something from the previous year)”. I assume they used the same sampling plan both years. I am guessing the samples were independent, in that they probably did not end up with the same respondents in both years. The populations they represent are not independent, though: most of the people living in the US in one year are still living there the next year (apart from births, deaths, migration, etc.). Is it only the independence of the samples that’s important, or should the populations also be independent? Thank you.