I have a data set on which I'd like to get some statistics on how stable people's answers to certain questions are over time. For example, here are the percentages of participants who answered Yes or No when asked whether they live in public housing:
Year | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
No | 78% | 87% | 80% | 86% | 84% | 80% |
Yes | 22% | 13% | 20% | 14% | 16% | 20% |
So the answers seem fairly stable, with a similar percentage of people answering Yes each year, but I would like a statistic that shows this beyond just reporting the percentages.
This is what the data looks like. Each year, the sample of participants is different (i.e., the same participants were not followed over time).
Participant ID | Public Housing | Year |
1 | 1 | 2010 |
2 | 1 | 2010 |
3 | 1 | 2010 |
4 | 0 | 2010 |
5 | 1 | 2010 |
6 | 0 | 2012 |
7 | 0 | 2012 |
8 | 0 | 2012 |
9 | 0 | 2012 |
10 | 1 | 2012 |
11 | 1 | 2013 |
12 | 0 | 2013 |
13 | 1 | 2013 |
14 | 1 | 2013 |
15 | 1 | 2013 |
16 | 1 | 2014 |
17 | 0 | 2014 |
18 | 0 | 2014 |
19 | 1 | 2014 |
20 | 0 | 2014 |
21 | 0 | 2015 |
22 | 0 | 2015 |
23 | 1 | 2015 |
24 | 1 | 2015 |
25 | 1 | 2015 |
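To try the suggested procedures on the example rows, the listing can be read into a SAS data set. This is a minimal sketch that assumes the data set name `have` and the variable names `housing` and `year` from the suggested code; the variable name `id` and the coding 1 = Yes, 0 = No are assumptions, not taken from the poster's real data.

```sas
/* Example rows from the question, one row per participant.        */
/* Assumed coding: housing = 1 (Yes, public housing), 0 (No).       */
/* 'id' is a hypothetical name for the Participant ID column.       */
data have;
  input id housing year;
  datalines;
1 1 2010
2 1 2010
3 1 2010
4 0 2010
5 1 2010
6 0 2012
7 0 2012
8 0 2012
9 0 2012
10 1 2012
11 1 2013
12 0 2013
13 1 2013
14 1 2013
15 1 2013
16 1 2014
17 0 2014
18 0 2014
19 1 2014
20 0 2014
21 0 2015
22 0 2015
23 1 2015
24 1 2015
25 1 2015
;
run;
```

These 25 rows are only useful for checking syntax: with five participants per year, every expected cell count in the Year by Housing table is below 5, so PROC FREQ will warn that the chi-square approximation may not be valid.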
Does anyone have ideas about what statistics I could use? Thank you in advance!
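Two simple tests would be a logistic regression on Year (PROC GLIMMIX) and a chi-square test of independence (PROC FREQ):
/* Logistic regression: does P(housing = 1) differ across years? */
proc glimmix data=have;
  class year;
  model housing (event="1") = year / dist=binary;
run;
/* Chi-square test of independence between year and housing */
proc freq data=have;
  table year*housing / chisq;
run;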
Thank you for the suggestions! I tried both.
Using GLIMMIX:
Over the entire period, the Type III Tests of Fixed Effects show Year is significant, F=6.61, p<.0001.
When I subset the data to the two years with similar percentages of "Yes", the result is non-significant.
Using chi-square test:
Over the entire period, Chi-square=62.48, p<.0001.
When I subset the data to the two years with similar percentages of "Yes", the result is non-significant.
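Restricting either test to two years only needs a WHERE statement in the procedure. A sketch, assuming the `have` data set and variable names from the suggested code; the years 2013 and 2015 are placeholders, not the years the poster actually compared.

```sas
/* Two-year comparison. The years listed are placeholders:          */
/* substitute the two years whose "Yes" percentages are close.      */
proc glimmix data=have;
  where year in (2013, 2015);
  class year;
  model housing (event="1") = year / dist=binary;
run;

proc freq data=have;
  where year in (2013, 2015);
  table year*housing / chisq;
run;
```

Because WHERE filters rows only for that step, the full-period and two-year analyses can be run from the same data set. With two years left, the Year test has 1 degree of freedom and compares just those two years.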
While we're on simple models, would SURVEYLOGISTIC be appropriate? I ask because it would let me use the strata and cluster information in my data.
Using SURVEYLOGISTIC:
The Analysis of Maximum Likelihood Estimates shows Year is non-significant, t=-1.14, p=.97
Thank you again!
Looks like you had a lot more information than you presented in your original question. You don't give enough clues for us to guess why p < 0.0001 would become p = 0.97 when taking strata and clusters into account.
I apologize for that. Also, the p=.97 was a mistake: I had run the model on a different variable.
Let me try again. Here are the three models I tried on the same Housing variable (0 or 1) and Year variable (2010-2015). Thanks for your patience.
proc glimmix data=temp;
weight Weight;
class Year;
model housing (event="1")= Year / dist=binary;
run;
This results in p<.0001.
proc surveylogistic data=temp ;
weight Weight;
model housing (Event='1') = Year;
run;
This results in p=.07
proc surveylogistic data=temp ;
strata Region Cycle;
cluster Cluster;
weight Weight;
model housing (Event='1') = Year;
run;
This results in p=.26
So even without strata and cluster, the results differ between GLIMMIX and SURVEYLOGISTIC.
I found the same pattern with the other variables I'm looking at: in most cases GLIMMIX gives a significant p-value and SURVEYLOGISTIC (without strata or cluster) gives a non-significant one.
So, I'm trying to understand which one is more appropriate, GLIMMIX or SURVEYLOGISTIC.
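The main difference I can spot is that you didn't specify YEAR as a CLASS variable in SURVEYLOGISTIC. That changes the model entirely: without the CLASS statement, Year is treated as a continuous covariate, so the procedure tests a single linear trend (1 df) instead of differences among the individual years.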
Thank you! Here's what the correctly specified SURVEYLOGISTIC model shows (without strata and cluster so I can compare to the GLIMMIX model). It's similar to the GLIMMIX model (i.e., both significant).
proc surveylogistic data=temp;
weight Weight;
class Year;
model Housing (Event='1') = Year;
run;
Type 3 Analysis of Effects: Year is significant, p=.02
Analysis of Maximum Likelihood Estimates, by year:
Year | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
p | .18 | .29 | .57 | .96 | .72 | .40 |
For the final model I will need to include strata and cluster, since that reflects how my data were sampled more accurately. The results are:
Type 3 Analysis of Effects: Year is non-significant, p=.47
Analysis of Maximum Likelihood Estimates, by year:
Year | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
p | .31 | .61 | .57 | .55 | .82 | .32 |
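The final model described in this reply combines the CLASS statement from the corrected run with the STRATA and CLUSTER statements from the earlier run. A sketch of that specification, assuming the data set `temp` and the variables `Region`, `Cycle`, `Cluster` and `Weight` exactly as used in the earlier code; the output data set name `type3_tests` is hypothetical.

```sas
/* Design-based logistic model: Year as a CLASS variable, plus the  */
/* strata, cluster and weight variables from the earlier runs.      */
/* ODS OUTPUT saves the Type 3 table to a (hypothetically named)    */
/* data set for later reporting.                                    */
ods output Type3=type3_tests;
proc surveylogistic data=temp;
  strata Region Cycle;
  cluster Cluster;
  weight Weight;
  class Year;
  model Housing (Event='1') = Year;
run;
```

By default SURVEYLOGISTIC uses effect coding (PARAM=EFFECT) for CLASS variables, so each per-year estimate is a deviation from the average over years rather than a comparison with a reference year; `class Year / param=ref;` would instead compare each year with the last one. The Type 3 test of Year is the same under either coding.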