wfung
Obsidian | Level 7

I have a set of data where I'm hoping to get some statistics on how stable people's answers to some questions are over time. For example, here is the percentage of participants who answered Yes or No to living in public housing.

 

Public Housing   2010   2011   2012   2013   2014   2015
No               78%    87%    80%    86%    84%    80%
Yes              22%    13%    20%    14%    16%    20%

 

So the answers seem fairly stable, with a similar percentage of people answering Yes each year, but I would like some kind of statistic to show this beyond just the percentages.

 

This is what the data looks like. Each year, the sample of participants is different (i.e., the same participants were not followed over time). 

Participant ID  Public Housing  Year
1               1               2010
2               1               2010
3               1               2010
4               0               2010
5               1               2010
6               0               2012
7               0               2012
8               0               2012
9               0               2012
10              1               2012
11              1               2013
12              0               2013
13              1               2013
14              1               2013
15              1               2013
16              1               2014
17              0               2014
18              0               2014
19              1               2014
20              0               2014
21              0               2015
22              0               2015
23              1               2015
24              1               2015
25              1               2015
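A minimal sketch for reading the example rows into SAS so the procedure calls in the replies can be run against them. It assumes the data set name HAVE and the variables Housing and Year, which match the names used in PGStats's code below; ID is an illustrative name for the participant column.

```sas
/* Example rows from the post: one record per participant.             */
/* Housing: 1 = lives in public housing (Yes), 0 = does not (No).        */
/* Each year is an independent sample, so no ID appears in two years.    */
data have;
   input ID Housing Year;
   datalines;
1 1 2010
2 1 2010
3 1 2010
4 0 2010
5 1 2010
6 0 2012
7 0 2012
8 0 2012
9 0 2012
10 1 2012
11 1 2013
12 0 2013
13 1 2013
14 1 2013
15 1 2013
16 1 2014
17 0 2014
18 0 2014
19 1 2014
20 0 2014
21 0 2015
22 0 2015
23 1 2015
24 1 2015
25 1 2015
;
run;
```

SAS variable names are not case-sensitive, so `housing` and `year` in later steps refer to these same columns.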

 

Anyone have ideas of what statistics I can use? Thank you in advance! 

Reeza
Super User
# of transitions out of public housing each year
% of transitions out of public housing each year
# of transitions into public housing each year (new)
% of public housing each year
% that stayed the same
Those three metrics (which add to 1) should give you a starting point.
PGStats
Opal | Level 21

Two simple tests would be:

 

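/* Logistic model: does the Yes/No split in housing differ by year? */
proc glimmix data=have;
class year;
model housing = year / dist=binary;
run;

/* Chi-square test of homogeneity: same Yes/No split in every year? */
proc freq data=have;
table year*housing / chisq;
run;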
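For reference, the statistic behind the CHISQ option is Pearson's chi-square test of homogeneity: it compares each year's observed Yes/No counts O_ij with the counts E_ij expected if every year shared the overall split. Worked on the 25-row example from the first post (5 years with 5 participants each, 14 Yes and 11 No overall), every year has E = 5 × 14/25 = 2.8 Yes and 5 × 11/25 = 2.2 No, against observed Yes counts of 4, 1, 4, 2, 3 for 2010, 2012, 2013, 2014, 2015. The notation is introduced here for illustration.

```latex
\begin{align*}
E_{ij} &= \frac{n_{i\cdot}\, n_{\cdot j}}{n}, \qquad
X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r-1)(c-1) \\
X^2_{\text{example}} &= \frac{1.2^2+1.8^2+1.2^2+0.8^2+0.2^2}{2.8}
  + \frac{1.2^2+1.8^2+1.2^2+0.8^2+0.2^2}{2.2}
  = \frac{6.8}{2.8}+\frac{6.8}{2.2} \approx 5.52, \qquad \mathrm{df} = 4
\end{align*}
```

With the six years in the real data, the test has (6 − 1)(2 − 1) = 5 df. For the toy rows, X² ≈ 5.52 on 4 df corresponds to p ≈ 0.24, but every expected count is below 5, so the approximation should not be trusted there; the example only illustrates the arithmetic. As written, this PROC FREQ step uses no weights and ignores strata and clusters.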
PG
wfung
Obsidian | Level 7

Thank you for the suggestions! I tried both.

Using GLIMMIX: 

Over the entire period, the Type III Tests of Fixed Effects show Year is significant, F=6.61, p<.0001. 

When I subset the data to two years with similar percentages of "Yes," the results are non-significant.

 

Using chi-square test: 

Over the entire period, Chi-square=62.48, p<.0001. 

When I subset the data to two years with similar percentages of "Yes," the results are non-significant.

 

Since we're talking about simple models, is SURVEYLOGISTIC appropriate? I ask because it lets me use the strata and cluster information in the data.

Using SURVEYLOGISTIC: 

The Analysis of Maximum Likelihood Estimates shows Year is non-significant, t=-1.14, p=.97

 

Thank you again! 

PGStats
Opal | Level 21

Looks like you had a lot more information than you presented in your original question. You don't give us enough clues to guess why p < 0.0001 would become p = 0.97 when taking strata and clusters into account.

PG
wfung
Obsidian | Level 7

I apologize for that. Also, the p=.97 was a typo. I mistakenly ran the model on a different variable. 

Let me try again. Here are the three models I tried on the same Housing variable (0 or 1) and Year variable (2010-2015). Thanks for your patience. 

 

proc glimmix data=temp;
weight Weight;
class Year;
model housing (event="1")= Year / dist=binary;
run;

This results in p<.0001. 

 

proc surveylogistic data=temp ;
weight Weight;
model housing (Event='1') = Year;
run;

This results in p=.07

 

proc surveylogistic data=temp ;
strata Region Cycle;
cluster Cluster;
weight Weight;
model housing (Event='1') = Year;
run;

This results in p=.26

 

It seems that even without strata and cluster, the results differ between GLIMMIX and SURVEYLOGISTIC.

I found similar results with the other variables I'm looking at: in most cases GLIMMIX gives a significant p-value and SURVEYLOGISTIC (without strata or cluster) gives a non-significant one.

So, I'm trying to understand which one is more appropriate, GLIMMIX or SURVEYLOGISTIC. 

PGStats
Opal | Level 21

The main difference that I can spot is that you didn't specify YEAR as a class variable in SURVEYLOGISTIC, so it was treated as a continuous predictor with a single linear trend. That changes the model entirely.
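Written out, with p_t denoting Pr(Housing = 1) in year t (notation introduced here for illustration), the numeric-Year and CLASS Year specifications test different hypotheses:

```latex
\begin{align*}
\text{Year numeric:}\quad & \operatorname{logit}(p_t) = \beta_0 + \beta_1 t,
  && H_0\colon \beta_1 = 0 \quad (1\ \mathrm{df}) \\
\text{CLASS Year:}\quad & \operatorname{logit}(p_t) = \beta_0 + \beta_t,\quad t = 2010,\dots,2015,
  && H_0\colon p_{2010} = \dots = p_{2015} \quad (5\ \mathrm{df})
\end{align*}
```

The 1-df slope only detects a steady drift in the log-odds over time. The yearly Yes percentages in the first post go up and down (22%, 13%, 20%, 14%, 16%, 20%), so a model with Year as a number can show little trend even when some years differ, which may help explain why the numeric-Year model gave p=.07.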

PG
wfung
Obsidian | Level 7

Thank you! Here's what the correctly specified SURVEYLOGISTIC model shows (without strata and cluster so I can compare to the GLIMMIX model). It's similar to the GLIMMIX model (i.e., both significant). 

proc surveylogistic data=temp;
weight Weight;
class Year;
model Housing (Event='1') = Year;
run;

 

Type 3 Analysis of Effects:
     Year is significant, p=.02

Analysis of Maximum Likelihood Estimates:
     Year 2010: p=.18
     Year 2011: p=.29
     Year 2012: p=.57
     Year 2013: p=.96
     Year 2014: p=.72
     Year 2015: p=.40
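One way to read these rows: with a CLASS statement, SURVEYLOGISTIC uses effect coding by default (PARAM=EFFECT, last level as reference), so each year's estimate is that year's deviation from the average log-odds across years, not a comparison with a particular year. For J years (J, p_j, and the β's are notation introduced here), the one-way model is:

```latex
\begin{align*}
\operatorname{logit}(p_j) &= \beta_0 + \beta_j, \qquad \sum_{j=1}^{J} \beta_j = 0 \\
\Rightarrow\quad \beta_0 &= \frac{1}{J}\sum_{k=1}^{J}\operatorname{logit}(p_k), \qquad
\beta_j = \operatorname{logit}(p_j) - \beta_0
\end{align*}
```

A single year's test therefore asks whether that year stands out from the average, while the Type 3 test asks whether all years share one rate, H0: β_1 = ... = β_(J−1) = 0 on J − 1 df. A significant Type 3 test alongside non-significant individual estimates, as in the output above, is not contradictory. If year-versus-year comparisons are wanted, the CLASS statement's PARAM=REF option (with REF=FIRST, for example) makes each estimate a contrast against one chosen year.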

 

I will need to include strata and cluster in the final model, as it is more accurate for my data. The results are:

Type 3 Analysis of Effects:
     Year is non-significant, p=.47

Analysis of Maximum Likelihood Estimates:
     Year 2010: p=.31
     Year 2011: p=.61
     Year 2012: p=.57
     Year 2013: p=.55
     Year 2014: p=.82
     Year 2015: p=.32

 
