Obsidian | Level 7

## Stability of how people answered questions over time

I have a data set and would like some statistics on how stably people answer certain questions over time. For example, here is the percentage of participants who answered Yes/No to whether they live in public housing.

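| Year | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
|------|------|------|------|------|------|------|
| No   | 78%  | 87%  | 80%  | 86%  | 84%  | 80%  |
| Yes  | 22%  | 13%  | 20%  | 14%  | 16%  | 20%  |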

So it seems fairly stable, with a similar percentage of people answering Yes across time, but I would like some kind of statistic to show it beyond just showing percentages.

This is what the data looks like. Each year, the sample of participants is different (i.e., the same participants were not followed over time).

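| Participant ID | Public Housing | Year |
|----------------|----------------|------|
| 1  | 1 | 2010 |
| 2  | 1 | 2010 |
| 3  | 1 | 2010 |
| 4  | 0 | 2010 |
| 5  | 1 | 2010 |
| 6  | 0 | 2012 |
| 7  | 0 | 2012 |
| 8  | 0 | 2012 |
| 9  | 0 | 2012 |
| 10 | 1 | 2012 |
| 11 | 1 | 2013 |
| 12 | 0 | 2013 |
| 13 | 1 | 2013 |
| 14 | 1 | 2013 |
| 15 | 1 | 2013 |
| 16 | 1 | 2014 |
| 17 | 0 | 2014 |
| 18 | 0 | 2014 |
| 19 | 1 | 2014 |
| 20 | 0 | 2014 |
| 21 | 0 | 2015 |
| 22 | 0 | 2015 |
| 23 | 1 | 2015 |
| 24 | 1 | 2015 |
| 25 | 1 | 2015 |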

Anyone have ideas of what statistics I can use? Thank you in advance!

Super User

## Re: Stability of how people answered questions over time

# of transitions out of public housing each year
% of transitions out of public housing each year
# of transitions into public housing each year (new)
% of transitions into public housing each year
% stayed the same.

The three percentage metrics (which add to 1) should give you a starting point.
Opal | Level 21

## Re: Stability of how people answered questions over time

Two simple tests would be:

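```sas
/* Logistic model: does the probability of housing=1 differ by year? */
proc glimmix data=have;
class year;
model housing = year / dist=binary;
run;

/* Chi-square test of independence between year and housing */
proc freq data=have;
table year*housing / chisq;
run;
```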
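To try both procedures on the example rows posted in the question, the data first need to be read into a dataset. The following is a minimal sketch, assuming the dataset name `have` and the variable names `housing` and `year` used in the code above; `id` is a hypothetical name for the Participant ID column. With only five participants per year, the resulting tests will have very little power and serve only to check that the code runs.

```sas
/* Example rows from the question: id (hypothetical name), housing (0/1), year */
data have;
    input id housing year;
    datalines;
1 1 2010
2 1 2010
3 1 2010
4 0 2010
5 1 2010
6 0 2012
7 0 2012
8 0 2012
9 0 2012
10 1 2012
11 1 2013
12 0 2013
13 1 2013
14 1 2013
15 1 2013
16 1 2014
17 0 2014
18 0 2014
19 1 2014
20 0 2014
21 0 2015
22 0 2015
23 1 2015
24 1 2015
25 1 2015
;
run;
```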
PG
Obsidian | Level 7

## Re: Stability of how people answered questions over time

Thank you for the suggestions! I tried both.

Using GLIMMIX:

Over the entire period, the Type III Tests of Fixed Effects show Year is significant, F=6.61, p<.0001.

When I subset the data to the two years that are close in percentage of "Yes", the results are non-significant.

Using chi-square test:

Over the entire period, Chi-square=62.48, p<.0001.

When I subset the data to the two years that are close in percentage of "Yes", the results are non-significant.

Since we're talking about simple models, is SURVEYLOGISTIC appropriate? I ask because it will allow me to use strata and cluster information in the data.

Using SURVEYLOGISTIC:

The Analysis of Maximum Likelihood Estimates shows Year is non-significant, t=-1.14, p=.97

Thank you again!

Opal | Level 21

## Re: Stability of how people answered questions over time

Looks like you had a lot more information than you presented in your original question. You don't give enough clues for us to guess why p < 0.0001 would become p = 0.97 when taking strata and clusters into account.

PG
Obsidian | Level 7

## Re: Stability of how people answered questions over time

I apologize for that. Also, the p=.97 was a typo. I mistakenly ran the model on a different variable.

Let me try again. Here are the three models I tried on the same Housing variable (0 or 1) and Year variable (2010-2015). Thanks for your patience.

```sas
proc glimmix data=temp;
weight Weight;
class Year;
model housing (event="1")= Year / dist=binary;
run;
```

This results in p<.0001.

```sas
proc surveylogistic data=temp;
weight Weight;
model housing (Event='1') = Year;
run;
```

This results in p=.07

```sas
proc surveylogistic data=temp;
strata Region Cycle;
cluster Cluster;
weight Weight;
model housing (Event='1') = Year;
run;
```

This results in p=.26

It seems like even without strata and cluster the results differ between GLIMMIX and SURVEYLOGISTIC.

I found similar results with the other variables I'm looking at. That is, in most cases GLIMMIX gives a significant p-value and SURVEYLOGISTIC (without strata or cluster) gives a non-significant p-value.

So, I'm trying to understand which one is more appropriate, GLIMMIX or SURVEYLOGISTIC.

Opal | Level 21

## Re: Stability of how people answered questions over time

The main difference that I can spot is that you didn't specify YEAR as a class variable in surveylogistic. That changes the model entirely.
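To make the difference concrete, here is a sketch of the two models in equation form. The symbols $\beta_0, \beta_1, \ldots$ and $x_j$ are notation introduced here for illustration, not taken from the SAS output. Without a CLASS statement, Year enters as a single numeric predictor, so the test has 1 degree of freedom and only detects a linear trend on the logit scale. With CLASS, six years are coded as five design variables (the exact coding depends on the PARAM= option), so the test has 5 degrees of freedom and detects any difference among years.

```latex
% Year as a numeric predictor (no CLASS statement): one slope, 1 df
\operatorname{logit} P(\text{Housing}=1) = \beta_0 + \beta_1 \,\text{Year}

% Year as a classification variable (CLASS Year): five design variables x_1..x_5, 5 df
\operatorname{logit} P(\text{Housing}=1) = \beta_0 + \sum_{j=1}^{5} \beta_j \, x_j
```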

PG
Obsidian | Level 7

## Re: Stability of how people answered questions over time

Thank you! Here's what the correctly specified SURVEYLOGISTIC model shows (without strata and cluster so I can compare to the GLIMMIX model). It's similar to the GLIMMIX model (i.e., both significant).

```sas
proc surveylogistic data=temp;
weight Weight;
class Year;
model Housing (Event='1') = Year;
run;
```

Type 3 Analysis of Effects: Year is significant, p=.02

Analysis of Maximum Likelihood Estimates:

| Year | p   |
|------|-----|
| 2010 | .18 |
| 2011 | .29 |
| 2012 | .57 |
| 2013 | .96 |
| 2014 | .72 |
| 2015 | .40 |

I will need to put in strata and cluster for the final model as it is more accurate for my data. The results are:

Type 3 Analysis of Effects: Year is non-significant, p=.47

Analysis of Maximum Likelihood Estimates:

| Year | p   |
|------|-----|
| 2010 | .31 |
| 2011 | .61 |
| 2012 | .57 |
| 2013 | .55 |
| 2014 | .82 |
| 2015 | .32 |
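The code for this final model was not posted. A plausible sketch, assuming it simply combines the STRATA and CLUSTER statements from the earlier attempt with the CLASS statement from the corrected model (same dataset `temp` and the same variable names as in the earlier posts), would be:

```sas
/* Assumed final model: design-based logistic regression with Year as categorical */
proc surveylogistic data=temp;
strata Region Cycle;   /* stratification variables from the earlier post */
cluster Cluster;       /* sampling cluster identifier */
weight Weight;         /* sampling weight */
class Year;            /* treat Year as categorical, not as a linear trend */
model Housing (Event='1') = Year;
run;
```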
