Stability of how people answered questions over time

wfung · Posted 06-27-2019 04:31 PM

I have a set of data where I'm hoping to get some statistics on how stably people answer some questions over time. For example, here is the percentage of participants who answered Yes/No when asked whether they live in public housing.

	Year	2010	2012	2013	2014	2015
No	78%	87%	80%	86%	84%	80%
Yes	22%	13%	20%	14%	16%	20%

So it seems fairly stable, with a similar percentage of people answering Yes each year, but I would like some kind of statistic to show it beyond just reporting percentages.

This is what the data looks like. Each year, the sample of participants is different (i.e., the same participants were not followed over time).

Participant ID	Public Housing	Year
1	1	2010
2	1	2010
3	1	2010
4	0	2010
5	1	2010
6	0	2012
7	0	2012
8	0	2012
9	0	2012
10	1	2012
11	1	2013
12	0	2013
13	1	2013
14	1	2013
15	1	2013
16	1	2014
17	0	2014
18	0	2014
19	1	2014
20	0	2014
21	0	2015
22	0	2015
23	1	2015
24	1	2015
25	1	2015

Anyone have ideas of what statistics I can use? Thank you in advance!

Reeza · Posted 06-27-2019 08:02 PM

# of transitions out of public housing each year
% of transitions out of public housing each year
# of transitions into public housing each year (new)
% of public housing each year
% stayed the same

Those three metrics (which add to 1) should give you a starting point.

PGStats · Posted 06-27-2019 11:20 PM

Two simple tests would be:

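/* Logistic model with year as a categorical effect; the Type III test asks whether the proportion differs across years */
proc glimmix data=have;
    class year;
    model housing = year / dist=binary;
run;

/* Chi-square test of association between year and response */
proc freq data=have;
    table year*housing / chisq;
run;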
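
To make the second test concrete, here is a minimal sketch that reads the 25 example rows from the original post into a dataset and runs the chi-square test on them. The dataset name `have` and the variable names `id`, `housing`, and `year` are assumptions chosen to match the code above. With only five participants per year, SAS will likely warn that expected cell counts are too small for the chi-square approximation, so treat this as a layout check rather than a real analysis.

```sas
/* Toy data copied from the original post: 25 participants, 5 per year. */
data have;
    input id housing year;
    datalines;
1 1 2010
2 1 2010
3 1 2010
4 0 2010
5 1 2010
6 0 2012
7 0 2012
8 0 2012
9 0 2012
10 1 2012
11 1 2013
12 0 2013
13 1 2013
14 1 2013
15 1 2013
16 1 2014
17 0 2014
18 0 2014
19 1 2014
20 0 2014
21 0 2015
22 0 2015
23 1 2015
24 1 2015
25 1 2015
;
run;

/* Row percentages give the share of Yes/No within each year;
   CHISQ tests whether that share is the same across years. */
proc freq data=have;
    tables year*housing / chisq nocol nopercent;
run;
```

A non-significant chi-square here would be consistent with a stable proportion, though it does not prove stability, especially in a sample this small.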

PG

wfung · Posted 07-02-2019 10:31 AM

Thank you for the suggestions! I tried both.

Using GLIMMIX:

Over the entire period, the Type III Tests of Fixed Effects show Year is significant, F=6.61, p<.0001.

When I subset the data to the two years that are close in percentage of "Yes", the results are non-significant.

Using chi-square test:

Over the entire period, Chi-square=62.48, p<.0001.

When I subset the data to the two years that are close in percentage of "Yes", the results are non-significant.

Since we're talking about simple models, is SURVEYLOGISTIC appropriate? I ask because it will allow me to use strata and cluster information in the data.

Using SURVEYLOGISTIC:

The Analysis of Maximum Likelihood Estimates shows Year is non-significant, t=-1.14, p=.97

Thank you again!

PGStats · Posted 07-02-2019 01:50 PM

Looks like you had a lot more information than you presented in your original question. You don't give enough clues for us to guess why p < 0.0001 would become p = 0.97 when taking strata and clusters into account.

PG

wfung · Posted 07-02-2019 02:12 PM

I apologize for that. Also, the p=.97 was a typo. I mistakenly ran the model on a different variable.

Let me try again. Here are the three models I tried on the same Housing variable (0 or 1) and Year variable (2010-2015). Thanks for your patience.

proc glimmix data=temp;
    weight Weight;
    class Year;
    model housing (event="1") = Year / dist=binary;
run;

This results in p<.0001.

proc surveylogistic data=temp;
    weight Weight;
    model housing (Event='1') = Year;
run;

This results in p=.07

proc surveylogistic data=temp;
    strata Region Cycle;
    cluster Cluster;
    weight Weight;
    model housing (Event='1') = Year;
run;

This results in p=.26

It seems like even without strata and cluster the results differ between GLIMMIX and SURVEYLOGISTIC.

I found similar results with the other variables I'm looking at. That is, in most cases GLIMMIX gives a significant p-value and SURVEYLOGISTIC (without strata or cluster) gives a non-significant one.

So, I'm trying to understand which one is more appropriate, GLIMMIX or SURVEYLOGISTIC.

PGStats · Posted 07-02-2019 02:22 PM

The main difference that I can spot is that you didn't specify YEAR as a class variable in surveylogistic. That changes the model entirely.
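
To spell out why the CLASS statement matters, the two specifications can be written roughly as follows. This is a sketch that assumes six survey years (2010 to 2015, as stated above); the exact constraint placed on the year effects depends on the parameterization each procedure uses.

```latex
\begin{align*}
\text{Year continuous (no CLASS):}\quad
  & \operatorname{logit} P(\text{Housing}=1) = \beta_0 + \beta_1\,\text{Year}
  && \text{(1 df for Year)} \\
\text{Year categorical (CLASS):}\quad
  & \operatorname{logit} P(\text{Housing}=1) = \beta_0 + \alpha_{\text{Year}},
  \quad \text{Year} \in \{2010,\dots,2015\}
  && \text{(5 df for Year)}
\end{align*}
```

The first model only tests for a linear trend in the log-odds, while the second tests for any difference among years, so the two p-values answer different questions.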

PG

wfung · Posted 07-02-2019 03:56 PM

Thank you! Here's what the correctly specified SURVEYLOGISTIC model shows (without strata and cluster so I can compare to the GLIMMIX model). It's similar to the GLIMMIX model (i.e., both significant).

proc surveylogistic data=temp;
    weight Weight;
    class Year;
    model Housing (Event='1') = Year;
run;

Type 3 Analysis of Effects:

Year is significant, p=.02

Analysis of Maximum Likelihood Estimates:

Year 2010: p=.18
Year 2011: p=.29
Year 2012: p=.57
Year 2013: p=.96
Year 2014: p=.72
Year 2015: p=.40

I will need to put in strata and cluster for the final model as it is more accurate for my data. The results are:

Type 3 Analysis of Effects:

Year is non-significant, p=.47

Analysis of Maximum Likelihood Estimates:

Year 2010: p=.31
Year 2011: p=.61
Year 2012: p=.57
Year 2013: p=.55
Year 2014: p=.82
Year 2015: p=.32
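
The code for this final model was not posted. A plausible sketch, assuming it simply combines the STRATA and CLUSTER statements from the earlier attempt with the CLASS statement from the corrected model, would be:

```sas
/* Sketch only: reconstructed from the statements shown earlier in the thread,
   not the exact code that produced the results above. */
proc surveylogistic data=temp;
    strata Region Cycle;              /* design strata */
    cluster Cluster;                  /* primary sampling units */
    weight Weight;                    /* sampling weights */
    class Year;                       /* treat Year as categorical */
    model Housing (Event='1') = Year;
run;
```

The Type 3 test for Year from this model is the one that accounts for the survey design.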
