Stability of how people answered questions over time

Posted 06-27-2019 04:31 PM

I have a data set and would like a statistic describing how stable people's answers to some questions are over time. For example, here are the percentages of participants who answered Yes or No when asked whether they live in public housing.

| Year | 2010 | 2012 | 2013 | 2014 | 2015 | |
| --- | --- | --- | --- | --- | --- | --- |
| No | 78% | 87% | 80% | 86% | 84% | 80% |
| Yes | 22% | 13% | 20% | 14% | 16% | 20% |

So the answers seem fairly stable, with a similar percentage of people answering Yes over time, but I would like a statistic that shows this beyond just reporting percentages.

This is what the data looks like. Each year, the sample of participants is different (i.e., the same participants were not followed over time).

| Participant ID | Public Housing | Year |
| --- | --- | --- |
| 1 | 1 | 2010 |
| 2 | 1 | 2010 |
| 3 | 1 | 2010 |
| 4 | 0 | 2010 |
| 5 | 1 | 2010 |
| 6 | 0 | 2012 |
| 7 | 0 | 2012 |
| 8 | 0 | 2012 |
| 9 | 0 | 2012 |
| 10 | 1 | 2012 |
| 11 | 1 | 2013 |
| 12 | 0 | 2013 |
| 13 | 1 | 2013 |
| 14 | 1 | 2013 |
| 15 | 1 | 2013 |
| 16 | 1 | 2014 |
| 17 | 0 | 2014 |
| 18 | 0 | 2014 |
| 19 | 1 | 2014 |
| 20 | 0 | 2014 |
| 21 | 0 | 2015 |
| 22 | 0 | 2015 |
| 23 | 1 | 2015 |
| 24 | 1 | 2015 |
| 25 | 1 | 2015 |

Anyone have ideas of what statistics I can use? Thank you in advance!
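
For anyone who wants to experiment with the sample rows shown above, they could be read into a SAS data set roughly like this. This is only a sketch: the data set name `have` and the variable names `id`, `housing`, and `year` are assumptions chosen for illustration, not names taken from the real data.

```
/* Hypothetical names: have, id, housing (1 = Yes, 0 = No), year */
data have;
    input id housing year;
    datalines;
1 1 2010
2 1 2010
3 1 2010
4 0 2010
5 1 2010
6 0 2012
7 0 2012
8 0 2012
9 0 2012
10 1 2012
11 1 2013
12 0 2013
13 1 2013
14 1 2013
15 1 2013
16 1 2014
17 0 2014
18 0 2014
19 1 2014
20 0 2014
21 0 2015
22 0 2015
23 1 2015
24 1 2015
25 1 2015
;
run;
```

Each participant appears once, so every year is an independent cross-section rather than a repeated measurement.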

7 REPLIES

- Number of transitions out of public housing each year
- Percentage of transitions out of public housing each year
- Number of transitions into public housing each year (new)
- Percentage in public housing each year
- Percentage that stayed the same

Those three metrics (which add to 1) should give you a starting point.
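
As a rough sketch of the "percentage in public housing each year" metric, a cross-tabulation with row percentages would be one way to get it. The data set name `have` and the variables `year` and `housing` (1 = Yes, 0 = No) are assumed here for illustration. The transition counts would need the same people observed in more than one year, so they are not computed in this sketch.

```
/* Assumes a data set named have with numeric variables year and housing */
proc freq data=have;
    /* NOCOL and NOPERCENT leave counts and row percentages, */
    /* i.e. the share of Yes/No answers within each year     */
    tables year*housing / nocol nopercent;
run;
```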

Two simple tests would be:

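```
/* Logistic model with year as a categorical effect; the Type III test checks for any difference across years */
proc glimmix data=have;
    class year;
    model housing = year / dist=binary;
run;

/* Chi-square test of association between year and the housing answer */
proc freq data=have;
    table year*housing / chisq;
run;
```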

PG

Thank you for the suggestions! I tried both.

Using GLIMMIX:

Over the entire period, the Type III Tests of Fixed Effects show Year is significant, F=6.61, p<.0001.

When I subset the data to the two years that are close in percentage of "Yes", the results are non-significant.

Using chi-square test:

Over the entire period, Chi-square=62.48, p<.0001.

When I subset the data to the two years that are close in percentage of "Yes", the results are non-significant.

Since we're talking about simple models, is SURVEYLOGISTIC appropriate? I ask because it will allow me to use strata and cluster information in the data.

Using SURVEYLOGISTIC:

The Analysis of Maximum Likelihood Estimates shows Year is non-significant, t=-1.14, p=.97

Thank you again!

Looks like you had a lot more information than you presented in your original question. You don't give enough clues for us to guess why p < 0.0001 would become p = 0.97 when taking strata and clusters into account.

PG

I apologize for that. Also, the p=.97 was a typo. I mistakenly ran the model on a different variable.

Let me try again. Here are the three models I tried on the same Housing variable (0 or 1) and Year variable (2010-2015). Thanks for your patience.

```
proc glimmix data=temp;
    weight Weight;
    class Year;
    model housing (event="1") = Year / dist=binary;
run;
```

This results in p<.0001.

```
proc surveylogistic data=temp;
    weight Weight;
    model housing (Event='1') = Year;
run;
```

This results in p=.07

```
proc surveylogistic data=temp;
    strata Region Cycle;
    cluster Cluster;
    weight Weight;
    model housing (Event='1') = Year;
run;
```

This results in p=.26

It seems like even without strata and cluster the results differ between GLIMMIX and SURVEYLOGISTIC.

I found similar results with the other variables I'm looking at. That is, in most cases GLIMMIX gives a significant p-value and SURVEYLOGISTIC (without strata or cluster) gives a non-significant one.

So, I'm trying to understand which one is more appropriate, GLIMMIX or SURVEYLOGISTIC.

The main difference that I can spot is that you didn't specify YEAR as a class variable in surveylogistic. That changes the model entirely.

PG

Thank you! Here's what the correctly specified SURVEYLOGISTIC model shows (without strata and cluster so I can compare to the GLIMMIX model). It's similar to the GLIMMIX model (i.e., both significant).

```
proc surveylogistic data=temp;
    weight Weight;
    class Year;
    model Housing (Event='1') = Year;
run;
```

Type 3 Analysis of Effects: Year is significant, p=.02

Analysis of Maximum Likelihood Estimates:

- Year 2010: p=.18
- Year 2011: p=.29
- Year 2012: p=.57
- Year 2013: p=.96
- Year 2014: p=.72
- Year 2015: p=.40

I will need to include strata and cluster in the final model, as that is more accurate for my data. The results are:

Type 3 Analysis of Effects: Year is non-significant, p=.47

Analysis of Maximum Likelihood Estimates:

- Year 2010: p=.31
- Year 2011: p=.61
- Year 2012: p=.57
- Year 2013: p=.55
- Year 2014: p=.82
- Year 2015: p=.32
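
A design-based counterpart to the chi-square test tried earlier in the thread could be sketched with PROC SURVEYFREQ, which accepts the same STRATA, CLUSTER, and WEIGHT statements as SURVEYLOGISTIC. This is only a sketch: it assumes the same data set `temp` and the design variables `Region`, `Cycle`, `Cluster`, and `Weight` used in the models above, and it has not been run against the actual data.

```
/* Assumes the data set and design variables used in the SURVEYLOGISTIC models above */
proc surveyfreq data=temp;
    strata Region Cycle;
    cluster Cluster;
    weight Weight;
    /* ROW adds within-year percentages; CHISQ requests the Rao-Scott chi-square test */
    tables Year*Housing / row chisq;
run;
```

The Rao-Scott test adjusts the usual chi-square statistic for the sample design, so it would be the natural comparison for the Type 3 test from the model that includes strata and cluster.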
