Hi everyone, I have looked through the forums and cannot seem to find an answer to my question.
I have complex, weighted data (roughly 4000 observations) collected over 3 waves in multiple age bands: 15-18, 19-21, 22-25. I want to show the annual percentage change between each wave in the categorical variable "heavy drinker" (binary yes/no). I would then like to test whether the annual percent change differs significantly between the age groups, as well as across waves within each age group (wave 1 to 2, 2 to 3, 1 to 3).
How would I create the annual percentage change?
What would be the appropriate statistical test for something like this?
example data:
| id | wave | age group | heavy drinker |
| 1 | 1 | 15-18 | yes |
| 1 | 2 | 15-18 | no |
| 1 | 3 | 19-21 | no |
| 2 | 1 | 22-25 | no |
| 2 | 2 | 22-25 | yes |
| 2 | 3 | 22-25 | yes |
| 3 | 1 | 15-18 | no |
| 3 | 2 | 19-21 | yes |
| 3 | 3 | 19-21 | no |
First, I would make mosaic plots for visualization:
Then compare proportions.
For dependent samples: 22562 - Testing the equality of proportions from dependent samples
For independent samples:
You can also take a modelling approach, but first you can try with statistical hypothesis testing.
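As a rough illustration of the dependent-samples comparison mentioned above, here is a minimal sketch of an exact McNemar test on the example rows, comparing each person's wave 1 and wave 2 answers. It is written in Python purely for illustration, not SAS. It ignores the survey weights and the complex design entirely, and the function names `discordant_counts` and `mcnemar_exact_p` are made up for this example.

```python
from math import comb

# Example rows from the thread: (id, wave, heavy_drinker)
rows = [
    (1, 1, "yes"), (1, 2, "no"), (1, 3, "no"),
    (2, 1, "no"), (2, 2, "yes"), (2, 3, "yes"),
    (3, 1, "no"), (3, 2, "yes"), (3, 3, "no"),
]

def discordant_counts(rows, wave_a, wave_b):
    """Count people who switched yes->no (b) and no->yes (c) between two waves."""
    status = {}
    for pid, wave, heavy in rows:
        status.setdefault(pid, {})[wave] = heavy
    b = c = 0
    for waves in status.values():
        # Only people observed in both waves form a pair.
        if wave_a in waves and wave_b in waves:
            first, second = waves[wave_a], waves[wave_b]
            if first == "yes" and second == "no":
                b += 1
            elif first == "no" and second == "yes":
                c += 1
    return b, c

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value: a binomial(n, 0.5) test on discordant pairs."""
    n = b + c
    if n == 0:
        return 1.0  # no one changed status, so there is no evidence of change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

b, c = discordant_counts(rows, 1, 2)
p_value = mcnemar_exact_p(b, c)
print(f"yes->no: {b}, no->yes: {c}, exact p = {p_value}")
```

With only three people the test is of course uninformative; the point is just to show that the paired test depends only on those who changed status between the two waves.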
Ciao,
Koen
Thank you for the response. Is there a way to incorporate the weights, as in proc surveyfreq, when creating these?
Please explain further. How can we calculate annual change if there is no variable in your data set that indicates year?
Please explain further. You have weighted data, what are the weights, and which variable(s) are weighted, and how/why would we use the weights?
The wave represents "year". The waves were collected over periods of different lengths. For example, Wave 1 took place from 2010 to 2012, Wave 2 from 2015 to 2018, and Wave 3 from 2020 to 2021. The same people were surveyed at each time point. I have been working with the all-waves weight variable, weights_waves1to3, which is to be used when working with all 3 waves. So the calculation would be more of a wave percent change than an annual one. For other calculations I have relied on proc surveyfreq with the weight variable. To expand the data, the weight variable is assigned to each id and carried through the waves. Please let me know if you need any more details.
example data:
| id | wave | age group | heavy drinker | weights_waves1to3 |
| 1 | 1 | 15-18 | yes | 1.2 |
| 1 | 2 | 15-18 | no | 1.2 |
| 1 | 3 | 19-21 | no | 1.2 |
| 2 | 1 | 22-25 | no | .8 |
| 2 | 2 | 22-25 | yes | .8 |
| 2 | 3 | 22-25 | yes | .8 |
| 3 | 1 | 15-18 | no | 1.6 |
| 3 | 2 | 19-21 | yes | 1.6 |
| 3 | 3 | 19-21 | no | 1.6 |
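To make the "wave percent change" concrete, here is a minimal sketch that computes the weighted percentage of heavy drinkers per wave from the example rows above, and then the relative change between waves. It is in Python for illustration only, not SAS. The function names `weighted_prevalence` and `percent_change` are made up for this example, and the choice of relative change (rather than a difference in percentage points) is an assumption about what is wanted. It produces point estimates only; standard errors and tests for a complex design need design-based methods such as those in proc surveyfreq.

```python
from collections import defaultdict

# Example rows from the thread:
# (id, wave, age_group, heavy_drinker, weights_waves1to3)
rows = [
    (1, 1, "15-18", "yes", 1.2), (1, 2, "15-18", "no", 1.2), (1, 3, "19-21", "no", 1.2),
    (2, 1, "22-25", "no", 0.8), (2, 2, "22-25", "yes", 0.8), (2, 3, "22-25", "yes", 0.8),
    (3, 1, "15-18", "no", 1.6), (3, 2, "19-21", "yes", 1.6), (3, 3, "19-21", "no", 1.6),
]

def weighted_prevalence(rows, by_age=False):
    """Weighted % of heavy drinkers: 100 * sum(w for 'yes') / sum(w).

    Groups by wave, or by (wave, age_group) when by_age is True.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for _pid, wave, age, heavy, w in rows:
        key = (wave, age) if by_age else wave
        den[key] += w
        if heavy == "yes":
            num[key] += w
    return {key: 100.0 * num[key] / den[key] for key in den}

def percent_change(p_start, p_end):
    """Relative change in percent; undefined (None) when the start is 0."""
    if p_start == 0:
        return None
    return 100.0 * (p_end - p_start) / p_start

prev = weighted_prevalence(rows)
for start, end in [(1, 2), (2, 3), (1, 3)]:
    change = percent_change(prev[start], prev[end])
    print(f"wave {start} -> {end}: {prev[start]:.1f}% -> {prev[end]:.1f}% "
          f"(relative change {change:.1f}%)")

# Same calculation within age groups, e.g. 15-18 at wave 1
prev_by_age = weighted_prevalence(rows, by_age=True)
print(f"wave 1, 15-18: {prev_by_age[(1, '15-18')]:.1f}%")
```

Dividing a wave-to-wave change by the number of years between waves would give a rough annual figure, but since each wave spans several years, the dates used for that (for example the wave midpoints) would be an assumption.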