Hi all,
I am trying to look at the association between individuals having the same occupation and their distance from each other in a dataset. A sample of this data set is below:
Job: 0 = farmer, 1 = fisher, 2 = chef
Distance: 0 = low, 1 = medium, 2 = high
| Pair | ID1 | ID2 | Job | Distance |
| 1    | A   | B   | 0   | 0        |
| 2    | A   | C   | 2   | 1        |
| 3    | A   | D   | 1   | 2        |
| 4    | B   | C   | 2   | 1        |
| 5    | B   | D   | 1   | 1        |
| 6    | C   | D   | 2   | 2        |
However, because each individual occurs in more than one pair, I am worried that the observations will be correlated. I want to use PROC MIXED to model the association between Job and Distance while taking into account the IDs found in each pair (i.e., individual A appears in Pairs 1, 2, and 3, while individual B appears in Pairs 1, 4, and 5). I am not sure how to proceed. Any help would be appreciated. Thanks!
Here are some things to consider:
It appears that your response variable (Distance? Job?) is multinomial, and the predictor is categorical as well. That means that PROC MIXED is probably the wrong tool for this analysis, because it assumes normally distributed errors. You may want to look at a generalized linear mixed model procedure (GLIMMIX) or at generalized linear model/estimating equation procedures (GENMOD, GEE).
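As a rough starting point, the sample data could be read in and fit with a cumulative logit model in PROC GLIMMIX. This is only a sketch: it assumes Distance is the response and is treated as ordinal (low < medium < high), it assumes Job is the predictor, the data set name pairs is made up for illustration, and it ignores the repeated IDs entirely. Six rows are far too few for meaningful estimates, so this only shows the syntax.

```sas
/* Sample data from the question; the data set name "pairs" is an assumption */
data pairs;
    input Pair ID1 $ ID2 $ Job Distance;
    datalines;
1 A B 0 0
2 A C 2 1
3 A D 1 2
4 B C 2 1
5 B D 1 1
6 C D 2 2
;
run;

/* Ordinal (cumulative logit) model for Distance with Job as a
   categorical predictor. No random effects yet, so the dependence
   between pairs that share an individual is not accounted for. */
proc glimmix data=pairs;
    class Job;
    model Distance = Job / dist=multinomial link=cumlogit solution;
run;
```

If Job is the response instead, it is nominal rather than ordinal, and link=glogit would be the more natural choice.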
What is the role of ID1 and ID2? Is there ever a case where the values for these two variables are identical? Are there more than 4 levels? Would you consider these as predictors? If so, the association between the two could be measured by including an interaction term in the model. If not predictors, would you consider them random effects, such as blocks? Since ID levels B and C appear in both ID variables, this may lead to an inability to estimate the random effects unless you have a lot of levels and observations per level. If there is only a small number of levels, you might be better off considering them fixed effects, in which case you might not need a mixed model at all.
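If the IDs are treated as random effects, one option to explore is a multimember effect, which lets a single random effect for "individual" apply to whichever of ID1 or ID2 that individual appears in. The sketch below is untested and rests on several assumptions: Distance is an ordinal response, Job is the predictor, and the data set name pairs and the effect name Member are made up for illustration. With only four individuals and six pairs it should not be expected to converge; it would need many more levels and observations per level.

```sas
/* Sample data from the question; the data set name "pairs" is an assumption */
data pairs;
    input Pair ID1 $ ID2 $ Job Distance;
    datalines;
1 A B 0 0
2 A C 2 1
3 A D 1 2
4 B C 2 1
5 B D 1 1
6 C D 2 2
;
run;

proc glimmix data=pairs;
    class Job ID1 ID2;
    /* One set of individual-level effects shared across both ID columns;
       "Member" is a hypothetical name */
    effect Member = multimember(ID1 ID2);
    model Distance = Job / dist=multinomial link=cumlogit solution;
    /* Random intercept for each individual, entering every pair
       in which that individual takes part */
    random Member;
run;
```

Whether this specification is estimable depends on the full data, so check the procedure's documentation and the log for convergence and estimability notes before relying on the results.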
SteveDenham