SAS49
Obsidian | Level 7

Hi all,

I am trying to look at the association between individuals having the same occupation and their distance from each other in a dataset.  A sample of this data set is below:

Job: 0 = farmer, 1 = fisher, 2 = chef

Distance: 0 = low, 1 = medium, 2 = high

Pair  ID1  ID2  Job  Distance
1     A    B    0    0
2     A    C    2    1
3     A    D    1    2
4     B    C    2    1
5     B    D    1    1
6     C    D    2    2

However, because individuals occur in more than one pair, I am worried that the observations will be correlated. I want to use PROC MIXED to model the association between Job and Distance while taking into account the IDs found in each pair (i.e., individual A appears in Pairs 1, 2, and 3, while individual B appears in Pairs 1, 4, and 5). I am not sure how to proceed. Any help would be appreciated. Thanks!
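To make the overlap concrete, here is a minimal sketch (in Python rather than SAS, purely for illustration) that lists which pairs each individual appears in, using only the six sample rows above. The function name `pairs_by_individual` is hypothetical.

```python
from collections import defaultdict

# Sample rows from the post: (Pair, ID1, ID2, Job, Distance)
rows = [
    (1, "A", "B", 0, 0),
    (2, "A", "C", 2, 1),
    (3, "A", "D", 1, 2),
    (4, "B", "C", 2, 1),
    (5, "B", "D", 1, 1),
    (6, "C", "D", 2, 2),
]

def pairs_by_individual(rows):
    """Map each individual to the list of Pair numbers it takes part in."""
    membership = defaultdict(list)
    for pair, id1, id2, _job, _distance in rows:
        membership[id1].append(pair)
        membership[id2].append(pair)
    return dict(membership)

membership = pairs_by_individual(rows)
for individual in sorted(membership):
    print(individual, membership[individual])
```

Every individual shows up in three of the six rows, so no row is independent of all the others, which is the source of the concern about correlation.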

SteveDenham
Jade | Level 19

Here are some things to consider:

It appears that your response variable (Distance? Job?) is multinomial, as is the predictor.  That means that PROC MIXED is probably the wrong method for any analysis, as it assumes that the distribution of errors is normal.  You may want to look at other mixed model procedures (GLIMMIX), or generalized linear model/estimating equation procedures (GENMOD, GEE).

What is the role of ID1 and ID2?  Is there ever a case where the values for these two variables are identical? Are there more than 4 levels? Would you consider these as predictors?  If so, the association between the two could be measured by including an interaction term in the model.  If not predictors, would you consider them random effects, such as blocks?  Since ID levels B and C appear in both ID variables, this may lead to an inability to estimate the random effects unless you have a lot of levels and observations per level.  If there is only a small number of levels, you might be better off considering them fixed effects, in which case you might not need a mixed model at all.
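One way to picture the difficulty with the same ID level appearing under both ID1 and ID2 is to replace the two ID columns with a single set of indicator columns, one per individual, with a 1 for each of the two members of the pair (sometimes called a multiple-membership layout). This is an assumption offered for illustration, not something the thread prescribes, and it is sketched in Python rather than SAS; the function name `membership_matrix` is hypothetical.

```python
# Sample rows from the original post: (Pair, ID1, ID2, Job, Distance)
rows = [
    (1, "A", "B", 0, 0),
    (2, "A", "C", 2, 1),
    (3, "A", "D", 1, 2),
    (4, "B", "C", 2, 1),
    (5, "B", "D", 1, 1),
    (6, "C", "D", 2, 2),
]

def membership_matrix(rows):
    """Build one indicator column per individual; each row has exactly two 1s."""
    ids = sorted({r[1] for r in rows} | {r[2] for r in rows})
    column = {name: j for j, name in enumerate(ids)}
    matrix = []
    for _pair, id1, id2, _job, _distance in rows:
        z = [0] * len(ids)
        z[column[id1]] = 1  # first member of the pair
        z[column[id2]] = 1  # second member of the pair
        matrix.append(z)
    return ids, matrix

ids, matrix = membership_matrix(rows)
print(ids)
for z in matrix:
    print(z)
```

In this layout an individual contributes through the same column whether it was recorded in ID1 or ID2. Whether those columns are then treated as fixed or random effects is the modelling question raised above and is not settled by the layout itself.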

SteveDenham

SAS49
Obsidian | Level 7
Hi, thanks for the response. There are 3 levels in both my exposure (Job) and my response variable (Distance). As for the role of ID1 and ID2: each identifies an individual, and a total of 100 individuals appear across ID1 and ID2. Each individual is compared with every individual besides itself, so I have 5,050 total rows, each numbered 1-5,050 in the Pair column. The IDs in ID1 and ID2 are therefore never identical on a given row, but most IDs occur in both columns, and each ID occurs multiple times within a column. Is there an appropriate way, using a SAS procedure, to account for the fact that individuals are included in multiple pairs?
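As a quick sanity check on the size of the pair structure, the sketch below (Python, for illustration only) counts unordered pairs with no self-pairs and how many pairs each individual appears in. Under that assumption, n individuals give n(n-1)/2 rows, so 100 individuals would give 4,950 rows, while 5,050 rows would match 101 individuals (or 100 individuals with self-pairs included). Which of these applies to the actual data is not stated in the thread. The function name `count_pairs` is hypothetical.

```python
from itertools import combinations

def count_pairs(n_individuals):
    """Return (number of unordered pairs, pairs-per-individual counts)."""
    ids = range(n_individuals)
    per_individual = {i: 0 for i in ids}
    n_pairs = 0
    for a, b in combinations(ids, 2):  # every unordered pair, no self-pairs
        n_pairs += 1
        per_individual[a] += 1
        per_individual[b] += 1
    return n_pairs, per_individual

pairs_100, counts_100 = count_pairs(100)
pairs_101, counts_101 = count_pairs(101)
print(pairs_100, set(counts_100.values()))
print(pairs_101, set(counts_101.values()))
```

Either way, each individual takes part in roughly a hundred rows, so the overlap between rows is substantial.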
