Re: PROC GENMOD accounting for 2 repeated measures using GEE

SAS49 · Posted 06-14-2022 05:32 PM

Hi, I am trying to use PROC GENMOD to fit a multinomial logistic model accounting for repeated measures using GEE. I have a categorical exposure with 3 categories and an ordinal outcome with 3 levels (low, medium, high). However, these exposures and outcomes result from pairwise comparisons between all individuals in my dataset so my unit of analysis is a pair, not an individual. The data table structure is as follows:

pair	sample1	sample2	exposure	outcome
1	A	B	category 1	low
2	A	C	category 1	medium
3	A	D	category 2	high
4	B	C	category 3	low
5	B	D	category 3	low
6	C	D	category 2	medium

In this simplified example there are 4 samples (A, B, C and D), so there are 6 pairs when each sample is compared to all others. I want to account for the fact that each sample occurs in multiple pairs, but I cannot figure out how to handle having 2 samples per pair. I have used the following code to treat just sample1 as the repeated-measures subject. Is there a way to use both sample1 and sample2 as the subject? I receive various errors whenever I try to do so.

proc genmod data=my_data;
 class exposure sample1;
 model outcome = exposure / link=cumlogit;
 repeated subject = sample1;
run;

Is there a way to accomplish this task?

StatDave · Posted 06-14-2022 06:10 PM

The purpose of the SUBJECT= specification is to distinguish observations that are considered correlated from those considered uncorrelated: correlated observations should share the same SUBJECT= value. It sounds like you consider all 6 of those observations to be correlated. If so, you would need to create a variable that has the same value for those 6 observations and specify that new variable in SUBJECT=. Hopefully you have many sets of 4 samples, yielding multiple sets of 6 observations, with each set having a unique value of the new variable. Keep in mind that the validity of the GEE method requires a large number of subjects/clusters.
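
A minimal sketch of that idea, assuming a hypothetical variable set_id that is not in the posted data and that takes one value per independent set of samples. DIST=MULTINOMIAL is written out explicitly here, and TYPE=IND is used because GENMOD supports only the independence working correlation for multinomial GEE models:

```sas
/* set_id is a hypothetical cluster identifier: every pair formed   */
/* within the same set of samples carries the same set_id value.    */
proc genmod data=my_data;
   class exposure set_id;
   model outcome = exposure / dist=multinomial link=cumlogit;
   repeated subject=set_id / type=ind;   /* one cluster per set */
run;
```

With only a single set of samples there would be just one cluster, which is not enough for the GEE standard errors to be trustworthy.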

SAS49 · Posted 06-15-2022 11:27 AM

Hi, my original question wasn't very clear. I have a total of 100 samples. Each of those is compared to all of the others, and that is the complete dataset. So the first 99 rows of my dataset are sample #1 in the sample1 column being compared to all 99 other samples in the sample2 column. The next 98 rows are sample #2 in the sample1 column being compared to the 98 other samples in the sample2 column that it hadn't been compared to yet, and so on. So I need to account for the clusters of sample 1, sample 2, etc. in each sample pair, but the data on those clusters occur in 2 columns. I know I can account for the first sample of each pair by setting the sample1 column as the SUBJECT=, but the issue I am having is accounting for the clusters of samples in the sample2 column as well. Let me know if that doesn't make sense.
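
To make the layout concrete, here is a minimal sketch that generates the same pair structure. It assumes the samples are simply numbered 1 to 100, and the data set name pairs is only for illustration:

```sas
/* Build every unordered pair of 100 samples, one row per pair. */
data pairs;
   do sample1 = 1 to 99;
      do sample2 = sample1 + 1 to 100;
         pair + 1;        /* sum statement: running pair number */
         output;
      end;
   end;
run;
```

Every sample takes part in 99 pairs, but its ID is split across the two columns: sample 1 only ever appears in sample1, sample 100 only ever appears in sample2, and the rest appear in both.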

SteveDenham · Posted 06-16-2022 01:39 PM

Your design implies that you have 4950 pairs that are correlated (100 choose 2). How many total observations do you have? If it isn't at least 49,500, GEE might not be the best tool. In fact, you may need to analyze your data in some other fashion designed for multinomial responses (FREQ, LOGISTIC, GENMOD, CATMOD) that would use the levels of pair and the levels of exposure (and their interaction, if possible) as factors.

SteveDenham

SAS49 · Posted 06-16-2022 02:36 PM

Yes. My total dataset has 4,950 observations, one per pair of samples. I have been trying to use PROC GENMOD with a cumulative logit link and a REPEATED statement. When I run it as shown below, GENMOD reports 4950 clusters, but I am not sure whether this is correct or how to interpret the output.

proc genmod data=recode descending;
class sample1 sample2 outcome(ref = '1');
model outcome=exposure/ link=clogit;
repeated subject = sample1(sample2);
run;

Note: I coded my outcome as 1 (low), 2 (medium), 3 (high)

SteveDenham · Posted 06-21-2022 08:16 AM

It appears to me that there is no repeated effect in your design - there is one outcome for each of the 4950 pairs. That leads to another question - how is exposure measured? The only estimation available is the marginal effect of exposure on outcome, averaged over all the pairs (I think). What do you get if you remove the REPEATED statement?
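
For reference, a minimal sketch of the same model without the REPEATED statement. It assumes exposure is categorical, as described in the first post, and so lists it in CLASS. DIST=MULTINOMIAL is written out explicitly:

```sas
/* Ordinary cumulative logit model, no GEE. */
proc genmod data=recode descending;
   class exposure;   /* assumption: exposure is a 3-level categorical variable */
   model outcome = exposure / dist=multinomial link=cumlogit;
run;
```

Because each of the 4950 clusters defined by sample1(sample2) contains a single observation, I would expect the parameter estimates to match the GEE fit, with only the standard errors differing (model-based here versus empirical in the GEE output).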

SteveDenham
