Statistical Procedures

SAS49
Obsidian | Level 7

Hi, I am trying to use PROC GENMOD to fit a multinomial logistic model that accounts for repeated measures using GEE. I have a categorical exposure with 3 categories and an ordinal outcome with 3 levels (low, medium, high). However, these exposures and outcomes result from pairwise comparisons between all individuals in my dataset, so my unit of analysis is a pair, not an individual. The data table structure is as follows:

 

pair   sample1   sample2   exposure     outcome
1      A         B         category 1   low
2      A         C         category 1   medium
3      A         D         category 2   high
4      B         C         category 3   low
5      B         D         category 3   low
6      C         D         category 2   medium

 

 

In this simplified example, there are 4 samples (A, B, C, and D), so there are 6 pairs when each sample is compared to all the others. I want to account for the fact that each sample occurs in multiple pairs, but I cannot figure out how to handle having 2 samples per pair. I have used the following code to incorporate just sample1 as the repeated measure, but is there a way to incorporate both sample1 and sample2 as the subject of the repeated measure? I receive various errors whenever I try to do so.

 

 

proc genmod data=my_data;
  class exposure sample1;
  model outcome = exposure / link=cumlogit;
  repeated subject = sample1;
run;

Is there a way to accomplish this task?  

StatDave
SAS Super FREQ
The purpose of the SUBJECT= specification is to distinguish the observations that are considered correlated from those considered uncorrelated. The correlated observations should have the same value. It sounds like you consider all 6 of those observations to be correlated. If so, then you would need to create a variable with the same value for those 6 observations. Hopefully you have many sets of 4 samples yielding data on multiple sets of 6 observations. Keep in mind that validity of the GEE method requires a large number of subjects/clusters. Each set would have a unique value on the new variable in the data. You would then specify the new variable in SUBJECT=.
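A minimal sketch of what that specification could look like, assuming the data contain many independent sets of samples. The variable name set_id is hypothetical (it is not in the original data) and would hold one value per set, shared by all pairs formed within that set. DIST=MULTINOMIAL is spelled out here for clarity, and TYPE=IND is used because, as far as I know, GENMOD supports only the independence working correlation for multinomial responses.

```sas
/* Sketch only: set_id is a hypothetical cluster identifier,        */
/* constant across all pairs that come from the same set of samples */
proc genmod data=my_data;
  class exposure set_id;
  model outcome = exposure / dist=multinomial link=cumlogit;
  repeated subject=set_id / type=ind;  /* one cluster per set */
run;
```

With this layout, the number of clusters reported by GENMOD should equal the number of distinct set_id values, which is the quantity that needs to be large for the GEE standard errors to be trustworthy.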
SAS49
Obsidian | Level 7
Hi, my original question wasn't very clear. I have a total of 100 samples. Each sample is compared to all of the others, and that is the complete dataset. So the first 99 rows of my dataset are sample #1 in the sample1 column compared to all 99 other samples in the sample2 column. The next 98 rows are sample #2 in the sample1 column compared to the 98 other samples in the sample2 column that it hadn't been compared to yet, and so on. So I need to account for the clusters of sample #1, sample #2, etc. in each sample pair, but the data on those clusters occur in 2 columns. I know I can account for the first sample of each pair by setting the sample1 column as the SUBJECT=, but the issue I am having is accounting for the clusters of samples in the sample2 column as well. Let me know if that doesn't make sense.
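To make the layout described above concrete, here is a small sketch that builds the same all-pairs structure and counts how often each sample occurs across both columns. The dataset names pairs and long and the numeric sample IDs 1 to 100 are assumptions for illustration, not the poster's actual data.

```sas
/* Enumerate every unordered pair of 100 samples (hypothetical IDs 1-100) */
data pairs;
  do sample1 = 1 to 99;
    do sample2 = sample1 + 1 to 100;
      pair + 1;          /* running pair number via a sum statement */
      output;
    end;
  end;
run;

/* Stack the two columns so each row is one (pair, sample) membership */
data long;
  set pairs;
  sample = sample1; output;
  sample = sample2; output;
  keep pair sample;
run;

/* Each sample should show up in 99 pairs, split across the two columns */
proc freq data=long;
  tables sample / nocum nopercent;
run;
```

The point of the stacked view is that a sample's pairs are spread over both sample1 and sample2, so neither column alone identifies all the observations that share that sample.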
SteveDenham
Jade | Level 19

Your design implies that you have 4950 pairs that are correlated (100 choose 2). How many total observations do you have? If it isn't at least 49,500, GEE might not be the best tool. In fact, you may need to analyze your data in some other fashion designed for multinomial responses (FREQ, LOGISTIC, GENMOD, CATMOD) that would use the levels of pair and the levels of exposure (and their interaction, if possible) as factors.

 

SteveDenham
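For reference, the pair count quoted in the reply above is the number of unordered pairs that can be formed from 100 samples:

```latex
\binom{100}{2} = \frac{100 \times 99}{2} = 4950
```

Each individual sample takes part in 99 of those 4950 pairs.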

SAS49
Obsidian | Level 7
Yes. My total dataset has 4,950 observations or pairs of samples. I have been trying to use PROC GENMOD with a cumulative logit link and repeated statement. When I run it like this I obtain 4950 clusters in the model, but I am not sure if this is correct or how to interpret the output really.
proc genmod data=recode descending;
  class sample1 sample2 outcome(ref = '1');
  model outcome = exposure / link=clogit;
  repeated subject = sample1(sample2);
run;

Note: I coded my outcome as 1 (low), 2 (medium), 3 (high)
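One way to see where the 4950 clusters come from: a nested subject effect such as sample1(sample2) defines one cluster for each distinct combination of the two variables, and here every combination is a single pair. The sketch below, which assumes the recode dataset from the post above, counts the clusters and their largest size.

```sas
/* Count clusters implied by SUBJECT=sample1(sample2) and their max size */
proc sql;
  select count(*)   as n_clusters,
         max(n_obs) as max_cluster_size
  from (select sample1, sample2, count(*) as n_obs
        from recode
        group by sample1, sample2);
quit;
```

If max_cluster_size comes back as 1, every cluster holds a single observation, so the REPEATED statement is not modeling any within-cluster correlation.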
SteveDenham
Jade | Level 19

It appears to me that there is no repeated effect in your design - there is one outcome for each of the 4950 pairs.  That leads to another question - how is exposure measured?  The only estimation available is the marginal effect of exposure on outcome, averaged over all the pairs (I think).  What do you get if you remove the REPEATED statement?

 

SteveDenham
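A sketch of the comparison suggested in the reply above, fitting the same cumulative logit model without the REPEATED statement. Listing exposure in the CLASS statement and adding DIST=MULTINOMIAL are assumptions made here, based on the exposure being described as categorical; neither appears in the posted code.

```sas
/* Same marginal model, no REPEATED statement (plain ordinal logistic fit) */
proc genmod data=recode descending;
  class exposure;   /* assumption: exposure is a 3-level categorical variable */
  model outcome = exposure / dist=multinomial link=cumlogit;
run;
```

Comparing the exposure estimates and standard errors from this fit with the GEE output would show how much, if anything, the REPEATED statement is changing.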
