Re: Inquire about subgroup analysis in GEE model

vivian_h · Posted 04-11-2024 05:17 AM

Hiii,

I have a question about doing subgroup analysis in a GEE model. I have a binary variable "p4p" (0/1) and I need results for each of the two groups separately. The following code is my model:

%macro testn(x);
proc genmod data=all;
class matchid adrs(ref='0') time(ref='2') age id_s dcsi ; 
model &x. = adrs time adrs*time ;
repeated subject = matchid/type = exch;
estimate "Diff in Diff year 1" adrs*time -1 0 0 1 0 0 1 0 0 -1 0 0;
estimate "Diff in Diff year 2" adrs*time -1 0 0 0 1 0 1 0 0 0 -1 0;
estimate "Diff in Diff year 3" adrs*time -1 0 0 0 0 1 1 0 0 0 0 -1;
lsmeans time*adrs;
run;
%mend;
%testn(return);
%testn(acsc);
%testn(score);

My previous approach was to split the dataset first and then run the model on the 2 datasets (p4p/p4p_) separately:

data p4p; set all; if p4p=1; run; 
data p4p_; set all; if p4p=0; run;

However, I found that the estimates look a little bit weird when I run the models separately, so I am wondering whether I should put the subgroups in the same model.

I have tried adding the variable "p4p" to the CLASS statement, but the result is not what I expected.

Does anyone know how to do the subgroup analysis within the model?

Need SAS experts please!! I really need help!!

THANK YOU SO MUCH!!

StatDave · Posted 04-11-2024 12:02 PM

You appear to want to estimate the difference in difference of the event probability among your groups. See this note that discusses this analysis in detail. To estimate the difference in difference on the probability scale, you can use the Margins macro as shown in the "Generalized Linear Models with a Non-Identity Link" section of the note. Other macros can also be used as shown there.

Note that any time you fit a model to a binary response, you should specify which level of the binary response variable is considered the level of interest (the "event" level). That is shown in the Margins macro call. Also, to avoid the unnecessary omission of observations due to missing values, you should never specify CLASS variables that aren't used elsewhere in the model specification.
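
As a rough sketch of those two points: the version below keeps only the CLASS variables that appear in the MODEL or REPEATED statements and names the event level of the response. It assumes the response is coded 0/1 with 1 as the level of interest, and that a binomial distribution with a logit link is intended. Neither is stated in the original code, so adjust as needed, and use it only for the outcomes that are actually binary.

```sas
%macro testn(x);
proc genmod data=all;
  /* CLASS lists only variables used in MODEL or REPEATED, so rows with
     missing age, id_s or dcsi are no longer dropped for that reason */
  class matchid adrs(ref='0') time(ref='2');
  /* ASSUMPTION: &x. is coded 0/1 and 1 is the event level;
     DIST= and LINK= are also assumptions, not from the original post */
  model &x.(event='1') = adrs time adrs*time / dist=binomial link=logit;
  repeated subject=matchid / type=exch;
run;
%mend;
%testn(return);
```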

vivian_h · Posted 04-11-2024 12:25 PM

Is there an easier way?

What if I do this?

%macro testn(x);
proc genmod data=all;
class matchid adrs(ref='0') time(ref='2') age id_s dcsi p4p; 
where p4p=1; /*is it feasible to do it like this*/
model &x. = adrs time adrs*time ;
repeated subject = matchid/type = exch;
estimate "Diff in Diff year 1" adrs*time -1 0 0 1 0 0 1 0 0 -1 0 0;
estimate "Diff in Diff year 2" adrs*time -1 0 0 0 1 0 1 0 0 0 -1 0;
estimate "Diff in Diff year 3" adrs*time -1 0 0 0 0 1 1 0 0 0 0 -1;
lsmeans time*adrs;
run;
%mend;
%testn(return);

StatDave · Posted 04-11-2024 12:43 PM

If you just want a point estimate of the difference in difference of event probabilities then you can hand compute it from the results of your LSMEANS statement, but you will need to add the ILINK option in that statement. That will add the Mean column in the LSMEANS table which contains the estimated event probabilities for your combinations of TIME and ADRS. Remove the WHERE statement. The ESTIMATE statements you have, if the coefficients are correct, only tell you the difference in difference of the log odds, not of the event probabilities. You could use an LSMESTIMATE statement to do the same more simply. Examples of LSMESTIMATE are in the note.
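
A minimal sketch of the LSMEANS change, assuming a binary response with event level 1 and a logit link (these are assumptions, not taken from the original code). With ILINK, the Mean column holds the estimated event probability for each TIME by ADRS combination. The difference in difference for a given year can then be computed by hand as (p[adrs=1, year] - p[adrs=1, reference time]) - (p[adrs=0, year] - p[adrs=0, reference time]), where the ADRS and TIME labels are placeholders for your actual levels.

```sas
proc genmod data=all;
  class matchid adrs(ref='0') time(ref='2');
  /* ASSUMPTION: return is coded 0/1 with 1 as the event level */
  model return(event='1') = adrs time adrs*time / dist=binomial link=logit;
  repeated subject=matchid / type=exch;
  /* ILINK adds the Mean column (estimated event probabilities);
     CL adds confidence limits for each cell */
  lsmeans time*adrs / ilink cl;
run;
```

This gives point estimates only. The hand-computed difference has no standard error or confidence interval attached.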