vivian_h
Calcite | Level 5

Hi,

I have a question about subgroup analysis in a GEE model. I have a binary variable "p4p" (0/1), and I need the results for each of the two groups separately. My model code is below:

%macro testn(x);
proc genmod data=all;
class matchid adrs(ref='0') time(ref='2') age id_s dcsi ; 
model &x. = adrs time adrs*time ;
repeated subject = matchid/type = exch;
estimate "Diff in Diff year 1" adrs*time -1	0	0	1	0	0	1	0	0   -1	 0  0; 
estimate "Diff in Diff year 2" adrs*time -1	0	0	0	1	0	1	0	0    0  -1  0; 
estimate "Diff in Diff year 3" adrs*time -1	0	0	0	0	1	1	0	0    0   0 -1; 
lsmeans time*adrs;
run;
%mend;
%testn(return);
%testn(acsc);
%testn(score);

My previous approach was to split the dataset first and then run the model on each of the two datasets (p4p/p4p_) separately:

data p4p; set all; if p4p=1; run; 
data p4p_; set all; if p4p=0; run; 

However, the estimates look a little odd when I run the model on each dataset separately, so I am wondering whether I should fit both subgroups in the same model.
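
For reference, the split-sample fit described above can also be written without creating two datasets, by sorting on p4p and adding a BY statement. This is only a minimal sketch: it reuses the dataset and variable names from the macro above, fixes the response to "return" for illustration, and the output dataset name all_sorted is a made-up placeholder. Because each BY group is fitted on its own, it should give the same estimates as the two separate runs, so it is a convenience rather than a different analysis.

```sas
/* Sort by the subgroup variable so BY-group processing is allowed. */
proc sort data=all out=all_sorted;   /* all_sorted: hypothetical name */
  by p4p;
run;

proc genmod data=all_sorted;
  by p4p;                            /* one separate GEE fit per p4p level */
  class matchid adrs(ref='0') time(ref='2');
  model return = adrs time adrs*time;
  repeated subject=matchid / type=exch;
  lsmeans time*adrs;
run;
```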

I tried adding the variable "p4p" to the CLASS statement, but the result was not what I expected.

Does anyone know how to do the subgroup analysis within the model?

I would really appreciate help from SAS experts.

Thank you so much!

3 REPLIES
StatDave
SAS Super FREQ

You appear to want to estimate the difference in difference of the event probability among your groups. See this note that discusses this analysis in detail. To estimate the difference in difference on the probability scale, you can use the Margins macro as shown in the "Generalized Linear Models with a Non-Identity Link" section of the note. Other macros can also be used as shown there.

 

Note that any time you fit a model to a binary response, you should specify which level of the response variable is considered the level of interest (the "event" level). That is shown in the Margins macro call. Also, to avoid the unnecessary omission of observations due to missing values, you should never specify CLASS variables that aren't used elsewhere in the model specification.
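
As a rough illustration of those two points applied directly in PROC GENMOD, the sketch below names the event level in the MODEL statement and lists only the CLASS variables that the model actually uses. It assumes the response "return" is coded 0/1 with 1 as the level of interest and that a logistic model is intended; the DIST= and LINK= options are assumptions, since the original code does not specify them.

```sas
proc genmod data=all;
  /* Only variables used in MODEL or REPEATED are listed in CLASS, */
  /* so missing values in age, id_s or dcsi no longer drop rows.   */
  class matchid adrs(ref='0') time(ref='2');

  /* event='1' makes 1 the modeled level (assumed coding).         */
  /* dist=binomial link=logit are assumed, not taken from the post. */
  model return(event='1') = adrs time adrs*time / dist=binomial link=logit;

  repeated subject=matchid / type=exch;
run;
```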

vivian_h
Calcite | Level 5

Is there an easier way?

What if I do this?


%macro testn(x);
proc genmod data=all;
class matchid adrs(ref='0') time(ref='2') age id_s dcsi p4p; 
where p4p=1; /* is it feasible to do it like this? */
model &x. = adrs time adrs*time ;
repeated subject = matchid/type = exch;
estimate "Diff in Diff year 1" adrs*time -1 0 0 1 0 0 1 0 0 -1 0 0;
estimate "Diff in Diff year 2" adrs*time -1 0 0 0 1 0 1 0 0 0 -1 0;
estimate "Diff in Diff year 3" adrs*time -1 0 0 0 0 1 1 0 0 0 0 -1;
lsmeans time*adrs;
run;
%mend;
%testn(return);

 

StatDave
SAS Super FREQ
If you just want a point estimate of the difference in difference of event probabilities then you can hand compute it from the results of your LSMEANS statement, but you will need to add the ILINK option in that statement. That will add the Mean column in the LSMEANS table which contains the estimated event probabilities for your combinations of TIME and ADRS. Remove the WHERE statement. The ESTIMATE statements you have, if the coefficients are correct, only tell you the difference in difference of the log odds, not of the event probabilities. You could use an LSMESTIMATE statement to do the same more simply. Examples of LSMESTIMATE are in the note.
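
A minimal sketch of the LSMEANS suggestion, under the same assumptions as the thread (binary response coded 0/1, logit link; the DIST=, LINK= and EVENT= settings are assumed, not taken from the original code). The ODS output dataset name lsm_cells is a made-up placeholder.

```sas
proc genmod data=all;
  class matchid adrs(ref='0') time(ref='2');
  model return(event='1') = adrs time adrs*time / dist=binomial link=logit;
  repeated subject=matchid / type=exch;

  /* ILINK adds the inverse-linked estimates (event probabilities) */
  /* for every TIME by ADRS cell to the LSMEANS table.             */
  lsmeans time*adrs / ilink;

  /* Optional: save the table for the hand computation.            */
  ods output LSMeans=lsm_cells;   /* lsm_cells: hypothetical name */
run;
```

From the saved table, the point estimate for a given year is (probability for ADRS=1 in that year minus probability for ADRS=1 at baseline) minus the same difference for ADRS=0. This gives no standard error or confidence interval; for those, the macros described in the note are the route StatDave points to.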
