Hello,
I am working on a difference-in-differences (DID) analysis of longitudinal data. The goal is to compare health care cost between an intervention group and a control group. The response variable is collected at three time points: baseline, first year, and second year. I know that the DID hypothesis test can be specified as follows when only pre and post (baseline and first year) are involved. My question is how to specify the DID hypothesis test once the second year of data is added. Your help is greatly appreciated.
proc GENMOD data= data_set;
class id treatment(ref='0') post(ref='0');
model cost =treatment post treatment*post / dist=gamma link=log type3;
repeated subject=id / type=un;
estimate "DID Post-Pre" treatment*post 1 -1 -1 1;
lsmestimate treatment*post "DID Post-Pre" 1 -1 -1 1;
run;
In this note, see "Difference in Difference Analysis in a Pre/Post Longitudinal Study" in the "Generalized Linear Models with a Non-Identity Link" section. As shown there, you can use the Margins macro to estimate and test a hypothesis on the response means. You haven't stated exactly what you want to test. Assuming the hypothesis is that the pre mean minus the average of the two post means is the same in both treatment groups, the code would look like the Margins macro call in that section, provided the POST variable has three levels (pre, 1yr, 2yr) in that order. Note that the contrast coefficients defining the hypothesis are applied to the margin estimates in the order they are displayed, so the ordering matters for proper interpretation.
data c;
length label f $32767;
infile datalines delimiter='|';
input label f;
datalines;
DID pre-avg.post | 1 -.5 -.5 -1 .5 .5
;
%Margins(data       = data_set,
         response   = cost,
         class      = treatment post,
         model      = treatment|post,
         link       = log,
         dist       = gamma,
         geesubject = id,
         margins    = treatment post,
         contrasts  = c,
         options    = cl)
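To make the ordering explicit, the contrast row in data set C can be written out term by term. This is only a sketch under two assumptions that are not stated in the thread: the treatment variable is coded 0 (control) and 1 (intervention), and the macro displays the six margins with treatment varying slowest, that is (0, pre), (0, 1yr), (0, 2yr), (1, pre), (1, 1yr), (1, 2yr). The symbol \mu_{g,t} is introduced here for the estimated mean cost of group g at time t.

\[
\begin{aligned}
\theta &= \Bigl[\mu_{0,\mathrm{pre}} - \tfrac{1}{2}\bigl(\mu_{0,\mathrm{1yr}} + \mu_{0,\mathrm{2yr}}\bigr)\Bigr]
        - \Bigl[\mu_{1,\mathrm{pre}} - \tfrac{1}{2}\bigl(\mu_{1,\mathrm{1yr}} + \mu_{1,\mathrm{2yr}}\bigr)\Bigr], \\
H_0 &: \theta = 0 .
\end{aligned}
\]

The coefficients 1 -.5 -.5 -1 .5 .5 are simply the multipliers of the six margins in that display order. If the margins table comes out in a different order, the coefficients need to be permuted to match.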
Thank you, Dave. This is very helpful. If I want to test the difference between year 2 and baseline in the treatment group versus the control group, should I specify the contrast as: DID pre- Year 2 post | 1 0 -1 -1 0 1
I am not sure how to specify the test by position, and I can't find an easy-to-follow tutorial or documentation. Thanks.
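One way to check a positional contrast like this is to multiply each coefficient by the margin in the same position and read off the result. The expansion below is a sketch that assumes the six margins are displayed in the order (0, pre), (0, 1yr), (0, 2yr), (1, pre), (1, 1yr), (1, 2yr), with treatment coded 0 for control and 1 for intervention; \mu_{g,t} is notation introduced here for the mean cost of group g at time t.

\[
\begin{aligned}
\theta &= 1\cdot\mu_{0,\mathrm{pre}} + 0\cdot\mu_{0,\mathrm{1yr}} - 1\cdot\mu_{0,\mathrm{2yr}}
        - 1\cdot\mu_{1,\mathrm{pre}} + 0\cdot\mu_{1,\mathrm{1yr}} + 1\cdot\mu_{1,\mathrm{2yr}} \\
       &= \bigl(\mu_{1,\mathrm{2yr}} - \mu_{1,\mathrm{pre}}\bigr) - \bigl(\mu_{0,\mathrm{2yr}} - \mu_{0,\mathrm{pre}}\bigr).
\end{aligned}
\]

Under that ordering assumption, the row 1 0 -1 -1 0 1 is the year 2 minus baseline change in the intervention group minus the same change in the control group, and the zero coefficients drop the first-year margins from the comparison. If the displayed order differs, the positions of the coefficients have to change accordingly.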
Hi, I have a study in which the program (RJ) has 4 years of treatment and one pre-treatment year (exposure; coded as 0-4). I want to compare the odds of getting suspended (1/0) for students during the pre-treatment year, and each subsequent year of exposure to the RJ program with the odds of getting suspended for students who didn't receive the treatment (RJ=0).
In other words,
Students exposed to treatment compared with their pre-treatment year.
Students not exposed to treatment compared with their pre-treatment year.
Students exposed compared with students not exposed.
I'm wondering what the difference would be if I use the following two codes:
Under full-rank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization.
Hello,
The topic thread where you added your new question has already been solved (last year).
Hence ... very few people will read your question (the topic thread participants will get notified though).
Can you start a new topic in the "Statistical Procedures"-board (under the "Analytics"-header)??
Thanks, Koen
The description of your study implies that your subjects are observed repeatedly over time. Neither of your analyses takes the resulting correlation among the repeated measures into account. See the last section titled "Treated and Control Groups, Binary Response" in this note. The estimate named "1 month change diff" produced using the Margins macro, or the estimate named "adjusted exp change" using the NLMeans macro, compare the change from pre to post in the exposed vs. unexposed groups like what you want to do.
Thank you very much, this SAS documentation is helpful. I should have mentioned that many of the subjects are not the same from year to year because of new students who enter the school and students who graduate. Given that the measures repeat, but the subjects differ, would I still need to do the interrupted time series analysis?
If "many of the subjects" means that some subjects do have repeated measures then you still have correlation. Of course, it is up to you if you want to ignore the correlation and assume that all measurements are independent.