Steeagle
Calcite | Level 5

Hello,

I am working on a difference-in-difference (DID) analysis of longitudinal data. The goal is to compare the change in health care cost between an intervention group and a control group. The response variable is collected at three time points: baseline, first year, and second year. I know that the DID hypothesis test can be specified as follows when only pre and post (baseline and first year) are involved. My question is how I should specify the DID hypothesis test when a second year of data is added. Your help is greatly appreciated.

 

proc GENMOD data= data_set;
class id  treatment(ref='0') post(ref='0');
model cost =treatment post treatment*post / dist=gamma link=log type3;
repeated subject=id / type=un;
estimate "DID Post-Pre" treatment*post 1 -1 -1 1;
lsmestimate treatment*post "DID Post-Pre" 1 -1 -1 1;
run;

Accepted Solutions
StatDave
SAS Super FREQ

In this note, see "Difference in Difference Analysis in a Pre/Post Longitudinal Study" in the "Generalized Linear Models with a Non-Identity Link" section. As shown there, you can use the Margins macro to estimate and test a hypothesis on the response means. You haven't stated exactly what you want to test. Assuming the hypothesis is that the pre mean minus the average of the two post means is the same in both treatment groups, the call would look like the Margins macro call in that section, provided that the POST variable has three levels (pre, 1yr, 2yr) in that order. Note that the contrast coefficients defining the hypothesis are applied to the margin estimates as displayed, so the ordering is important for proper interpretation.

      data c; 
        length label f $32767; 
        infile datalines delimiter='|';
        input label f; 
        datalines;
      DID pre-avg.post | 1 -.5 -.5   -1 .5 .5
      ;
      %Margins(data       = data_set,
               response   = cost,
               class      = treatment post,
               model      = treatment|post,
               link       = log,
               dist       = gamma,
               geesubject = id,
               margins    = treatment post,
               contrasts  = c,
               options    = cl)

Steeagle
Calcite | Level 5

Thank you, Dave. This is very helpful. If I want to test whether the change between baseline and year 2 differs between the treatment and control groups, should I specify it as: DID pre- Year 2 post | 1 0 -1 -1 0 1

I am not sure how to specify a test by position, and I can't find an easy-to-follow tutorial or documentation. Thanks.
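
As a sketch of how rows are read by position: each coefficient multiplies the margin in the same position of the displayed margins table. The data set below assumes that the six margins appear in the order first group (pre, 1yr, 2yr), then second group (pre, 1yr, 2yr), which is the ordering the accepted solution describes. The labels are illustrative only, and the ordering should be checked against the actual output before any row is interpreted. Under that assumption, the row proposed above takes (pre minus 2yr) in the first group minus (pre minus 2yr) in the second group.

```sas
/* Contrast rows are positional: coefficient k multiplies the k-th    */
/* margin as displayed. Assumed display order (verify in the output): */
/*   group 0: pre, 1yr, 2yr   then   group 1: pre, 1yr, 2yr           */
data c;
  length label f $32767;
  infile datalines delimiter='|';
  input label f;
  datalines;
DID pre-1yr      | 1 -1  0   -1  1  0
DID pre-2yr      | 1  0 -1   -1  0  1
DID pre-avg.post | 1 -.5 -.5 -1 .5 .5
;
```

Passing this data set through contrasts=c in the Margins call from the accepted solution should then give one estimate and test per row.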

K331
Calcite | Level 5

Hi, I have a study in which the program (RJ) has 4 years of treatment and one pre-treatment year (exposure; coded as 0-4). I want to compare the odds of getting suspended (1/0) for students during the pre-treatment year, and each subsequent year of exposure to the RJ program with the odds of getting suspended for students who didn't receive the treatment (RJ=0).

 

In other words,

Students exposed to treatment compared with their pre-treatment year. 

Students not exposed to treatment compared with their pre-treatment year. 

Students exposed compared with students not exposed. 

 

I'm wondering what the difference would be between the following two programs:

 

PROC LOGISTIC data=studentsample;
        class RJ(ref='0') exposure(ref='0')  / param=glm;
        model suspended(event="1") = RJ exposure RJ*exposure;
        estimate "Diff in Diff" RJ*exposure 1 -1 -1 1;
        lsmeans  RJ*exposure/ e ilink;
        ods output coef=coeffs;
        lsmestimate RJ*exposure "Diff in Diff LogOdds" 1 -1 -1 1;
        store log;
RUN;
 
versus: 
 
PROC LOGISTIC data=studentsample descending;
       class RJ (ref='0') / param=ref;
       class exposure (ref='0') / param=ref;
model SUSPENDED =
       RJ exposure exposure*RJ
       / clodds=wald ORPVALUE;
       oddsratio RJ / diff=ref;
       oddsratio exposure / diff=ref;
RUN;
 
I am less familiar with interpreting difference-in-difference output. My understanding is that my second program (the PROC LOGISTIC without the difference-in-difference specification) still technically gives me a difference-in-difference odds ratio, because I get an estimate for RJ*0 RJ*1 RJ*2 RJ*3 RJ*4, and that it is therefore not necessary to specify the diff-in-diff as I did in the first program. Is this not true?
 
I assume I'm wrong because when I run the second program, I get this note in the log:
Under full-rank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization.
 
Thank you
sbxkoenk
SAS Super FREQ

Hello,

 

The topic thread where you added your new question has already been solved (last year).
Hence ... very few people will read your question (the topic thread participants will get notified though).

 

Can you start a new topic in the "Statistical Procedures"-board (under the "Analytics"-header)??

 

Thanks, Koen

K331
Calcite | Level 5
Yes, will do. Thank you
StatDave
SAS Super FREQ

The description of your study implies that your subjects are observed repeatedly over time. Neither of your analyses takes the resulting correlation among the repeated measures into account. See the last section titled "Treated and Control Groups, Binary Response" in this note. The estimate named "1 month change  diff" produced using the Margins macro, or the estimate named "adjusted exp change" using the NLMeans macro, compare the change from pre to post in the exposed vs. unexposed groups like what you want to do. 
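
As a rough sketch of one way to take the repeated measures into account, the model can be fit by GEE in PROC GENMOD instead of PROC LOGISTIC. This is not the Margins or NLMeans approach from the note, and it compares changes on the log odds scale rather than on the probability scale. It assumes a student identifier named id (hypothetical, not in the posted code), that exposure is recorded for both RJ groups, and an exchangeable working correlation. No REF= option is used, so the LS-means should be ordered RJ=0 (exposure 0 to 4) followed by RJ=1 (exposure 0 to 4); the E option prints the coefficients so that this can be checked.

```sas
/* Sketch only: id is an assumed subject identifier, not in the original post */
proc genmod data=studentsample;
  class id RJ exposure / param=glm;
  model suspended(event='1') = RJ exposure RJ*exposure
        / dist=binomial link=logit;
  /* GEE: repeated observations on the same student are correlated */
  repeated subject=id / type=exch;
  /* Each row: (year k minus pre) in RJ=1 minus (year k minus pre) in RJ=0 */
  /* Assumed LS-means order: RJ=0 exposure 0-4, then RJ=1 exposure 0-4     */
  lsmestimate RJ*exposure
    "DID log odds, year 1 vs pre"  1 -1  0  0  0   -1  1  0  0  0,
    "DID log odds, year 2 vs pre"  1  0 -1  0  0   -1  0  1  0  0,
    "DID log odds, year 3 vs pre"  1  0  0 -1  0   -1  0  0  1  0,
    "DID log odds, year 4 vs pre"  1  0  0  0 -1   -1  0  0  0  1
    / e exp cl;
run;
```

With the EXP option, each exponentiated estimate is a ratio of odds ratios: the year-versus-pre odds ratio in the RJ group divided by the same odds ratio in the non-RJ group.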

K331
Calcite | Level 5

Thank you very much, this SAS documentation is helpful. I should have mentioned that many of the subjects are not the same from year to year because of new students who enter the school and students who graduate. Given that the measures repeat, but the subjects differ, would I still need to do the interrupted time series analysis? 

StatDave
SAS Super FREQ

If "many of the subjects" means that some subjects do have repeated measures then you still have correlation. Of course, it is up to you if you want to ignore the correlation and assume that all measurements are independent.
