Solved: Difference in Difference analysis for longitudinal data

Steeagle · Posted 12-05-2023 02:53 PM

Hello,

I am working on a difference-in-difference (DID) analysis for longitudinal data. The goal is to investigate the difference in health care cost between the intervention group and the control group. We collect the response variable at three timepoints: baseline, first year, and second year. I know that the hypothesis test for DID can be specified as follows when only pre and post (baseline and first year) are involved. My question is how I should specify the hypothesis test for DID if one more year of data is added. Your help is greatly appreciated.

proc genmod data=data_set;
  class id treatment(ref='0') post(ref='0');
  model cost = treatment post treatment*post / dist=gamma link=log type3;
  repeated subject=id / type=un;
  estimate "DID Post-Pre" treatment*post 1 -1 -1 1;
  lsmestimate treatment*post "DID Post-Pre" 1 -1 -1 1;
run;

StatDave · Posted 12-05-2023 11:15 PM

In this note, see "Difference in Difference Analysis in a Pre/Post Longitudinal Study" in the "Generalized Linear Models with a Non-Identity Link" section. As shown there, you can use the Margins macro to estimate and test a hypothesis on the response means. You haven't stated exactly what you want to test, but assuming the hypothesis is that the pre mean minus the average of the two post means is the same in the two treatment groups, it would look like the Margins macro call in that section, assuming that the POST variable has three levels (pre, 1yr, 2yr) in that order. Note that the contrast coefficients defining the hypothesis are applied to the margin estimates in the order they are displayed, so the ordering matters for proper interpretation.

      data c; 
        length label f $32767; 
        infile datalines delimiter='|';
        input label f; 
        datalines;
      DID pre-avg.post | 1 -.5 -.5   -1 .5 .5
      ;
      %Margins(data       = data_set,
               response   = cost,
               class      = trt post,
               model      = trt|post,
               link       = log,
               dist       = gamma,
               geesubject = id,
               margins    = trt post,
               contrasts  = c,
               options    = cl)
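
To make the positional logic concrete, here is a minimal sketch of a contrast data set with one row per pairwise DID comparison. It assumes the six margins are displayed with trt varying slowest and post fastest (trt=0: pre, 1yr, 2yr; then trt=1: pre, 1yr, 2yr) and that the Margins macro treats each row of the contrasts data set as a separate test. The labels and the 0/1 coding of trt are made up for illustration, so check the order of the displayed margins before trusting the signs.

```sas
/* Assumed margin order (verify against the displayed Margins table):
     position 1: trt=0 pre    position 2: trt=0 1yr    position 3: trt=0 2yr
     position 4: trt=1 pre    position 5: trt=1 1yr    position 6: trt=1 2yr
   Each row holds one coefficient per margin, in that order.
   A coefficient of 0 leaves that margin out of the comparison.        */
data c;
  length label f $32767;
  infile datalines delimiter='|';
  input label f;
  datalines;
DID pre-1yr | 1 -1 0   -1 1 0
DID pre-2yr | 1 0 -1   -1 0 1
;
```

This data set would be passed as contrasts=c in the %Margins call above. Each row computes (pre minus post) in the first group minus (pre minus post) in the second group, so reversing all signs in a row flips the direction of the estimate but not the test.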

Steeagle · Posted 12-07-2023 10:29 AM

Thank you, Dave. This is very helpful. If I want to test whether the difference between year 2 and baseline is the same in the treatment and control groups, should I specify it as:

DID pre- Year 2 post | 1 0 -1 -1 0 1

I am not sure how to specify tests by position and can't find an easy-to-follow tutorial or documentation. Thanks.

K331 · Posted 01-15-2024 05:52 PM

Hi, I have a study in which the program (RJ) has 4 years of treatment and one pre-treatment year (exposure; coded as 0-4). I want to compare the odds of getting suspended (1/0) for students during the pre-treatment year, and each subsequent year of exposure to the RJ program with the odds of getting suspended for students who didn't receive the treatment (RJ=0).

In other words,

Students exposed to treatment compared with their pre-treatment year.

Students not exposed to treatment compared with their pre-treatment year.

Students exposed compared with students not exposed.

I'm wondering what the difference would be between the following two programs:

PROC LOGISTIC data=studentsample;
  class RJ(ref='0') exposure(ref='0') / param=glm;
  model suspended(event="1") = RJ exposure RJ*exposure;
  estimate "Diff in Diff" RJ*exposure 1 -1 -1 1;
  lsmeans RJ*exposure / e ilink;
  ods output coef=coeffs;
  lsmestimate RJ*exposure "Diff in Diff LogOdds" 1 -1 -1 1;
  store log;
RUN;

versus:

PROC LOGISTIC data=studentsample descending;
  class RJ(ref='0') / param=ref;
  class exposure(ref='0') / param=ref;
  model SUSPENDED = RJ exposure exposure*RJ / clodds=wald ORPVALUE;
  oddsratio RJ / diff=ref;
  oddsratio exposure / diff=ref;
RUN;

I am less familiar with interpreting difference-in-difference output. My understanding is that my 2nd program (the PROC LOGISTIC step without the difference-in-difference specification) still technically gives me a difference-in-difference odds ratio, because I get an estimate for RJ*0 RJ*1 RJ*2 RJ*3 RJ*4, and therefore it is not necessary to specify diff-in-diff as I did in the first program. Is this not true?

I assume I'm wrong because when I run the 2nd batch of code, I get this note in the log:

NOTE: Under full-rank parameterizations, Type 3 effect tests are replaced by joint tests. The joint test for an effect is a test that all the parameters associated with that effect are zero. Such joint tests might not be equivalent to Type 3 effect tests under GLM parameterization.

Thank you

sbxkoenk · Posted 01-16-2024 06:16 AM

Hello,

The topic thread where you added your new question has already been solved (last year).
Hence ... very few people will read your question (the topic thread participants will get notified though).

Can you start a new topic in the "Statistical Procedures"-board (under the "Analytics"-header)??

Thanks, Koen

K331 · Posted 01-16-2024 10:27 AM

Yes, will do. Thank you

StatDave · Posted 01-16-2024 11:54 AM

The description of your study implies that your subjects are observed repeatedly over time. Neither of your analyses takes the resulting correlation among the repeated measures into account. See the last section titled "Treated and Control Groups, Binary Response" in this note. The estimate named "1 month change diff" produced using the Margins macro, and the estimate named "adjusted exp change" produced using the NLMeans macro, each compare the change from pre to post in the exposed vs. unexposed groups, which is what you want to do.
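
For reference, one common way to allow for within-subject correlation in a binary-response model is a GEE fit using the REPEATED statement in PROC GENMOD. The sketch below is only an illustration under stated assumptions: student_id is a hypothetical subject identifier that does not appear in the posted code, and the exchangeable working correlation is an assumed choice, not a recommendation from the note.

```sas
/* Sketch only. student_id is a hypothetical variable identifying each student. */
proc genmod data=studentsample descending;   /* descending: model Pr(suspended=1) */
  class student_id RJ(ref='0') exposure(ref='0');
  model suspended = RJ exposure RJ*exposure / dist=binomial link=logit type3;
  /* GEE: repeated measures on the same student are treated as correlated */
  repeated subject=student_id / type=exch;
run;
```

Students who appear in only one year simply contribute a cluster of size one, so the same step can be run whether or not every student is observed in every year.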

K331 · Posted 01-16-2024 12:46 PM

Thank you very much, this SAS documentation is helpful. I should have mentioned that many of the subjects are not the same from year to year because of new students who enter the school and students who graduate. Given that the measures repeat, but the subjects differ, would I still need to do the interrupted time series analysis?

StatDave · Posted 01-16-2024 04:48 PM

If "many of the subjects" means that some subjects do have repeated measures then you still have correlation. Of course, it is up to you if you want to ignore the correlation and assume that all measurements are independent.
