BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ha33
Obsidian | Level 7

Hi everyone

 

I want to perform an interrupted time series -like analysis where we assess the effect of an intervention on experiencing an event. It is observational data that looks something like this: 

 

ID MONTHSINTERVENTIONOUTCOME
1100
1200
1310
1411
2100
2200
2311

 

If the outcome were continuos I would have used a mixed model beautifully described in an article form SAS global forum 2020:

PROC MIXED DATA = SINGLE_ITS_DATA METHOD=ML
PLOTS(MAXPOINTS=20000)=(RESIDUALPANEL(UNPACK) VCIRYPANEL(UNPACK) );
CLASS ID;
MODEL OUTCOME=MONTH INTERVENTION MONTH*INTERVENTION/S;
REPEATED / SUBJECT = ID TYPE = UN;
ESTIMATE 'PRE-PERIOD SLOPE' MONTH 1 / CL E;
ESTIMATE 'POST-PERIOD SLOPE' MONTH 1 MONTH*INTERVENTION 1
/ CL E;
ESTIMATE 'PRE/POST GAP' MONTH 1 INTERVENTION 1
MONTH*INTERVENTION 10 /CL E;

I want to perform the same analysis but since my outcome is binary I was thinking of proc genmod or proc glimmix, and estimating the odds ratio of experiencing the effect in the period before, after and whether the difference between before and after is significant. 

 

Anyone who could suggest a syntax for this that comes close to what the mixed model delivers above? Doesn't matter which procedure..

 

Best regards,
Kristian

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

This is an extension of the difference in difference analysis for longitudinal binary data discussed in this note. With the MONTH variable as a continuous predictor, one way to estimate the slopes is to compute average marginal effects within the pre- and post-intervention periods. This can be done with the Margins macro, which is also illustrated in the above note for difference in difference analysis. These statements fit a GEE model allowing for repeated binary measures on each ID and plot the fitted values on the mean (probability) scale and on the link (log odds) scale.

     proc gee data=mydata; 
        class id iv;
        model y(event="1") = iv|month / dist=bin;
        repeated subject=id;
        effectplot slicefit(x=month sliceby=iv);
        effectplot slicefit(x=month sliceby=iv)/link;
        run;

The Margins macro can also fit the model and request estimates of the mean (event probability) at each month.

%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, margins=month iv)

You can also use the Margins macro to compute the slope in each period as the average marginal effect over the months in the period. 

%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, effect=month, within=iv=0)
%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, effect=month, within=iv=1)

If "PRE/POST GAP" means that you want to estimate the change over the two months immediately on either side of the intervention, then this can be done by estimating a contrast on the monthly margins. This code assumes there are 6 months with intervention between months 3 and 4.

     data c; 
        length label f $32767; 
        infile datalines delimiter='|';
        input label f; 
        datalines;
      gap: month4-month3 | 0 0 -1 1 0 0
      ;
%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, 
     margins=iv month, contrasts=c)

View solution in original post

11 REPLIES 11
StatDave
SAS Super FREQ

This is an extension of the difference in difference analysis for longitudinal binary data discussed in this note. With the MONTH variable as a continuous predictor, one way to estimate the slopes is to compute average marginal effects within the pre- and post-intervention periods. This can be done with the Margins macro, which is also illustrated in the above note for difference in difference analysis. These statements fit a GEE model allowing for repeated binary measures on each ID and plot the fitted values on the mean (probability) scale and on the link (log odds) scale.

     proc gee data=mydata; 
        class id iv;
        model y(event="1") = iv|month / dist=bin;
        repeated subject=id;
        effectplot slicefit(x=month sliceby=iv);
        effectplot slicefit(x=month sliceby=iv)/link;
        run;

The Margins macro can also fit the model and request estimates of the mean (event probability) at each month.

%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, margins=month iv)

You can also use the Margins macro to compute the slope in each period as the average marginal effect over the months in the period. 

%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, effect=month, within=iv=0)
%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, effect=month, within=iv=1)

If "PRE/POST GAP" means that you want to estimate the change over the two months immediately on either side of the intervention, then this can be done by estimating a contrast on the monthly margins. This code assumes there are 6 months with intervention between months 3 and 4.

     data c; 
        length label f $32767; 
        infile datalines delimiter='|';
        input label f; 
        datalines;
      gap: month4-month3 | 0 0 -1 1 0 0
      ;
%margins(data=mydata, class=id iv, response=y, roptions=event="1",
     model=iv|month, dist=binomial, 
     geesubject=id, 
     margins=iv month, contrasts=c)
ha33
Obsidian | Level 7

Thanks a lot!

The GEE model works but the margins macro doesn't (I work in enterprise guide, maybe that's why)

Anyway that was very helpful.

StatDave
SAS Super FREQ

It's not clear what you mean by "doesn't work" - are there errors reported in the log? The use of the Enterprise Guide interface should not cause any problem using the macro. You might need to download the latest version of the macro as instructed in the Margins macro documentation.

 

Note that the use of the Margins macro allows you to get results in terms of the event probability - that is, on the mean scale. If you just want estimates of the slopes and the change from month 3 to month 4 in terms of the log odds (that is, on the logit link scale), then you can do that in the same way you could for a normal response. The easiest way is using an equivalent nested form of the model. You can use an ESTIMATE statement to get the odds ratio comparing the odds from month 3 to month 4. The first two ESTIMATE statements are not really needed but provide the log odds estimates (as depicted in the EFFECTPLOT) at months 3 and 4. The last ESTIMATE statement estimates their difference and exponentiates it to obtain the odds ratio. In the parameter estimates table, the parameters for IV are the intercepts for the pre- and post-intervention periods and the parameters for MONTH(IV) are the slopes. Again, the intercepts and slopes are on the log odds scale and are depicted in the EFFECTPLOT.

     proc gee data=mydata; 
        class id iv;
        model y(event="1") = iv month(iv) / dist=bin noint;
        repeated subject=id;
        effectplot slicefit(x=month sliceby=iv) / extend=data link clm;
        estimate 'm3 log odds' iv 1 0 month(iv) 3 0 / plots=none;
        estimate 'm4 log odds' iv 0 1 month(iv) 0 4 / plots=none;
        estimate 'gap odds ratio' iv -1 1 month(iv) -3 4 / exp cl;
        run;

ha33
Obsidian | Level 7

Thanks again. 

I run SAS in a protected environment on a remote desktop where the data is stored (GDPR rules) so I don't have the option to download the macro. 

Event probability would be very useful, but slopes before and after iv and the immediate change in terms of odds ratio is also very informative. 

Is there any other way of getting an event probability or test the difference in slope before and after without using the macro?

StatDave
SAS Super FREQ

I still haven't heard how the macro doesn't work for you. If calling the macro causes an error that invocation of the macro is not resolved, then you are apparently not running the current SAS release, SAS 9.4M8, which includes the macro. If you can update to the latest SAS release, then you'll be able to use it. Another way that the analysis can be done uses another macro - the NLMeans macro. Fortunately, the NLMeans macro has shipped with SAS/STAT since SAS 9.4M6, so it is available in the current release and the previous two releases. The code below shows how that is done. Slopes pre and post are computed using linear contrasts over the repeated measurements in each period.

     proc gee data=mydata; 
        class id iv;
        model y(event="1") = iv|month / dist=bin;
        repeated subject=id;
        effectplot slicefit(x=month sliceby=iv) / extend=data clm;
        effectplot slicefit(x=month sliceby=iv) / extend=data link clm;
        estimate 
          'm1' intercept 1 iv 1 0 month 1 iv*month 1 0,
          'm2' intercept 1 iv 1 0 month 2 iv*month 2 0,
          'm3' intercept 1 iv 1 0 month 3 iv*month 3 0,
          'm4' intercept 1 iv 0 1 month 4 iv*month 0 4,
          'm5' intercept 1 iv 0 1 month 5 iv*month 0 5,
          'm6' intercept 1 iv 0 1 month 6 iv*month 0 6 / e ilink;
          ods output coef=coef;
          store gee;
        run;
      data c;
        length label $32767; 
        infile datalines dsd;
        input label k1-k6;
        set=1;
        datalines;
      pre slope, -.5, 0, .5,   0, 0, 0
      post slope,  0, 0, 0,   -.5, 0, .5
      gap m4-m3,   0, 0, -1,   1, 0, 0
      ;
      %nlmeans(instore=gee, coef=coef, link=logit, contrasts=c)
ha33
Obsidian | Level 7

Thanks for replying.


It runs SAS EG Version 8.3 update 7, so I guess the margins macro isn't included there (yes, it says invocation of the macro is not resolved) .

I can't update it or download the macro as it is a government server without internet connection and I don't have administrator rights.

 

However the NLMEANS macro is available and gives the slopes in the 3 month pre and post periods., which is great. 

 

I have several more months than I have specified in the table above. I am not quite sure what the all the values in c refer to:

 

 pre slope, -.5, 0, .5,   0, 0, 0
      post slope,  0, 0, 0,   -.5, 0, .5
      gap m4-m3,   0, 0, -1,   1, 0, 0

 What do these numbers refer to, and how would you adjust if there were e.g. 6 months on each side of the intervention instead of 3?

 

 

StatDave
SAS Super FREQ
Those coefficients should probably use 1/3=0.333333 instead of .5. That just computes the simple concept of slope: rise/run, that is, the difference in the two end values divided by the number of months (3, in this example). For 6 months in each of period, just use -1/6 and 1/6 with zeros between. Note that this result will not be the same as what Margins would give since it is a different method of estimation.
ha33
Obsidian | Level 7

Thanks for the clarification. 


I experience two issues:

 

1) When I run the NLMeans macro I get the following warning:

 

"The final Hessian matrix is not positive definite, and therefore the estimated covariance matrix is not full rank and may be unreliable.  The variance of some parameter estimates is zero or some parameters are linearly related to other parameters."

 

Do you have any idea what the issue could be? The warning appears when some covariates are included (month 1-12 to account for seasonality and previously_treated (binary)) but also when they are not included. 

 

2) My results seem to be the opposite of what I expected (and compared to quite clear trends when just plotting the data). I expected an upward trend in risk before and a downward trend after the intervention when using the NLmeans but I consistently get the opposite. Could there be something in the code that mixes up pre- and post slopes?

 

 

 

 

 

 

StatDave
SAS Super FREQ

First, a correction of a correction: the 0.5 coefficients first shown are correct since the run in the slope computation is obviously 3-1 = 6-4 = 2, not 3. Now, regarding the Hessian message, this is addressed in the Usage section of the NLMeans macro documentation and results from using the default GLM parameterization of CLASS variables. See the documentation. Regarding the direction of your results - this results from common mistakes such as modeling the probability of the nonevent of your binary response rather than the event level of interest. Any time you model a binary response you should use the EVENT= response variable option, as shown in my examples, to specify the event level. The other possible reason is not using appropriate signs on the coefficients in the CONTRASTS= data set. In my example, the intent is to estimate the slope from increasing time, so the desired differences are the month 3 - month 1 and the month 6 - month 4 differences. Hence the month 3 and 6 coefficients are +0.5 and the month 1 and month 4 coefficients are -0.5.

ha33
Obsidian | Level 7

Thanks again.

About the estimations in the proc gee statement: 

 

          'm1' intercept 1 iv 1 0 month 1 iv*month 1 0,
          'm2' intercept 1 iv 1 0 month 2 iv*month 2 0,
          'm3' intercept 1 iv 1 0 month 3 iv*month 3 0,
          'm4' intercept 1 iv 0 1 month 4 iv*month 0 4,
          'm5' intercept 1 iv 0 1 month 5 iv*month 0 5,
          'm6' intercept 1 iv 0 1 month 6 iv*month 0 6 / e ilink;

Should it not be the opposite in terms of iv and iv*month, because iv=1 in months m4-6? In this code, it says iv 1 0 on the months preceding the intervention, and I am wondering it should be the other way around, so that: 

 

          'm1' intercept 1 iv 0 1 month 1 iv*month 0 1,
          'm2' intercept 1 iv 0 1 month 2 iv*month 0 2,
          'm3' intercept 1 iv 0 1 month 3 iv*month 0 3,
          'm4' intercept 1 iv 1 0 month 4 iv*month 4 0,
          'm5' intercept 1 iv 1 0 month 5 iv*month 5 0,
          'm6' intercept 1 iv 1 0 month 6 iv*month 6 0 / e ilink;

 

 

StatDave
SAS Super FREQ
This is why the ESTIMATE statement should be avoided whenever possible. The correct determination of coefficients is very easy to mess up. That is why the Margins macro, if possible, is the easier solution for this analysis.

IV=0 indicates the pre-intervention period (months 1-3) and IV=1 is the post period (months 4-6). By default (unless you change the default parameterization or reference level in the CLASS statement), there are two parameters presented for IV - the first one is for IV=0 and is labeled as such in the parameter estimates table. The second one, which is equal to zero since it is the reference level, is labeled for IV=1. Similarly, the IV*MONTH has two parameters with the first for MONTH at IV=0 and the second for MONTH at IV=1. So, the predicted value for month 1 in the pre-intervention period is computed as the intercept, plus the first IV (IV=0) parameter, plus (since MONTH is continuous) the month value (1) times the MONTH parameter, plus the month value times the first IV*MONTH parameter. The ESTIMATE statement just creates the vector of coefficients that multiply the vector of parameter estimates. For month 1: (1 1 0 1 1 0)' * Beta. For month 4: (1 0 1 4 0 4)' * Beta.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 2295 views
  • 3 likes
  • 2 in conversation