Shad

Hi, 

I'm running into some trouble figuring out CONTRAST statements in PROC MIXED. I'm fitting a difference-in-differences model to compare proportions between Hispanic/Latino and non-Hispanic/Latino groups before and after an event. Specifically, I want to compare the gap between Hispanic and non-Hispanic before the intervention with the gap after the intervention. I believe the best way to do this is with a CONTRAST statement?

Figure: what I'm trying to compare, i.e., the gap between the orange and blue lines before the intervention versus the gap between them after the intervention.
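
To make the target quantity concrete, here is the unadjusted difference in differences written out. The notation is illustrative, not PROC MIXED output: Y-bar with subscripts g and p is the mean of percent_perf for group g (H = Hispanic/Latino, N = non-Hispanic/Latino) in period p (pre or post intervention).

```latex
\begin{aligned}
\text{Gap}_{\text{pre}}  &= \bar{Y}_{H,\text{pre}}  - \bar{Y}_{N,\text{pre}} \\
\text{Gap}_{\text{post}} &= \bar{Y}_{H,\text{post}} - \bar{Y}_{N,\text{post}} \\
\text{DID}               &= \text{Gap}_{\text{post}} - \text{Gap}_{\text{pre}}
\end{aligned}
```

A positive DID means the Hispanic-minus-non-Hispanic difference increased after the intervention; a negative DID means it decreased.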

data hisp1; 
	input Hispanic time COVID percent_perf;
	datalines;
1 0 0 33.3333
1 1 0 36.0172
1 2 0 34.9497
1 3 0 38.1514
1 4 0 35.0831
1 5 1 38.587
1 6 1 39.7946
1 7 1 37.7932
1 8 1 39.0023
1 9 1 35.4019
0 0 0 31.5364
0 1 0 31.0515
0 2 0 30.2232
0 3 0 32.5556
0 4 0 32.3446
0 5 1 33.6667
0 6 1 33.3706
0 7 1 29.8397
0 8 1 32.5166
0 9 1 29.5235
;
run;
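
As a worked example on the hisp1 toy data above (times 0 to 4 have COVID = 0, times 5 to 9 have COVID = 1), the raw period means give the following, writing H for Hispanic = 1 and N for Hispanic = 0, and taking each gap as H minus N. This is plain arithmetic on the listed values, not a model-based estimate: it ignores the time trend and gives no standard error.

```latex
\begin{aligned}
\bar{Y}_{H,\text{pre}}   &= 177.5347 / 5 = 35.50694 \\
\bar{Y}_{N,\text{pre}}   &= 157.7113 / 5 = 31.54226 \\
\bar{Y}_{H,\text{post}}  &= 190.5790 / 5 = 38.11580 \\
\bar{Y}_{N,\text{post}}  &= 158.9171 / 5 = 31.78342 \\[4pt]
\text{Gap}_{\text{pre}}  &= 35.50694 - 31.54226 = 3.96468 \\
\text{Gap}_{\text{post}} &= 38.11580 - 31.78342 = 6.33238 \\
\text{DID}               &= 6.33238 - 3.96468 = 2.36770
\end{aligned}
```

So, before any adjustment for time, the raw gap in this toy data is about 2.4 percentage points wider in the post period.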

  PROC MIXED DATA = hisp1 METHOD=ML
 PLOTS(MAXPOINTS=60000)=(RESIDUALPANEL(UNPACK) VCIRYPANEL(UNPACK));
 CLASS  Hispanic (ref="0") COVID (ref="0") ;
 MODEL percent_perf = time COVID COVID*time Hispanic Hispanic*time Hispanic*COVID Hispanic*COVID*time/ S ;
 REPEATED intercept / TYPE = UN R;

 RUN;

 Thank you! 

jiltao

What variable gives you the pre- or post-intervention information? 

How is this repeated measures data? Do you have two subjects in the data, each with 10 measurements?

I can help you with an ESTIMATE/CONTRAST statement to test the difference in differences, but you might want to first make sure your model is reasonable for your data.

Thanks,

Jill

Shad

Hi Jill! Thanks for the response. I actually simplified the data and model specification for this post, since I wasn't sure how best to share a sample of the data set with all the subjects included.

 PROC MIXED DATA = new3 METHOD=ML PLOTS(MAXPOINTS=60000)=(RESIDUALPANEL(UNPACK) VCIRYPANEL(UNPACK));
 	CLASS  hospital_number race (ref="2") COVID (ref="0");
 	MODEL percent_perf = time COVID COVID*time race race*time race*COVID race*COVID*time/ S ;
 	REPEATED  / subject = hospital_number TYPE = ar(1) R;
 RUN;

My data is actually at the hospital level, so I have observations for each hospital at quarterly intervals. The "intervention" is the variable COVID (0 = a time period before the pandemic, 1 = a period after).
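
For reference, the mean structure this MODEL statement fits can be written out as below. This is a sketch under assumptions: race has two levels, R = 1 for the non-reference group and R = 0 for race = 2, C = COVID (1 = post), t = time, and the betas are numbered in the order the terms appear in the MODEL statement (intercept first), which matches the "Beta 6" (race*covid) and "B7" (time*race*covid) labels used later in the thread. The REPEATED statement affects only the covariance of the errors within a hospital, not this mean.

```latex
\begin{aligned}
\mu(R, C, t) &= \beta_0 + \beta_1 t + \beta_2 C + \beta_3 C t + \beta_4 R + \beta_5 R t + \beta_6 R C + \beta_7 R C t \\[4pt]
\mu(0, 0, t) &= \beta_0 + \beta_1 t \\
\mu(0, 1, t) &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, t \\
\mu(1, 0, t) &= (\beta_0 + \beta_4) + (\beta_1 + \beta_5)\, t \\
\mu(1, 1, t) &= (\beta_0 + \beta_2 + \beta_4 + \beta_6) + (\beta_1 + \beta_3 + \beta_5 + \beta_7)\, t
\end{aligned}
```

Each race-by-period combination therefore gets its own intercept and its own slope in t.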

Hopefully that clarifies it. 🙂 

jiltao

Thanks for the info!

So you are fitting an ANCOVA model. Do you want the DID for the intercept or the slope? I will provide both below --

PROC MIXED DATA = new3 METHOD=ML PLOTS(MAXPOINTS=60000)=(RESIDUALPANEL(UNPACK) VCIRYPANEL(UNPACK));
 	CLASS  hospital_number race (ref="2") COVID (ref="0");
 	MODEL percent_perf = time COVID COVID*time race race*time race*COVID race*COVID*time/ S ;
 	REPEATED  / subject = hospital_number TYPE = ar(1) R;
    estimate 'DID for race*covid when time=0' race*covid 1 -1 -1 1;
    estimate 'DID for the slopes between race*covid' race*covid*time 1 -1 -1 1;
 RUN;
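
A sketch of what these two ESTIMATE statements reduce to, assuming race has two levels and that the coefficients 1 -1 -1 1 apply to the race*covid cells in the order (race 1, COVID 1), (race 1, COVID 0), (race 2, COVID 1), (race 2, COVID 0). That ordering is an assumption here, but it is consistent with Shad's check below that the first estimate matches the race*covid row of the output. Writing gamma_rc for the race*covid parameter of cell (race r, COVID c) and delta_rc for the race*covid*time parameter, every cell involving a reference level (race 2 or COVID 0) is fixed at 0 in the Solution for Fixed Effects, so only gamma_11 and delta_11 survive:

```latex
\begin{aligned}
\texttt{race*covid 1 -1 -1 1}:&\quad \gamma_{11} - \gamma_{10} - \gamma_{21} + \gamma_{20} = \gamma_{11} \\
\texttt{race*covid*time 1 -1 -1 1}:&\quad \delta_{11} - \delta_{10} - \delta_{21} + \delta_{20} = \delta_{11}
\end{aligned}
```

The first is the DID in expected percent_perf at time = 0 (a DID of intercepts); the second is the DID in the time slopes (post minus pre, race 1 minus race 2).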

Hope this helps,

Jill

Shad

Thanks Jill! 

I'm not sure that's exactly what I'm trying to estimate. 

    estimate 'DID for race*covid when time=0' race*covid 1 -1 -1 1;

If I understand this correctly, this would be the difference between races in the estimated mean "jump" in the outcome at the interruption, essentially Beta 6 (race*covid) in the model output. That checks out when I compare the estimate with the model output.

    estimate 'DID for the slopes between race*covid' race*covid*time 1 -1 -1 1;

Similarly, isn't this the estimated difference between races in the change in slope after the interruption, i.e., B7 (time*race*covid)?

Perhaps I'm being silly and that is already explained by the model output. But how would I go about comparing the interruption periods, i.e., testing whether the gap between races before the interruption is significantly wider or narrower than the gap after it?

jiltao

Because your model includes the covariate TIME, you are essentially fitting a separate regression line for each group. For your DID request, you therefore need to specify a TIME value. At what TIME value do you want this DID?
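
To see why the TIME value matters, here is a short derivation. It assumes two race groups (R = 1 for the non-reference group, R = 0 for race = 2), C = COVID, t = time, and betas numbered in MODEL-statement order (so beta_6 is race*covid and beta_7 is race*covid*time). The race gap within period C is itself a line in t, and so is the DID:

```latex
\begin{aligned}
\mu(R, C, t) &= \beta_0 + \beta_1 t + \beta_2 C + \beta_3 C t + \beta_4 R + \beta_5 R t + \beta_6 R C + \beta_7 R C t \\
G_C(t) &= \mu(1, C, t) - \mu(0, C, t) = \beta_4 + \beta_5 t + \beta_6 C + \beta_7 C t \\
\mathrm{DID}(t) &= G_1(t) - G_0(t) = \beta_6 + \beta_7 t
\end{aligned}
```

Unless beta_7 = 0, the DID changes with t: at t = 0 it equals beta_6 (the first ESTIMATE statement earlier in the thread), and beta_7 (the second ESTIMATE statement) is how much it changes per unit of time.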

Shad

The TIME variable is the part of the ESTIMATE statement I find most confusing. If I wanted the estimate at each TIME value, would I just increase the last coefficient by one for each time point, like this?

estimate 'DID for the slopes between race*covid time = 0' race*covid*time 1 -1 -1 1;
estimate 'DID for the slopes between race*covid time = 1' race*covid*time 1 -1 -1 2;
estimate 'DID for the slopes between race*covid time = 2' race*covid*time 1 -1 -1 3;
...
estimate 'DID for the slopes between race*covid time = N' race*covid*time 1 -1 -1 N;

Thanks for your help! 

jiltao

It does not make sense to compare slopes at a specific time point, because in this model each group's slope is the same at every time point. What you might want instead is to compare the expected response between groups at a given time point. Below are some ESTIMATE statements you might find helpful:

estimate 'DID for race*covid at time = 0' race*covid 1 -1 -1 1;
estimate 'DID for race*covid at time = 1' race*covid 1 -1 -1 1 race*covid*time 1 -1 -1 1;
estimate 'DID for race*covid at time = 2' race*covid 1 -1 -1 1 race*covid*time 2 -2 -2 2;

However, I am not sure this makes practical sense: do you have post-intervention measurements at time 1? Or is it always times 0 to 4 for pre and times 5 to 9 for post? If so, you might want to reconsider your model specification in light of your analysis goal.
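
The pattern in these statements generalizes to any time value t: the race*covid*time coefficients are the race*covid coefficients multiplied by t, rather than only the last coefficient being incremented, because each cell's mean is linear in t. A sketch, assuming two race levels and the same cell ordering as the statements above, with gamma_rc and delta_rc the race*covid and race*covid*time parameters of cell (race r, COVID c) and reference cells fixed at 0:

```latex
\begin{aligned}
\texttt{race*covid 1 -1 -1 1 race*covid*time } t\ \ {-t}\ \ {-t}\ \ t
&\;\Longrightarrow\; (\gamma_{11} - \gamma_{10} - \gamma_{21} + \gamma_{20})
   + t\,(\delta_{11} - \delta_{10} - \delta_{21} + \delta_{20}) \\
&\;=\; \gamma_{11} + t\,\delta_{11}
\end{aligned}
```

For t = 2 this reproduces the third statement above (2 -2 -2 2). If, as in the hisp1 example, pre-intervention data exist only at t = 0 to 4 and post-intervention data only at t = 5 to 9, then at any single t one of the two period lines is being extrapolated, which is the concern raised above.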

Jill

SteveDenham

The OP might also wish to consider using a generalized linear (mixed) model, since the response variable is a proportion.  GENMOD or GLIMMIX seem more appropriate, depending on the need for marginal or conditional means/errors and on the inference space to be used.

SteveDenham
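
One consequence of the generalized linear (mixed) model suggestion, sketched under assumptions: if, for example, a binomial distribution with the logit link were used (the link is an assumption here; others are possible), with pi the underlying proportion, R = 1 for the non-reference race group, C = COVID, and betas in MODEL-statement order, then the same linear combination of betas describes the DID on the log-odds scale:

```latex
\begin{aligned}
\operatorname{logit} \pi(R, C, t) &= \beta_0 + \beta_1 t + \beta_2 C + \beta_3 C t + \beta_4 R + \beta_5 R t + \beta_6 R C + \beta_7 R C t \\
\log \mathrm{OR}_C(t) &= \operatorname{logit} \pi(1, C, t) - \operatorname{logit} \pi(0, C, t) = \beta_4 + \beta_5 t + \beta_6 C + \beta_7 C t \\
\exp(\beta_6 + \beta_7 t) &= \mathrm{OR}_1(t) / \mathrm{OR}_0(t)
\end{aligned}
```

Exponentiated, the DID becomes a ratio of odds ratios (post versus pre) rather than a difference in percentage points, and the two scales need not agree on whether the gap widened.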
