Hi All,
I would like to run a univariate difference-in-differences (diff-in-diff) test, presumably using PROC TTEST, although any procedure that does the job would be fine with me.
The dataset below contains four groups defined by two class variables: TYPE (SUV and Sedan) and ORIGIN (Asia and Europe).
Let me put it this way arithmetically:
Mean of mpg for Asia and Sedan: a
Mean of mpg for Asia and SUV: b
Mean of mpg for Europe and Sedan: c
Mean of mpg for Europe and SUV: d
The t-stat I want is for (a-b) - (c-d).
This way, I can see whether the mean difference in mpg between SUV and Sedan is different between Asia and Europe (i.e., diff-in-diff).
Ideally, I would like to implement this using the dataset generated in the below SAS code (i.e., the one with three variables, mpg_city, origin, type). Hope that somebody can help me.
data temp;
set sashelp.cars;
keep mpg_city origin type;
where origin in ("Asia", "Europe") and Type in ("Sedan" "SUV");
run;
proc sort data=temp;
by origin type mpg_city;
run;
proc print data=temp; run; *172;
proc tabulate data=temp;
class origin type;
var mpg_city;
table origin, type*mpg_city*(n mean);
run;
Hi @braam,
According to Usage Note 61830, "Estimating the difference in differences of means," you can use PROC GENMOD:
proc genmod data=temp;
class origin(ref='Europe') type(ref='SUV');
model mpg_city = origin type origin*type;
run;
Result (excerpt) with point estimate, standard error, confidence interval and significance test of the requested difference in differences:
Analysis Of Maximum Likelihood Parameter Estimates

                                      Standard    Wald 95% Confidence        Wald
Parameter                DF  Estimate    Error          Limits          Chi-Square  Pr > ChiSq
Origin*Type Asia Sedan    1    0.5076   1.6221    -2.6716     3.6868          0.10      0.7543
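To see why this interaction parameter is the requested (a-b) - (c-d), here is a short derivation. It assumes the reference-cell coding implied by the ref= options above (Europe and SUV as reference levels); the symbols \beta_0 to \beta_3 and the indicator notation are my own shorthand, not taken from the usage note.

```latex
% Cell-mean model with reference levels Origin = Europe, Type = SUV
\mu_{\text{origin},\text{type}}
  = \beta_0
  + \beta_1\,\mathbf{1}[\text{Asia}]
  + \beta_2\,\mathbf{1}[\text{Sedan}]
  + \beta_3\,\mathbf{1}[\text{Asia}]\,\mathbf{1}[\text{Sedan}]

% The four cell means from the question
a = \beta_0 + \beta_1 + \beta_2 + \beta_3 \quad (\text{Asia, Sedan}) \\
b = \beta_0 + \beta_1                     \quad (\text{Asia, SUV}) \\
c = \beta_0 + \beta_2                     \quad (\text{Europe, Sedan}) \\
d = \beta_0                               \quad (\text{Europe, SUV})

% Difference in differences
(a - b) - (c - d) = (\beta_2 + \beta_3) - \beta_2 = \beta_3

% Wald test reported by PROC GENMOD (1 DF)
\chi^2 = \left(\frac{\hat\beta_3}{\operatorname{SE}(\hat\beta_3)}\right)^2
       = \left(\frac{0.5076}{1.6221}\right)^2 \approx 0.10
```

So the estimate 0.5076 is the difference in differences itself, and the Wald chi-square is simply the square of the estimate divided by its standard error (about 0.31 here).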
I think in principle this method could also be applied to continuous independent variables (or one continuous and one discrete variable), but the interpretation would be different: You would consider the change in the dependent variable per unit increase of the continuous variable rather than compare means of subgroups.
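A rough sketch of the continuous case, as my own illustration rather than something from the usage note: replace TYPE with the continuous variable WEIGHT from sashelp.cars. Choosing WEIGHT is an assumption made purely for illustration. The interaction parameter should then estimate how much the change in mpg_city per unit of weight differs between Asia and Europe.

```sas
/* Sketch only: one discrete (origin) and one continuous (weight) predictor. */
/* sashelp.cars is used directly because temp keeps only three variables.    */
proc genmod data=sashelp.cars;
   where origin in ("Asia", "Europe");
   class origin(ref='Europe');
   /* origin*weight: difference in the weight slope, Asia vs. Europe */
   model mpg_city = origin weight origin*weight;
run;
```

Here the interaction row is read as a difference in slopes, not as a difference in subgroup mean differences.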
Ah, I see. Yes, the method can also be applied to models with a discrete dependent variable such as logistic regression models: see section "Generalized linear models with non-identity link" in Usage Note 61830.