I am trying to teach myself how to adjust for categorical covariates in a difference-in-differences analysis. I am experimenting with a data set (posted below) that was previously posted in a SAS community question. It examines how the rates of 3 different health insurance policies (ins = 0, 1, or 2) changed from time t0 to t1 between states that implemented a policy (s=1) and states that did not (s=0).

data x;
input ins s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 281 5.3
0 0 1 97 5.0
0 1 0 841 3.4
0 1 1 154 1.8
1 0 0 410 7.7
1 0 1 159 8.3
1 1 0 2488 10.1
1 1 1 1193 14.1
2 0 0 4602 86.9
2 0 1 1671 86.7
2 1 0 21350 86.5
2 1 1 7137 84.1
;
proc logistic data=x;
class ins s t / param=glm ref=first;
model count/n = ins|s|t;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
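/* Each row is one difference in differences of the LSMEANS (mean scale):
   (s=1,t=1) - (s=1,t=0) - [(s=0,t=1) - (s=0,t=0)], for ins=0, 1 and 2.
   k1-k12 follow the ins*s*t LSMEANS order, with t varying fastest. */
data difdif;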
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means)

The code above does not adjust for covariates, but it gives an accurate difference-in-differences (DID) analysis.

Now I want to adjust for a variable, Age, with two categories (Age=0 or Age=1), which I added to the data set below along with updated counts and percentages for each row. I adjusted for Age as shown in the code below, and it ran fine. But I must be missing something. When I removed Age as a covariate by simply deleting it from the MODEL statement, I expected the results to be identical to those from the original code and data set (from your post on 8/26, which had no Age data at all), but they did not match.

Shouldn't removing Age as a covariate cause the Age=0 and Age=1 rows for a given (ins, s, t) combination to be treated as one group, so that the two data sets (with and without Age) are handled in the same way? What am I missing here?

data x;
input ins age s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 46 0.9
0 0 0 1 18 0.9
0 0 1 0 172 0.7
0 0 1 1 33 0.4
0 1 0 0 235 4.4
0 1 0 1 79 4.1
0 1 1 0 669 2.7
0 1 1 1 121 1.4
1 0 0 0 60 1.1
1 0 0 1 29 1.5
1 0 1 0 442 1.8
1 0 1 1 222 2.6
1 1 0 0 350 6.6
1 1 0 1 130 6.7
1 1 1 0 2046 8.3
1 1 1 1 971 11.4
2 0 0 0 1019 19.3
2 0 0 1 367 19.0
2 0 1 0 4947 20.0
2 0 1 1 1665 19.6
2 1 0 0 3583 67.7
2 1 0 1 1304 67.7
2 1 1 0 16403 66.5
2 1 1 1 5472 64.5
;
proc logistic data=x;
class ins age s t / param=glm ref=first;
model count/n = ins|s|t age;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
data difdif;
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means - Adjusted for age)
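One way to test the expectation that the Age rows simply pool into one group is to recompute n by hand for a single cell. The step below is a hypothetical check, not part of the original programs: it copies the two Age rows for ins=0, s=0, t=0 from the Age data set and the matching row from the original data set, then applies the same n formula used in data x. The data set name check and the variable source are made up for illustration.

```sas
/* Hypothetical check (not part of the original programs): recompute n
   for the ins=0, s=0, t=0 cell. Rows age0 and age1 are copied from the
   Age data set; row orig is copied from the original data set. */
data check;
  input source $ count percent;
  n = round(count/(percent/100));   /* same formula as in data x */
  datalines;
age0 46 0.9
age1 235 4.4
orig 281 5.3
;

proc print data=check;
run;
```

The two Age rows add up to 46 + 235 = 281 events, the same as the original row, but their n values (5111 and 5341) add up to 10452, roughly twice the original n of 5302. For s=0, t=0 the six Age-level percentages sum to 100.0, which suggests each percentage is relative to the whole (s, t) group rather than to one Age group. If so, every Age row carries the full group size as its n, and pooling the rows in a model without Age would not reproduce the original data.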