10-18-2021
sasuser2222
Calcite | Level 5
Member since
09-28-2021
- 7 Posts
- 0 Likes Given
- 0 Solutions
- 0 Likes Received
Activity Feed for sasuser2222
- Posted Re: Adjusting for covariates in difference-in-difference analysis - where am I going wrong? on Statistical Procedures. 10-15-2021 08:26 PM
- Posted Re: Adjusting for covariates in difference-in-difference analysis - where am I going wrong? on Statistical Procedures. 10-13-2021 07:53 PM
- Posted Re: Adjusting for covariates in difference-in-difference analysis - where am I going wrong? on Statistical Procedures. 10-13-2021 02:17 PM
- Posted Adjusting for covariates in difference-in-difference analysis - where am I going wrong? on Statistical Procedures. 10-13-2021 01:09 AM
- Posted Re: Difference-in-difference analysis for rates using group level data on Statistical Procedures. 09-30-2021 02:58 PM
- Posted Re: Difference-in-difference analysis for rates using group level data on Statistical Procedures. 09-29-2021 02:11 PM
- Posted Re: Difference-in-difference analysis for rates using group level data on Statistical Procedures. 09-28-2021 03:07 PM
10-15-2021
08:26 PM
Thanks for all the help. I need to play around with it some more, because I'm still getting the same count totals as the original data when I add up the Age=0 and Age=1 counts for a given INS/S/T group, which implies to me that the total N should be the same. Anyway, I'm sure it's something I'm screwing up, but I really appreciate all the input.
10-13-2021
07:53 PM
@StatDave with your last comment, I think I've discovered that we're calculating N differently. When I look at the summary output from your code, the TOTN values are super high, and I'm not sure how those are calculated:
data x;
input ins age s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 183 3.5
0 0 0 1 69 3.6
0 0 1 0 527 2.1
0 0 1 1 104 1.2
0 1 0 0 98 1.9
0 1 0 1 28 1.5
0 1 1 0 314 1.3
0 1 1 1 50 0.6
1 0 0 0 278 5.3
1 0 0 1 94 4.9
1 0 1 0 1646 6.7
1 0 1 1 779 9.2
1 1 0 0 132 2.5
1 1 0 1 65 3.4
1 1 1 0 842 3.4
1 1 1 1 414 4.9
2 0 0 0 2687 50.8
2 0 0 1 960 49.8
2 0 1 0 12322 49.9
2 0 1 1 4066 47.9
2 1 0 0 1915 36.2
2 1 0 1 711 36.9
2 1 1 0 9028 36.6
2 1 1 1 3071 36.2
;
proc summary data=x nway;
class ins s t; var count n;
output out=out sum=totcount totn;
run;
I'm under the impression that TOTN should be calculated as follows: summarizing the data should combine the AGE=0 and AGE=1 rows for a given INS/S/T. The counts are additive, as are the percents (because they are percents of the same denominator, i.e., a given S/T combination). For example, for INS/S/T=0/0/0, adding together the AGE=0 and AGE=1 rows gives: total count = 183+98 = 281, total percent = 3.5+1.9 = 5.4. Total N can then be calculated from total count and total percent.

So I ran the code below, where I summarized the data (which gives me TOTCOUNT and TOTPERCENT) and then calculated TOTN from those variables. Modeling TOTCOUNT/TOTN in my DID code gives me the same output as the DID analysis of the original dataset that is not split by Age. Doesn't this mean that the dataset is not the issue? I thought that taking Age out of the model statement in my DID code (from my initial post) essentially causes Age=0 and Age=1 for a given INS/S/T to be added together in the same way I summarized the data below (which worked!). But clearly I'm still missing something, because these two actions (taking Age out of the model statement vs. summarizing) do not yield the same result, so I know that my adjustment for the Age covariate is incorrect. Ultimately, I still can't figure out how to adjust for Age correctly.
data x;
input ins age s t count percent;
datalines;
0 0 0 0 183 3.5
0 0 0 1 69 3.6
0 0 1 0 527 2.1
0 0 1 1 104 1.2
0 1 0 0 98 1.9
0 1 0 1 28 1.5
0 1 1 0 314 1.3
0 1 1 1 50 0.6
etc.
;
proc summary data=x nway;
class ins s t; var count percent;
output out=out sum=totcount totpercent;
run;
data xNoAge;
set out;
totn=round(totcount/(totpercent/100));
run;
proc print data=xNoAge;
run;
proc logistic data=xNoAge;
class ins s t / param=glm ref=first;
model totcount/totn = ins|s|t;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
data difdif;
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means)
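A quick check of the arithmetic for one group, INS/S/T=0/0/0, using the age-split data above. Because each row's percent is a share of the whole S/T combination, n=round(count/(percent/100)) gives each Age row (approximately) the S/T total, so summing n over the two Age rows, as the PROC SUMMARY step with VAR count n does, roughly doubles it:

```latex
$$
\begin{aligned}
n_{\text{Age}=0} &= \operatorname{round}\!\left(\frac{183}{3.5/100}\right) = \operatorname{round}(5228.57) = 5229\\
n_{\text{Age}=1} &= \operatorname{round}\!\left(\frac{98}{1.9/100}\right) = \operatorname{round}(5157.89) = 5158\\
\text{TOTN}_{\text{sum of } n} &= 5229 + 5158 = 10387\\
\text{TOTN}_{\text{summed percent}} &= \operatorname{round}\!\left(\frac{183 + 98}{(3.5 + 1.9)/100}\right) = \operatorname{round}\!\left(\frac{281}{0.054}\right) = 5204
\end{aligned}
$$
```

For comparison, the unsplit row 0 0 0 281 5.3 from the original data set gives round(281/0.053) = 5302. So the summed-n TOTN is roughly twice the S/T-level N, while the summed-percent version stays close to it (the gap between 5204 and 5302 comes from the rounded percentages, 3.5 + 1.9 = 5.4 rather than 5.3). As far as I can tell, the same thing happens when Age is simply dropped from the model: with ins|s|t saturated, the two Age rows of a group share one probability, and its maximum likelihood estimate is the pooled ratio (183 + 98)/(5229 + 5158) = 281/10387, not 281/5302.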
10-13-2021
02:17 PM
Ah, you're right! I must have had a typo when splitting the data into Age categories. The dataset below PROPERLY splits the INS/S/T groups from the original dataset into Age=0 and Age=1. However, even with this corrected dataset, I'm still running into the same issue:
- The code below shows how I am adjusting for Age as a covariate.
- Next, when I modify the code below to REMOVE Age as a covariate, I do this by removing it from the model statement (model count/n = ins|s|t). I expect this to give the same results as the original code where INS/S/T were NOT split into Age groups at all (see the first post in this thread, where the input data does not even contain an Age column), but the DID outputs do not match and I'm stumped!
data x;
input ins age s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 183 3.5
0 0 0 1 69 3.6
0 0 1 0 527 2.1
0 0 1 1 104 1.2
0 1 0 0 98 1.9
0 1 0 1 28 1.5
0 1 1 0 314 1.3
0 1 1 1 50 0.6
1 0 0 0 278 5.3
1 0 0 1 94 4.9
1 0 1 0 1646 6.7
1 0 1 1 779 9.2
1 1 0 0 132 2.5
1 1 0 1 65 3.4
1 1 1 0 842 3.4
1 1 1 1 414 4.9
2 0 0 0 2687 50.8
2 0 0 1 960 49.8
2 0 1 0 12322 49.9
2 0 1 1 4066 47.9
2 1 0 0 1915 36.2
2 1 0 1 711 36.9
2 1 1 0 9028 36.6
2 1 1 1 3071 36.2
;
proc logistic data=x;
class ins age s t / param=glm ref=first;
model count/n = ins|s|t age;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
data difdif;
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means - Adjusted for age)
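Written out, the adjusted model statement model count/n = ins|s|t age fits a saturated INS×S×T part plus a single additive Age shift on the logit scale. This is my reading of the bar notation under PARAM=GLM; i, a, s and t index the INS, Age, S and T levels, and the Greek letters are just labels for the effects:

```latex
$$
\operatorname{logit}(p_{iast}) = \mu + \alpha_i + \beta_s + \gamma_t + (\alpha\beta)_{is} + (\alpha\gamma)_{it} + (\beta\gamma)_{st} + (\alpha\beta\gamma)_{ist} + \delta_a
$$

$$
\hat p_{ist} = \operatorname{logit}^{-1}\!\left(\hat\mu + \hat\alpha_i + \hat\beta_s + \hat\gamma_t + \widehat{(\alpha\beta)}_{is} + \widehat{(\alpha\gamma)}_{it} + \widehat{(\beta\gamma)}_{st} + \widehat{(\alpha\beta\gamma)}_{ist} + \tfrac{1}{2}\left(\hat\delta_0 + \hat\delta_1\right)\right)
$$
```

The second line is the ins*s*t LS-mean: by default it averages the two Age levels with equal weights, and ILINK maps the result to the probability scale that %NLMeans works on. Removing age from the model statement drops the δ_a term but keeps all 24 age-specific rows, each with its own n.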
10-13-2021
01:09 AM
I am trying to teach myself how to adjust for categorical covariates in difference-in-difference analysis. I am playing around with a data set (posted below) previously posted in a SAS community question, examining how rates of 3 different health insurance policies (ins = 0, 1, or 2) changed from time t0 to t1 between states that implemented a policy (s=1) and states that did not (s=0).
data x;
input ins s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 281 5.3
0 0 1 97 5.0
0 1 0 841 3.4
0 1 1 154 1.8
1 0 0 410 7.7
1 0 1 159 8.3
1 1 0 2488 10.1
1 1 1 1193 14.1
2 0 0 4602 86.9
2 0 1 1671 86.7
2 1 0 21350 86.5
2 1 1 7137 84.1
;
proc logistic data=x;
class ins s t / param=glm ref=first;
model count/n = ins|s|t;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
data difdif;
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means)
The code above does not adjust for covariates, but gives an accurate D-I-D analysis. Now I want to adjust for a variable Age (two categories: Age=0 or Age=1), which I added to the data set below, with updated counts and percentages for each row. I adjusted for this new variable Age as shown in the code below, and it ran just fine. But I must be missing something: when I removed Age as a covariate by simply removing it from the model statement, I expected the results to be identical to those from the original code with the original dataset (from your post on 8/26, which did not have any Age data at all), but they did not match. Shouldn't removing Age as a covariate cause the Age=0 and Age=1 rows for a given (ins s t) combo to be treated as one group, so that the two datasets (with and without Age) are handled in the same way? What am I missing here?
data x;
input ins age s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 46 0.9
0 0 0 1 18 0.9
0 0 1 0 172 0.7
0 0 1 1 33 0.4
0 1 0 0 235 4.4
0 1 0 1 79 4.1
0 1 1 0 669 2.7
0 1 1 1 121 1.4
1 0 0 0 60 1.1
1 0 0 1 29 1.5
1 0 1 0 442 1.8
1 0 1 1 222 2.6
1 1 0 0 350 6.6
1 1 0 1 130 6.7
1 1 1 0 2046 8.3
1 1 1 1 971 11.4
2 0 0 0 1019 19.3
2 0 0 1 367 19.0
2 0 1 0 4947 20.0
2 0 1 1 1665 19.6
2 1 0 0 3583 67.7
2 1 0 1 1304 67.7
2 1 1 0 16403 66.5
2 1 1 1 5472 64.5
;
proc logistic data=x;
class ins age s t / param=glm ref=first;
model count/n = ins|s|t age;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
data difdif;
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means - Adjusted for age)
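Each row of DIFDIF is one difference-in-difference contrast on the LS-means, assuming the 12 ins*s*t LS-means are listed with t varying fastest, then s, then ins (so k1-k4 belong to ins=0, k5-k8 to ins=1, k9-k12 to ins=2). Writing p̂_ist for the ILINK mean of INS level i, state group s and time t, row 1 (1 -1 -1 1) works out to:

```latex
$$
\begin{aligned}
\text{DID}_i &= \hat p_{i00} - \hat p_{i01} - \hat p_{i10} + \hat p_{i11}\\
             &= \left(\hat p_{i11} - \hat p_{i10}\right) - \left(\hat p_{i01} - \hat p_{i00}\right), \qquad i = 0, 1, 2
\end{aligned}
$$
```

That is the change from t=0 to t=1 in the policy states (s=1) minus the same change in the non-policy states (s=0), evaluated on the probability scale by %NLMeans; rows 2 and 3 repeat it for ins=1 and ins=2.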
09-30-2021
02:58 PM
That makes sense! I'm playing around with the original data set, but I must still be doing something wrong. I created a variable called Age with levels (0,1) without changing the group totals from the original data set, so the model data set I created has twice as many rows (since each "ins s t" combination is now split into Age=0 and Age=1). I adjusted for this new variable Age as shown in the code below, and it ran just fine. But I must be missing something: when I removed Age as a covariate by simply removing it from the model statement, I expected the results to be identical to those from the original code with the original dataset (from your post on 8/26, which did not have any Age data at all), but they did not match. Shouldn't removing Age as a covariate cause the Age=0 and Age=1 rows for a given (ins s t) combo to be treated as one group, so that the two datasets (with and without Age) are handled in the same way? What am I missing here?
(Finally, I just wanted to express gratitude to @StatDave for helping self-taught SAS users like me find some clarity in the fog of countless hours of SAS notes, tutorials, and YouTube videos!)
data x;
input ins age s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 46 0.9
0 0 0 1 18 0.9
0 0 1 0 172 0.7
0 0 1 1 33 0.4
0 1 0 0 235 4.4
0 1 0 1 79 4.1
0 1 1 0 669 2.7
0 1 1 1 121 1.4
1 0 0 0 60 1.1
1 0 0 1 29 1.5
1 0 1 0 442 1.8
1 0 1 1 222 2.6
1 1 0 0 350 6.6
1 1 0 1 130 6.7
1 1 1 0 2046 8.3
1 1 1 1 971 11.4
2 0 0 0 1019 19.3
2 0 0 1 367 19.0
2 0 1 0 4947 20.0
2 0 1 1 1665 19.6
2 1 0 0 3583 67.7
2 1 0 1 1304 67.7
2 1 1 0 16403 66.5
2 1 1 1 5472 64.5
;
proc logistic data=x;
class ins age s t / param=glm ref=first;
model count/n = ins|s|t age;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
data difdif;
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means - Adjusted for age)
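A check that this split keeps the original group totals, using INS/S/T=0/0/0 (rows 0 0 0 0 46 0.9 and 0 1 0 0 235 4.4 above, against the unsplit row 0 0 0 281 5.3 from the original data set), together with the n that n=round(count/(percent/100)) assigns to each row:

```latex
$$
\begin{aligned}
\text{count:}\quad & 46 + 235 = 281 \quad (\text{original row: } 281)\\
\text{percent:}\quad & 0.9 + 4.4 = 5.3 \quad (\text{original row: } 5.3)\\
n_{\text{Age}=0} &= \operatorname{round}(46/0.009) = 5111\\
n_{\text{Age}=1} &= \operatorname{round}(235/0.044) = 5341\\
n_{\text{original}} &= \operatorname{round}(281/0.053) = 5302
\end{aligned}
$$
```

So counts and percents add back up as intended, but both Age rows get an n close to the S/T total (about 5,300) rather than the size of their own age group, because each row's percent is a share of the whole S/T combination.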
09-29-2021
02:11 PM
Thanks! I'm probably approaching this the wrong way, but if your data set already contains the additional covariates you want to control for, as well as the counts and percentages for each group, I thought it would be a simple matter of incorporating them into your model statement, since the proportions of the covariates are already part of your data set. For example, based on the original code in this thread, if the data set also included Age (0, 1, 2) and Race (0, 1) (I didn't write out all the datalines), you would add Age and Race to your CLASS and MODEL statements, but then I'm not sure what else is needed.
data x;
input ins age race s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 0 281 5.3
0 0 0 0 1 97 5.0
0 0 0 1 0 841 3.4
0 0 0 1 1 154 1.8
0 0 1 0 0 410 7.7
0 0 1 0 1 159 8.3
etc...
;
proc logistic data=x;
class ins age race s t / param=glm ref=first;
model count/n = ins|s|t age race;
lsmeans ins*s*t / e ilink;
ods output coef=coeffs;
store log;
run;
data difdif;
input k1-k12;
set=1;
datalines;
1 -1 -1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 -1 -1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 -1 -1 1
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
title=Difference in Difference of Means)
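For scale: with Age at three levels and Race at two, a complete version of this data set would have one row per INS/Age/Race/S/T combination, and model count/n = ins|s|t age race adds one additive shift per covariate on the logit scale. This is a sketch in GLM-parameterization notation; δ and ρ are just labels for the Age and Race effects:

```latex
$$
\begin{aligned}
\text{rows} &= \underbrace{3}_{\text{ins}} \times \underbrace{3}_{\text{age}} \times \underbrace{2}_{\text{race}} \times \underbrace{2}_{s} \times \underbrace{2}_{t} = 72\\
\operatorname{logit}(p_{iarst}) &= \left[\text{ins}\,|\,s\,|\,t \text{ terms}\right]_{ist} + \delta_a + \rho_r
\end{aligned}
$$
```

By default, the ins*s*t LS-means then average over Age and Race with equal weights (1/3 per Age level, 1/2 per Race level) before the inverse link is applied.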
09-28-2021
03:07 PM
Hey StatDave_sas, nice response. For this example, is there a good way to add covariates like 'Age' to aggregated data to get an adjusted DID? It seems straightforward to add these covariates with case-level data, but with aggregated data, I imagine you'd have to create Age 'groups' (e.g., 0-30y = 0, 31-60y = 1, 61+ = 2) to transform it into categorical data, then recalculate counts and percentages with the new Age column added. Do you have a better way to do this?
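A minimal SAS sketch of the grouping approach described above, assuming person-level data are available. The data set CASES, its variable AGE_YEARS, and the intermediate data sets NUM, DEN and AGG are hypothetical names, and the five data lines are made up purely for illustration; the cut points are the ones from the post, assuming whole-year ages.

```sas
/* Hypothetical person-level data, made up for illustration only:
   one row per person with insurance type (ins), state group (s),
   time (t), and age in years (age_years). */
data cases;
  input ins s t age_years;
  datalines;
0 0 0 25
2 0 0 28
2 0 0 45
2 0 0 52
1 0 0 70
;
run;

/* Bin age into the groups from the post: 0-30y = 0, 31-60y = 1, 61+ = 2.
   Missing ages stay missing and are dropped by the CLASS statements below. */
data cases_grp;
  set cases;
  if missing(age_years) then age = .;
  else if age_years <= 30 then age = 0;
  else if age_years <= 60 then age = 1;
  else age = 2;
run;

/* Numerator: number of people in each INS/AGE/S/T cell (_FREQ_ -> count). */
proc summary data=cases_grp nway;
  class ins age s t;
  output out=num(drop=_type_ rename=(_freq_=count));
run;

/* Denominator: number of people in each AGE/S/T stratum (_FREQ_ -> n).
   This choice of denominator is an assumption of the sketch. */
proc summary data=cases_grp nway;
  class age s t;
  output out=den(drop=_type_ rename=(_freq_=n));
run;

/* Both inputs must be sorted by the merge keys. */
proc sort data=num;
  by age s t;
run;

proc sort data=den;
  by age s t;
run;

/* Attach each cell's stratum size and recompute the percentage. */
data agg;
  merge num den;
  by age s t;
  percent = 100*count/n;
run;
```

One design choice to flag: here n is the number of people in each Age/S/T stratum, which is what the events/trials syntax count/n treats as the number of trials for that row, whereas the percentages in the posts above were computed out of the whole S/T combination. With person-level data you also get n directly, so the n=round(count/(percent/100)) step isn't needed.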