I have a dataset that looks as follows:
DATA have;
input dummy1 dummy2 event_year firm_id variable
;
DATALINES;
0 0 1991 1034 5.7991
0 0 1991 1365 8.5963
0 0 1991 1789 7.9652
0 0 1991 1865 5.6145
0 0 2004 1034 3.6768
0 0 2004 1365 10.1621
0 0 2004 2282 5.4541
0 0 2004 2812 6.6856
0 0 2004 3895 8.2246
0 0 2004 5404 7.6025
0 0 2004 6109 3.2838
0 0 2004 7086 7.5047
0 1 1991 1372 5.2026
0 1 1991 1640 3.692
0 1 1991 3093 5.0352
0 1 1991 3840 4.6172
0 1 1991 5594 2.9139
0 1 1991 5973 3.1315
0 1 2004 1372 6.1267
0 1 2004 1640 4.8229
0 1 2004 1926 6.7382
0 1 2004 2034 7.2528
0 1 2004 2787 7.89
0 1 2004 3607 7.8935
0 1 2004 4145 4.2265
1 0 1991 1004 5.9473
1 0 1991 1078 8.6867
1 0 1991 1598 6.4022
1 0 1991 1609 10.3318
1 0 1991 1613 5.3902
1 0 1991 1651 5.8021
1 0 1991 1686 5.555
1 0 2004 1609 7.439
1 0 2004 1613 10.0747
1 0 2004 1651 8.0287
1 0 2004 1686 12.6915
1 1 1991 1036 3.7112
1 1 1991 1327 4.0193
1 1 1991 1397 4.5393
1 1 1991 1585 3.3894
1 1 1991 1608 7.98
1 1 1991 1632 6.1909
1 1 1991 1659 5.4968
1 1 2004 1439 4.1855
1 1 2004 1478 10.2339
1 1 2004 1659 6.0975
1 1 2004 1689 5.9035
;
RUN;
There are two binary variables, dummy1 and dummy2, that split up the sample. I want to test for a difference in the difference in means as follows. First, I run the following code:
proc surveymeans data=have ;
cluster firm_id event_year ;
var variable;
domain dummy1*dummy2 / diffmeans;
ods output
DomainDiffs=sort_domaindiff
Domain=sort_domain;
run;
Using the above code, I can produce the following table:
For example, 7.849927 is the mean of group (1, 0), where the notation is (dummy1, dummy2). The value 2.236536 is the difference in means between the (1, 0) and (1, 1) groups, and its significance can be tested by reading the p-value from the dataset "Sort_domaindiff". What I am interested in testing, however, is the significance of the value 0.87189, which is the difference in the difference of means, i.e., [(1, 0) - (1, 1)] - [(0, 0) - (0, 1)]. I do not know how to achieve this in proc surveymeans, as there is no output for it.
Is there a way to achieve this in proc surveymeans? If not, is there another method I can use to find the significance of this difference in the difference of means? Note that I need to cluster the standard errors by firm_id and event_year, which is why I am using proc surveymeans. If another method is used, I also need to ensure the standard errors are clustered by these two variables.
If you want to compute the variance of the difference between two mean differences, you can use the cov option to get the covariance matrix of the domain means and then apply the simple formula var(L'Beta) = L' Var(Beta) L. Here Var(Beta) is the covariance matrix saved in the data set "cov" when you run surveymeans as follows, and the L for the comparison in your message can be written as (-1, 1, 1, -1), with the domains ordered (0, 0), (0, 1), (1, 0), (1, 1).
proc surveymeans data=have ;
cluster firm_id event_year ;
var variable;
domain dummy1*dummy2 / diffmeans cov;
ods output
DomainDiffs=sort_domaindiff
Domain=sort_domain
DomainMeanCov=cov;
run;
I noticed that, unless you are using just a small portion of the data as an example for this post, your cluster statement makes each observation a unique PSU, so the result is the same as running the proc without the cluster statement. Not that it matters, but that seems to be the only "survey design" information you have in this example. Also, the data in your example happen to produce a diagonal cov matrix, which means you can get what you want with a calculator: because every entry of L is 1 or -1, L' Var(Beta) L is just the sum of the four domain-mean variances, var=1.372558. The t-value, as described in the doc, is then 0.87189/sqrt(1.372558)=0.7442, and you don't have to look up the t-distribution table to know it's insignificant.
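The calculator step can also be written as a short DATA step. This is only a sketch of the arithmetic: the estimate and variance are the values quoted in this thread, and the degrees of freedom (46, i.e. 47 clusters minus one stratum) are my assumption about what the procedure would use for this sample, so check them against your own output.

```sas
/* Sketch: t-test for the difference in differences from the quoted values. */
data _null_;
   est = 0.87189;     /* L'Beta: difference in the difference of means      */
   var = 1.372558;    /* L' Var(Beta) L: sum of the diagonal of the cov set */
   df  = 46;          /* ASSUMPTION: 47 clusters minus 1 stratum            */
   t   = est / sqrt(var);
   p   = 2 * (1 - probt(abs(t), df));   /* two-sided p-value */
   put t= 8.4 p= 8.4;
run;
```

If the cov matrix is not diagonal for your full data, the off-diagonal terms have to be included in L' Var(Beta) L as well, so the simple sum no longer applies.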
Finally, there is no reason to use surveyreg when you can use surveymeans for this functionality.
I am assuming you are using complex survey data based on a probability survey design (the survey weight is missing from your code).
To make custom mean comparisons across domain levels, you can use a PROC SURVEYREG model with the NOINT and VADJUST=NONE options and a CONTRAST statement.
Please refer to the following SAS Note: https://support.sas.com/kb/34/607.html
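For the data set in the question, that approach might look roughly like the sketch below. It is not taken from the SAS Note itself: the labels are made up, and the coefficient order assumes the dummy1*dummy2 cells are sorted as (0, 0), (0, 1), (1, 0), (1, 1), which you should confirm in the parameter estimates table.

```sas
/* Sketch: cell-means model, then a custom comparison of the four cells. */
proc surveyreg data=have;
   cluster firm_id event_year;          /* same clustering as in surveymeans  */
   class dummy1 dummy2;
   /* NOINT makes each coefficient the mean of one (dummy1, dummy2) cell */
   model variable = dummy1*dummy2 / noint solution vadjust=none;
   /* [(1,0) - (1,1)] - [(0,0) - (0,1)] = -m00 + m01 + m10 - m11 */
   estimate 'diff in diff' dummy1*dummy2 -1 1 1 -1;
   contrast 'diff in diff' dummy1*dummy2 -1 1 1 -1;
run;
```

The ESTIMATE statement reports the estimate with a t test, and the CONTRAST statement gives the corresponding F test. With a single degree of freedom the two should agree.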