elbarto
Obsidian | Level 7

I have a dataset that looks as follows:

DATA have;
   /* EXPANDTABS lets list input read tab-separated values */
   infile datalines expandtabs;
   input dummy1 dummy2 event_year firm_id variable;
DATALINES;
0	0	1991	1034	5.7991
0	0	1991	1365	8.5963
0	0	1991	1789	7.9652
0	0	1991	1865	5.6145
0	0	2004	1034	3.6768
0	0	2004	1365	10.1621
0	0	2004	2282	5.4541
0	0	2004	2812	6.6856
0	0	2004	3895	8.2246
0	0	2004	5404	7.6025
0	0	2004	6109	3.2838
0	0	2004	7086	7.5047
0	1	1991	1372	5.2026
0	1	1991	1640	3.692
0	1	1991	3093	5.0352
0	1	1991	3840	4.6172
0	1	1991	5594	2.9139
0	1	1991	5973	3.1315
0	1	2004	1372	6.1267
0	1	2004	1640	4.8229
0	1	2004	1926	6.7382
0	1	2004	2034	7.2528
0	1	2004	2787	7.89
0	1	2004	3607	7.8935
0	1	2004	4145	4.2265
1	0	1991	1004	5.9473
1	0	1991	1078	8.6867
1	0	1991	1598	6.4022
1	0	1991	1609	10.3318
1	0	1991	1613	5.3902
1	0	1991	1651	5.8021
1	0	1991	1686	5.555
1	0	2004	1609	7.439
1	0	2004	1613	10.0747
1	0	2004	1651	8.0287
1	0	2004	1686	12.6915
1	1	1991	1036	3.7112
1	1	1991	1327	4.0193
1	1	1991	1397	4.5393
1	1	1991	1585	3.3894
1	1	1991	1608	7.98
1	1	1991	1632	6.1909
1	1	1991	1659	5.4968
1	1	2004	1439	4.1855
1	1	2004	1478	10.2339
1	1	2004	1659	6.0975
1	1	2004	1689	5.9035
;
RUN;

There are two binary variables, dummy1 and dummy2, that split the sample. I want to test for a difference in the difference in means as follows. First, I run the following code:

proc surveymeans data=have;
   cluster firm_id event_year;
   var variable;
   domain dummy1*dummy2 / diffmeans;
   ods output DomainDiffs=sort_domaindiff
              Domain=sort_domain;
run;

Using the above code, I can produce the following table:

[Image: difference.png (table of domain means, their differences, and the difference in differences)]

For example, 7.849927 is the mean of group (1, 0), where the notation is (dummy1, dummy2). The value 2.236536 is the difference in means between groups (1, 0) and (1, 1), and its significance can be tested by reading the p-value from the data set Sort_domaindiff. However, what I want to test is the significance of 0.87189, the difference in the differences of means, i.e., the difference between (1, 0) - (1, 1) and (0, 0) - (0, 1). PROC SURVEYMEANS does not output this quantity, and I do not know how to obtain it.

Is there a way to do this in PROC SURVEYMEANS? If not, is there another method I can use to test the significance of this difference in differences of means? Note that I need to cluster the standard errors by firm_id and event_year, which is why I am using PROC SURVEYMEANS, so any other method must also cluster the standard errors by these two variables.
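Written out in terms of the four domain means, with \bar y_{jk} denoting the mean of group (j, k), the quantity to be tested is shown below. The (1, 0) mean and the 2.236536 difference come from the table. The (0, 0), (0, 1), and (1, 1) means were computed by hand from the example data (12, 13, and 11 observations) and reproduce 0.87189 up to rounding.

```latex
\begin{aligned}
\text{DiD} &= (\bar y_{10} - \bar y_{11}) - (\bar y_{00} - \bar y_{01}) \\
           &= (7.849927 - 5.613391) - (6.714108 - 5.349462) \\
           &= 2.236536 - 1.364646 = 0.871890
\end{aligned}
```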

 

Accepted Solution
TonyAn
SAS Employee

If you want the variance of the difference between the two mean differences, add the COV option to get the covariance matrix of the domain means, and then compute var(L'Beta) = L' Var(Beta) L, where Var(Beta) is the covariance matrix saved in the data set "cov" when you run SURVEYMEANS as follows. With the domains in the order (0,0), (0,1), (1,0), (1,1), the L for the contrast in your message is (-1, 1, 1, -1).
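Expanding the quadratic form shows what this computes. Let β = (\bar y_{00}, \bar y_{01}, \bar y_{10}, \bar y_{11})', with the domains ordered (0,0), (0,1), (1,0), (1,1). Let V = Var(β), with elements V_{ij}. Then:

```latex
\begin{aligned}
L'\beta &= -\bar y_{00} + \bar y_{01} + \bar y_{10} - \bar y_{11}
         = (\bar y_{10} - \bar y_{11}) - (\bar y_{00} - \bar y_{01}) \\
\operatorname{Var}(L'\beta) &= L'VL = \sum_{i=1}^{4}\sum_{j=1}^{4} L_i L_j V_{ij} \\
  &= V_{11} + V_{22} + V_{33} + V_{44}
     + 2\,(-V_{12} - V_{13} + V_{14} + V_{23} - V_{24} - V_{34})
\end{aligned}
```

When V is diagonal, the cross terms vanish and the variance is just the sum of the four domain-mean variances.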

proc surveymeans data=have;
   cluster firm_id event_year;
   var variable;
   domain dummy1*dummy2 / diffmeans cov;
   ods output DomainDiffs=sort_domaindiff
              Domain=sort_domain
              DomainMeanCov=cov;
run;

Unless you are using only a small portion of your data as an example for this post, note that your CLUSTER statement makes each observation a unique PSU. Clusters are formed from combinations of the firm_id and event_year values, and each combination occurs only once in your data. The results are therefore the same as running the procedure without the CLUSTER statement. That may not matter, but it appears to be the only "survey design" information in this example. Also, with your example data the covariance matrix is diagonal, so you can get what you want with a calculator: var = 1.372558. As described in the documentation, the t value is 0.87189/sqrt(1.372558) = 0.7442, and you don't need a t-distribution table to see that it is not significant.
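Here is the calculator step written out, where V_{ii} are the variances of the four domain means and their sum is the 1.372558 given above. The degrees of freedom assume that the example's 47 observations are 47 PSUs in a single stratum (df = PSUs - strata):

```latex
\begin{aligned}
\operatorname{Var}(L'\beta) &= \sum_{i=1}^{4} V_{ii} = 1.372558 \\
t &= \frac{0.87189}{\sqrt{1.372558}} = \frac{0.87189}{1.171562} \approx 0.7442
\end{aligned}
```

With 47 - 1 = 46 degrees of freedom, the two-sided 5% critical value is about 2.01, so t = 0.7442 is well short of significance.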

 

Finally, there is no need to use PROC SURVEYREG when PROC SURVEYMEANS provides this functionality.

2 REPLIES
gcjfernandez
SAS Employee

I am assuming you are working with complex survey data from a probability survey design (note that no survey weight appears in your code).

To make custom comparisons of means across domain levels, you can fit a PROC SURVEYREG model with the NOINT and VARADJUST=NONE options and test the comparison with a CONTRAST statement.

Please refer to the following SAS Note: https://support.sas.com/kb/34/607.html
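Here is a sketch of the idea behind this approach. It assumes the model has one indicator per dummy1*dummy2 cell and no intercept; the notation is illustrative, and the exact SAS syntax is in the SAS Note above:

```latex
\begin{aligned}
y_i &= \sum_{j,k \in \{0,1\}} \mu_{jk} D_{jk,i} + \varepsilon_i,
\qquad D_{jk,i} = \mathbf{1}\{\text{dummy1}_i = j,\ \text{dummy2}_i = k\} \\
H_0 &:\; -\mu_{00} + \mu_{01} + \mu_{10} - \mu_{11} = 0
\end{aligned}
```

The indicators are mutually exclusive and the model has no intercept. As a result, the unweighted estimate of each \mu_{jk} is the sample mean of cell (j, k), and the contrast coefficients are (-1, 1, 1, -1) in the order (0,0), (0,1), (1,0), (1,1).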

 


