braam
Quartz | Level 8

I would like to estimate a regression with three sets of fixed effects and clustered standard errors. PROC SURVEYREG works, but it runs into memory issues when there are many fixed effects and/or clusters. So I decided to use PROC STDIZE to de-mean my variables, but I don't think my code is correct. Could anybody help me find what is wrong with it?

 

My code includes two PROC SURVEYREG.

1) The original data with explicit fixed effects.

2) The de-meaned data (i.e., implicit fixed effects).

 

I expected the results to be the same, but they are not.

 

 


			
		proc freq data= test order=freq; table f1; run;
		proc freq data= test order=freq; table f2; run;
		proc freq data= test order=freq; table f3; run;



* Centering data by f1, f2, and f3;
		data test_centered;
			set test;
			run;

		proc sort data= test_centered; by f1; run;
		proc stdize data= test_centered 
			out= test_centered
			method= mean 
			oprefix= old 
			sprefix; 
			by f1; 
			var dv iv; 
			run;


		proc sort data= test_centered; by f2; run;
		proc stdize data= test_centered
			out= test_centered
			method= mean 
			oprefix= old 
			sprefix; 
			by f2; 
			var dv iv; 
			run;

		proc sort data= test_centered; by f3; run;
		proc stdize data= test_centered
			out= test_centered
			method= mean 
			oprefix= old 
			sprefix; 
			by f3; 
			var dv iv; 
			run;

		proc means data= test; var dv iv; run;
		proc means data= test_centered; var dv iv; run;


		proc surveyreg data= test;
			class f1 f2 f3;
			model dv= iv  f1 f2 f3/ solution;
			run;
		proc surveyreg data= test_centered;
			model dv= iv / solution;
			run;

		proc print data= test; run;

 

 

My Original DataSet: 

 

Obs f1 f2 f3 dv iv
1 20202 37 2016 14.1690 4
2 20202 37 2017 14.3602 6
3 20202 37 2018 14.4787 5
4 293110377 38 2016 12.6248 3
5 293110377 38 2017 13.0541 3
6 293110377 38 2018 12.9945 3
7 2020245 28 2016 11.2252 3
8 2020245 28 2017 11.3145 4
9 2020245 28 2018 11.2848 4
10 40023 35 2016 15.6811 2
11 40023 35 2017 15.7506 2
12 40023 35 2018 15.8072 2
13 265160055 36 2016 13.5722 6
14 265160055 36 2017 13.5261 7
15 265160055 36 2018 13.4228 6
16 151340376 28 2016 13.0350 3
17 151340376 28 2017 12.9871 3
18 151340376 28 2018 13.2472 4
19 293110377 38 2016 13.5696 3
20 293110377 38 2017 13.6979 3
21 272200375 38 2016 11.6168 1
22 272200375 38 2017 11.6243 2
23 272200375 38 2018 11.6396 2
24 130376 36 2016 12.4459 1
25 130376 36 2017 12.8828 1
26 130376 36 2018 12.8970 1
27 20342 28 2016 12.9657 4
28 20342 28 2017 13.4190 3
29 20342 28 2018 13.1806 4
30 10421 38 2016 15.4309 2
31 10421 38 2017 15.2439 2
32 10421 38 2018 15.1993 2
33 265160089 33 2016 11.8636 2
34 265160089 33 2017 12.0782 2
35 265160089 33 2018 12.0895 2
36 30111 38 2016 15.1774 5
37 30111 38 2017 15.2131 7
38 30111 38 2018 15.2249 6
39 60435 42 2016 12.8841 2
40 60435 42 2017 12.8980 1
41 60435 42 2018 13.0379 2
42 20023 73 2016 15.8567 5
43 20023 73 2017 15.8304 5
44 20023 73 2018 15.9903 5
45 30245 38 2016 13.9173 8
46 30245 38 2017 15.0267 9
47 30245 38 2018 15.1380 11
48 82560245 73 2016 12.4837 1
49 82560245 73 2017 12.7665 2
50 82560245 73 2018 12.9239 1
51 30198 56 2016 14.5976 2
52 30198 56 2017 14.6309 2
53 30198 56 2018 14.6419 2
54 30322 73 2016 14.6623 1
55 30322 73 2017 14.6135 2
56 30322 73 2018 14.6001 2
57 18510373 38 2016 11.9250 4
58 18510373 38 2017 11.9576 4
59 18510373 38 2018 11.9184 4
60 20111 38 2016 14.6730 8
61 20111 38 2017 14.6226 9
62 20111 38 2018 14.7825 10
63 158380359 73 2017 11.2266 1
64 158380359 73 2018 11.0898 1
PaigeMiller
Diamond | Level 26

No need to center IV in PROC STDIZE. That's one difference: the two models are working with different versions of IV.

 

Also, if you center the variables F1 F2 and F3, I think this needs to be done in one big PROC STDIZE such as

proc sort data=test_centered;
    by f1 f2 f3;
run;
proc stdize data= test_centered out= test_centered method= mean 
    oprefix= old sprefix; 
    by f1 f2 f3; 
    var dv; 
run;

Then you ought to be able to fit the model with DV = IV and leave out the F1 F2 F3. (Although this is the equivalent of fitting a 3-way interaction between the variables F1 F2 F3).

 

One thing I'm not sure of: I seem to remember that for this to work, F2 must be nested in F1 and F3 must be nested in F1*F2, though I can't recall why (and your variables are not nested).

--
Paige Miller
PaigeMiller
Diamond | Level 26

Okay, here's my OPINION #2

 

I think what you have done is the equivalent of the no interaction case between F1 F2 and F3, so maybe your idea is okay. You still don't want to center the variable IV.

 

I am assigning you a homework project: reduce the size of your data set so there are only a small number of levels of F1 F2 and F3, so it will run through PROC SURVEYREG and get some answers. Then do the PROC STDIZE method on the reduced data set and see if the answers match.

 

--
Paige Miller
braam
Quartz | Level 8

@PaigeMiller 

 

Thanks for your suggestion. I tried centering only the dependent variable, but it didn't help. As you suggested, I reduced my sample to 17 obs. With the code below, the two PROC SURVEYREG runs give me:

 

(1) Coeff on IV= 0.057581 (t= 0.37, p= 0.7137) from the first reg with explicit fixed effects

(2) Coeff on IV= 0.0578481 (t=0.47, p=0.6456) from the second reg with the  centered data

 

The coefficients are pretty close, but the t-statistics and p-values are quite different. In addition, when I use a larger sample, the difference becomes substantial.

 


* Centering data by f1, f2, and f3;
		data test;
			set temp.test;
			where f1 in (20202 293110377 2020245 40023 265160055 151340376) and
					f2 in ( 37 38 28 35 38);
			run;

		proc print data= test;
			run;

		data test_centered;
			set test;
			run;

		proc sort data= test_centered; by f1; run;
		proc stdize data= test_centered 
			out= test_centered
			method= mean 
			oprefix= old 
			sprefix; 
			by f1; 
			var dv iv; 
			run;


		proc sort data= test_centered; by f2; run;
		proc stdize data= test_centered
			out= test_centered
			method= mean 
			oprefix= old 
			sprefix; 
			by f2; 
			var dv iv ; 
			run;

		proc sort data= test_centered; by f3; run;
		proc stdize data= test_centered
			out= test_centered
			method= mean 
			oprefix= old 
			sprefix; 
			by f3; 
			var dv  iv; 
			run;

		proc means data= test; var dv; run;
		proc means data= test_centered; var dv; run;


		proc surveyreg data= test;
			class f1 f2 f3;
			model dv= iv f1 f2 f3/ solution;
			run;
		proc surveyreg data= test_centered;
			model dv= iv / solution;
			run;
Obs f1 f2 f3 dv iv
1 20202 37 2016 14.1690 4
2 20202 37 2017 14.3602 6
3 20202 37 2018 14.4787 5
4 293110377 38 2016 12.6248 3
5 293110377 38 2017 13.0541 3
6 293110377 38 2018 12.9945 3
7 2020245 28 2016 11.2252 3
8 2020245 28 2017 11.3145 4
9 2020245 28 2018 11.2848 4
10 40023 35 2016 15.6811 2
11 40023 35 2017 15.7506 2
12 40023 35 2018 15.8072 2
13 151340376 28 2016 13.0350 3
14 151340376 28 2017 12.9871 3
15 151340376 28 2018 13.2472 4
16 293110377 38 2016 13.5696 3
17 293110377 38 2017 13.6979 3
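
A possible explanation for the gap growing on larger samples, offered as an assumption rather than a confirmed diagnosis: when f1, f2, and f3 are not nested and the data are unbalanced, centering by f2 can reintroduce nonzero f1-group means, so one pass through the three PROC STDIZE steps generally does not remove all three sets of means at once. Repeating the passes until the values stop changing is one common remedy. The sketch below uses a hypothetical macro %demean (not from the thread) with a fixed number of sweeps instead of a convergence check; the macro name, its parameters, and the default of 25 sweeps are illustrative choices.

```sas
/* Hypothetical helper (not from the thread): repeat the BY-group centering */
/* so that the f1, f2, and f3 means are all (approximately) removed.        */
%macro demean(data=, out=, vars=, effects=, sweeps=25);
    %local s j f;

    /* Work on a copy so the input data set is left untouched */
    data &out;
        set &data;
    run;

    %do s = 1 %to &sweeps;
        %do j = 1 %to %sysfunc(countw(&effects));
            %let f = %scan(&effects, &j);

            proc sort data=&out;
                by &f;
            run;

            /* METHOD=MEAN subtracts the BY-group mean (scale = 1).     */
            /* Without OPREFIX/SPREFIX the centered values replace the  */
            /* originals, so each sweep refines the previous one.       */
            proc stdize data=&out out=&out method=mean;
                by &f;
                var &vars;
            run;
        %end;
    %end;
%mend demean;

/* ASSUMPTION: both dv and iv are centered, as in the original post */
%demean(data=test, out=test_centered, vars=dv iv, effects=f1 f2 f3);

proc surveyreg data=test_centered;
    model dv = iv / solution;
run;
```

A quick check after the macro runs is PROC MEANS on test_centered with a CLASS statement for f1, then f2, then f3: every group mean of dv and iv should be close to zero. If they are not, more sweeps are needed. The standard errors from the centered model still do not count the absorbed fixed effects as parameters.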
PaigeMiller
Diamond | Level 26

(1) Coeff on IV= 0.057581 (t= 0.37, p= 0.7137) from the first reg with explicit fixed effects

(2) Coeff on IV= 0.0578481 (t=0.47, p=0.6456) from the second reg with the centered data

 

Yes, I consider this to be an expected result. The coefficients are essentially the same, but the degrees of freedom in the model with the centered data ought to be different and so the t-values and p-values ought to be different. Is the sum of squares for error from these two models the same (I would expect it to be since the coefficients are virtually equal)? If so, then I think you have the same model fit, and if you "manually" adjust the degrees of freedom, then you ought to get the same t-value and p-value.
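
The manual adjustment described above can be sketched as a small DATA step. It assumes that the two fits have nearly identical residuals and that PROC SURVEYREG applies its default (n-1)/(n-p) variance adjustment, so the two variances differ by the factor (n - p_cent)/(n - p_full). The parameter counts are assumptions read off the 17-observation listing: 5 levels of f1 (4 parameters), f2 fully determined by f1 in this subset (0 extra), 3 levels of f3 (2 parameters), plus the intercept and iv.

```sas
/* Sketch: rescale the centered-model t-value by the degrees-of-freedom factor */
data _null_;
    n      = 17;    /* observations in the reduced sample (from the thread)   */
    p_cent = 2;     /* centered model: intercept + iv                         */
    p_full = 8;     /* ASSUMPTION: intercept + iv + 4 (f1) + 0 (f2) + 2 (f3)  */
    t_cent = 0.47;  /* t-value reported for the centered model                */

    /* Variance ratio full/centered = (n - p_cent)/(n - p_full),              */
    /* so the t-value shrinks by the square root of the inverse ratio.        */
    t_adj = t_cent * sqrt((n - p_full) / (n - p_cent));

    /* ASSUMPTION: no strata or clusters, so the test uses n - 1 df */
    p_adj = 2 * (1 - probt(abs(t_adj), n - 1));

    put t_adj= 6.3 p_adj= 6.4;
run;
```

With these inputs the rescaled t-value comes out at about 0.36, close to the 0.37 reported for the model with explicit fixed effects.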

--
Paige Miller
