I have a data set on which I am performing a regression analysis. I want to output the error sum of squares (SSE) to another data set so that I can use it to compute the MLE and unbiased estimates of the standard deviation.
Here is what I am attempting:
Proc Reg Data=Data1 outest = est tableout;
Model y = x;
Run;
Data est;
MLE = SSE / (_N_);
Proc Print Data=est;
Run;
I do not think I am calling the SSE from est correctly. Is there a way I can assign SSE from proc reg to have a specific variable name that I can call from est in a data step?
I am very new to SAS, so I am sorry if this is very basic.
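For reference, the two estimates the question is after can be written as follows. This is the standard textbook form under the usual assumption of independent normal errors. The symbols are not from the original post: $n$ is the number of observations used in the fit and $p$ is the number of estimated parameters (2 for an intercept plus one slope).

```latex
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\hat{\sigma}^2_{\mathrm{MLE}} = \frac{\mathrm{SSE}}{n},
\qquad
s^2 = \frac{\mathrm{SSE}}{n - p}
```

Here $s^2$ is unbiased for the error variance $\sigma^2$. Taking square roots gives the corresponding standard deviation estimates, although $s$ itself is not exactly unbiased for $\sigma$.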
Accepted Solutions
You can get the SSE directly - you don't have to calculate it again. See code below for an example.
Also, in SAS it is good practice to prepare a data file, then use this file in a procedure. It is not a good idea to do calculations "on the fly", as this increases the risk of something going wrong, and makes it harder to debug.
Norman.
/* x01.txt from http://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
/*
/* Reference:
/*
/* Helmut Spaeth,
/* Mathematical Algorithms for Linear Regression,
/* Academic Press, 1991, page 304,
/* ISBN 0-12-656460-4.
/*
/* S Weisberg,
/* Applied Linear Regression,
/* Wiley, 1980, pages 128-129.
/*
/* Discussion:
/*
/* The data records the average weight of the brain and body for
/* a number of mammal species.
/*
/* There are 62 rows of data.
*/;
data brain;
input observ brainwt bodywt;
label observ="Observation No"
brainwt="Brain weight"
bodywt="Body weight";
datalines;
1 3.385 44.500
2 0.480 15.500
3 1.350 8.100
4 465.000 423.000
5 36.330 119.500
6 27.660 115.000
7 14.830 98.200
8 1.040 5.500
9 4.190 58.000
10 0.425 6.400
11 0.101 4.000
12 0.920 5.700
13 1.000 6.600
14 0.005 0.140
15 0.060 1.000
16 3.500 10.800
17 2.000 12.300
18 1.700 6.300
19 2547.000 4603.000
20 0.023 0.300
21 187.100 419.000
22 521.000 655.000
23 0.785 3.500
24 10.000 115.000
25 3.300 25.600
26 0.200 5.000
27 1.410 17.500
28 529.000 680.000
29 207.000 406.000
30 85.000 325.000
31 0.750 12.300
32 62.000 1320.000
33 6654.000 5712.000
34 3.500 3.900
35 6.800 179.000
36 35.000 56.000
37 4.050 17.000
38 0.120 1.000
39 0.023 0.400
40 0.010 0.250
41 1.400 12.500
42 250.000 490.000
43 2.500 12.100
44 55.500 175.000
45 100.000 157.000
46 52.160 440.000
47 10.550 179.500
48 0.550 2.400
49 60.000 81.000
50 3.600 21.000
51 4.288 39.200
52 0.280 1.900
53 0.075 1.200
54 0.122 3.000
55 0.048 0.330
56 192.000 180.000
57 3.000 25.000
58 160.000 169.000
59 0.900 2.600
60 1.620 11.400
61 0.104 2.500
62 4.235 50.400
;
ods graphics on;
proc reg data=brain plot=diagnostics(stats=all) outest=brainest;
id brainwt bodywt;
model brainwt = bodywt /SSE ;
output out=brainout;
run;
data brainest;
set brainest;
n = 62;
mle = _SSE_ / n;
run;
proc print data = brainest;
run;
proc print data = brainout;
run;
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation
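If you want both the MLE and the degrees-of-freedom-adjusted estimate without typing in the sample size, one possible sketch is below. It assumes the BRAIN data set from the code above exists, and it relies on the EDF option of the MODEL statement, which should add _P_ (number of parameters, including the intercept) and _EDF_ (error degrees of freedom) to the OUTEST= data set. The names est2, var_mle, var_unbiased, sd_mle and sd_rmse are made up for illustration.

```sas
/* Sketch only: assumes data set BRAIN from the example above. */
proc reg data=brain outest=est2 noprint;
  /* SSE adds _SSE_; EDF adds _P_ and _EDF_ to the OUTEST= data set */
  model brainwt = bodywt / sse edf;
run;
quit;

data est2;
  set est2;
  n            = _EDF_ + _P_;        /* observations used = error df + parameters */
  var_mle      = _SSE_ / n;          /* MLE of the error variance */
  var_unbiased = _SSE_ / _EDF_;      /* unbiased estimate of the error variance */
  sd_mle       = sqrt(var_mle);
  sd_rmse      = sqrt(var_unbiased); /* should agree with _RMSE_ from PROC REG */
run;

proc print data=est2;
  var _SSE_ _EDF_ _P_ n var_mle var_unbiased sd_mle sd_rmse _RMSE_;
run;
```

Comparing sd_rmse with _RMSE_ in the printed output is a quick sanity check that the right variables were picked up.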
You need to add the SSE option to the MODEL statement (model y = x / SSE). In the OUTEST= data set the variable will be named _SSE_.
Norman.
Is this your actual code or just a snippet for illustration? In the DATA step you are destroying data set EST, because there is no SET statement to read it. This would be better:
Proc Reg Data=Data1 outest = est tableout;
Model y = x;
Run;
Data est;
set est; *<=== read the output estimates from proc reg;
MLE = SSE / (_N_);
Also it looks like you may be confused between the number of observations and the row number (_N_).
But, again, we may not be looking at the actual code but just a skeleton.
Kind regards,
- Jan.
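To make the difference between _N_ and the number of observations concrete, here is a minimal sketch. The data set demo and the variable total are hypothetical names used only for this illustration.

```sas
/* Tiny hypothetical data set with three observations */
data demo;
  input x @@;
  datalines;
10 20 30
;
run;

/* _N_ counts DATA step iterations; NOBS= gives the total number of rows */
data _null_;
  set demo nobs=total;
  put _n_= total= x=;
run;
```

In the log, _N_ should step through 1, 2, 3 while total stays at 3 on every line, which is why dividing by _N_ does not divide by the sample size.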
Silly question, but is there a way to perform arithmetic in other procedures, such as PROC PRINT, or is user-written arithmetic limited to the DATA step? This is my main source of confusion: I come from a C/C++ background, so I'm not used to being unable to use my data in whichever function I want.
Thanks
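On the narrow question of arithmetic outside a DATA step: PROC PRINT only displays existing variables, but PROC SQL does allow computed columns. A minimal sketch, assuming the BRAINEST data set from the accepted solution exists and still contains _SSE_ (the column alias mle_sql is made up, and 62 is the row count stated in that example):

```sas
proc sql;
  /* Compute a new column on the fly and print it, without a DATA step */
  select _SSE_,
         _SSE_ / 62 as mle_sql
    from brainest;
quit;
```

This is just one option and not necessarily the recommended style; preparing the values in a DATA step first is usually easier to debug.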
I would suggest a small change to the DATA step. The number of observations is now hardcoded. Using NOBS= would make it more flexible:
data brainest;
set brainest nobs=n;
mle = _SSE_ / n;
run;
See Example 10: Performing a Function until the Last Observation Is Reached
Regards,
- Jan.
Very good!
Norman.
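@jklaverstijn there is a small problem with that: BRAINEST does not have N observations. NOBS= returns the number of rows in BRAINEST, not the number of observations used in the regression. Perhaps a safer path is to use the ODS table NObs created by PROC REG.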
proc reg data=brain /*plot=diagnostics(stats=all)*/ outest=brainest;
id brainwt bodywt;
model brainwt = bodywt /SSE ;
output out=brainout;
ods output NObs=NObs;
run;
quit;
data brainest2;
set brainest;
if _n_ eq 1 then set nobs(keep=label n where=(label='Number of Observations Used'));
drop label;
*n = 62;
mle = _SSE_ / n;
run;
Ah yes of course you're right. *Facepalm*. Silly mistake. Nice catch.
- Jan.