Zak711
Calcite | Level 5

I have a data set on which I am performing a regression analysis. I want to output the error sum of squares (SSE) to another data set so I can use it to compute the MLE and unbiased estimates of the standard deviation.

 

Here is what I am attempting: 

 

Proc Reg Data=Data1 outest = est tableout;
Model y = x;
Run;

 

Data est; 

MLE = SSE / (_N_);

 

Proc Print Data=est;

Run;

 

I do not think I am calling the SSE from est correctly. Is there a way I can assign SSE from proc reg to have a specific variable name that I can call from est in a data step? 

 

I am very new to SAS, so I am sorry if this is very basic.

1 ACCEPTED SOLUTION

Accepted Solutions
Norman21
Lapis Lazuli | Level 10

You can get the SSE directly - you don't have to calculate it again. See code below for an example.

 

Also, in SAS it is good practice to prepare a data file, then use this file in a procedure. It is not a good idea to do calculations "on the fly", as this increases the risk of something going wrong, and makes it harder to debug.

 

Norman.

 

/* x01.txt from http://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
/*
/*  Reference:
/*
/*    Helmut Spaeth,
/*    Mathematical Algorithms for Linear Regression,
/*    Academic Press, 1991, page 304,
/*    ISBN 0-12-656460-4.
/*
/*    S Weisberg,
/*    Applied Linear Regression,
/*    Wiley, 1980, pages 128-129.
/*
/*  Discussion:
/*
/*    The data records the average weight of the brain and body for
/*    a number of mammal species.  
/*
/*    There are 62 rows of data.
*/;
data brain;
      input observ brainwt bodywt;
      label observ="Observation No"
         brainwt="Brain weight"
         bodywt="Body weight";
      datalines;
 1     3.385    44.500
 2     0.480    15.500
 3     1.350     8.100
 4   465.000   423.000
 5    36.330   119.500
 6    27.660   115.000
 7    14.830    98.200
 8     1.040     5.500
 9     4.190    58.000
10     0.425     6.400
11     0.101     4.000
12     0.920     5.700
13     1.000     6.600
14     0.005     0.140
15     0.060     1.000
16     3.500    10.800
17     2.000    12.300
18     1.700     6.300
19  2547.000  4603.000
20     0.023     0.300
21   187.100   419.000
22   521.000   655.000
23     0.785     3.500
24    10.000   115.000
25     3.300    25.600
26     0.200     5.000
27     1.410    17.500
28   529.000   680.000
29   207.000   406.000
30    85.000   325.000
31     0.750    12.300
32    62.000  1320.000
33  6654.000  5712.000
34     3.500     3.900
35     6.800   179.000
36    35.000    56.000
37     4.050    17.000
38     0.120     1.000
39     0.023     0.400
40     0.010     0.250
41     1.400    12.500
42   250.000   490.000
43     2.500    12.100
44    55.500   175.000
45   100.000   157.000
46    52.160   440.000
47    10.550   179.500
48     0.550     2.400
49    60.000    81.000
50     3.600    21.000
51     4.288    39.200
52     0.280     1.900
53     0.075     1.200
54     0.122     3.000
55     0.048     0.330
56   192.000   180.000
57     3.000    25.000
58   160.000   169.000
59     0.900     2.600
60     1.620    11.400
61     0.104     2.500
62     4.235    50.400
   ;
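ods graphics on;

proc reg data=brain plots=diagnostics(stats=all) outest=brainest;
   id brainwt bodywt;
   model brainwt = bodywt / sse;   /* SSE option adds _SSE_ to the OUTEST= data set */
   output out=brainout;
run;
quit;   /* PROC REG is interactive; QUIT ends it */

data brainest;
   set brainest;
   n = 62;            /* number of observations, hardcoded for this data set */
   mle = _SSE_ / n;   /* ML estimate of the error variance; sqrt(mle) gives the standard deviation */
run;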

proc print data = brainest;
run;

proc print data = brainout;
run;
Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation


8 REPLIES 8
Norman21
Lapis Lazuli | Level 10

You need to add the SSE option to the MODEL statement (model y = x / SSE). The variable in the OUTEST= data set will be named _SSE_.

  

Norman.


jklaverstijn
Rhodochrosite | Level 12

Is this your actual code or just a snippet for illustration? In the DATA step you are destroying data set EST: without a SET statement, the step overwrites EST with a new data set instead of reading it. This would be better:

 

Proc Reg Data=Data1 outest = est tableout;
Model y = x;
Run;

 

Data est; 
set est;  *<=== read the output estimates from proc reg;
MLE = SSE / (_N_);

 

Also it looks like you may be confused between the number of observations and the row number (_N_).

But, again, we may not be looking at the actual code but just a skeleton.
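
To illustrate the difference, here is a minimal sketch. The data set TOY and the variables ROW, TOTAL and N_ROWS are made up for this example and are not part of the original code. _N_ counts DATA step iterations, so it changes from row to row, while the NOBS= option on the SET statement holds the number of rows in the data set being read.

```sas
/* Hypothetical toy data set, for illustration only */
data toy;
   input x @@;
   datalines;
10 20 30
;
run;

data show_n;
   set toy nobs=total;   /* total = number of rows in TOY (3 here) */
   row    = _n_;         /* _N_ = current DATA step iteration: 1, 2, 3 */
   n_rows = total;       /* copy it, because the NOBS= variable itself is not written out */
run;

proc print data=show_n;
run;
```

Note that NOBS= refers to the data set named on that SET statement, so it only equals the sample size when you are reading the original data.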

 

Kind regards,

- Jan.

Zak711
Calcite | Level 5
This was just the relevant snippet. I assumed _N_ was the number of observations, but I guess I was wrong. As a workaround, I exported the residuals from PROC REG and computed the sum of squares in PROC MEANS, but then I have to export that value AGAIN to another data set in order to do what I need, so I am looking for a way to do this in 1 or 2 steps.

Silly question, but is there a way that I can perform arithmetic in other procedures, like the PRINT procedure, or is user-written arithmetic limited to the DATA step? This is the main source of confusion I am having, as I come from a C/C++ background, so I'm not used to being unable to use my data in whichever function I want.

Thanks
jklaverstijn
Rhodochrosite | Level 12

I would suggest a small change to the DATA step. The number of observations is now hardcoded. Using NOBS= would make it more flexible:

 

data brainest;
	set brainest nobs=n;
	mle = _SSE_ / n;
run;

See Example 10: Performing a Function until the Last Observation Is Reached

 

Regards,

- Jan.

Norman21
Lapis Lazuli | Level 10

Very good!

 

Norman.


data_null__
Jade | Level 19

@jklaverstijn there is a small problem with that: BRAINEST does not have N observations. It has one row per model, so NOBS= returns the row count of BRAINEST, not of BRAIN. Perhaps a safer path is to use the ODS table NObs created by PROC REG.

 

proc reg data=brain /*plot=diagnostics(stats=all)*/ outest=brainest;
   id brainwt bodywt;
   model brainwt = bodywt / SSE;
   output out=brainout;
   ods output NObs=NObs;   /* save the number-of-observations table as data set NOBS */
run;
quit;

data brainest2;
   set brainest;
   /* read N once, from the row for the observations actually used in the fit */
   if _n_ eq 1 then set nobs(keep=label n where=(label='Number of Observations Used'));
   drop label;
   mle = _SSE_ / n;
run;
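
Since the original question asked for standard deviations, the last step could be extended along these lines. This is only a sketch: it assumes BRAINEST2 from the step above (containing _SSE_ and n), and the names SD_ESTIMATES, P, VAR_MLE, VAR_UNBIASED, SIGMA_MLE and SIGMA_S are made up here. With p = 2 parameters (intercept and slope), the ML estimate of the error variance is SSE/n and the unbiased estimate is SSE/(n - p); the square roots give the corresponding standard deviation estimates.

```sas
data sd_estimates;
   set brainest2;                      /* assumed to contain _SSE_ and n */
   p = 2;                              /* assumption: intercept + one slope */
   var_mle      = _SSE_ / n;           /* ML estimate of the error variance */
   var_unbiased = _SSE_ / (n - p);     /* unbiased estimate of the error variance */
   sigma_mle    = sqrt(var_mle);       /* ML estimate of the standard deviation */
   sigma_s      = sqrt(var_unbiased);  /* should agree with _RMSE_ in the OUTEST= data set */
run;

proc print data=sd_estimates;
   var _SSE_ n p var_mle var_unbiased sigma_mle sigma_s;
run;
```

Keep in mind that the square root of an unbiased variance estimate is not itself exactly unbiased for the standard deviation, although it is the estimate usually reported.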
jklaverstijn
Rhodochrosite | Level 12

Ah yes of course you're right. *Facepalm*. Silly mistake. Nice catch.

 

- Jan.
