I came across a regression model in SAS that does not use an intercept (constant term), and I would like to see whether the model violates the OLS assumption that the residuals sum to zero. How can I check whether that is the case in SAS?
Accepted Solutions
This is not an output of any PROC (that I know of). You could save the residuals to an output data set and sum them yourself.
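A minimal sketch of that approach, assuming a hypothetical data set `mydata` with a response `y` and predictors `x1` and `x2` (none of these names come from the original question). The output data set name `resid_out` and the residual variable name `resid` are also placeholders.

```sas
/* Hypothetical data set WORK.MYDATA with response y and predictors x1, x2 */
proc reg data=mydata;
   model y = x1 x2 / noint;        /* NOINT fits the model without an intercept */
   output out=resid_out r=resid;   /* R= saves the residuals in the variable resid */
run;
quit;

/* Count, sum, and average the saved residuals */
proc means data=resid_out n sum mean;
   var resid;
run;
```

Expect floating-point noise rather than an exact zero even when the residuals do sum to zero mathematically, so judge the sum relative to the scale of the response.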
When you don't use an intercept, the residuals usually will NOT sum to zero.
I would like to see whether the model violates the OLS assumption of zero sum of residuals
Whether or not the residuals sum to zero in this case, this DOES NOT violate the assumptions of OLS. This is not an assumption of OLS.
A better question to ask when you see a regression with no intercept is: what is the logical justification for not using an intercept? Omitting the intercept without a logical justification would violate appropriate model-building rules, and that is a more serious problem (in my mind) than the residuals not summing to zero (which in my mind is meaningless and not worth bothering over).
Paige Miller
@adrfinance wrote:
the variables are standardized to have zero mean and standard deviation 1. So I assume it is ok not to have an intercept. Right or not?
The justification for not having an intercept in this case is fine. Of course, if you back-transform the data to the original scale, there is an intercept.
If you hadn't standardized the data, then not having an intercept requires justification.
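To make the back-transformation concrete, here is a short derivation. The notation is an assumption introduced for illustration: $b_j$ are the coefficients of the standardized, no-intercept fit, $\bar{y}$ and $\bar{x}_j$ are sample means, $s_y$ and $s_j$ are sample standard deviations, and $e^{*}$ is the residual on the standardized scale.

```latex
\begin{align*}
\frac{y - \bar{y}}{s_y} &= \sum_{j=1}^{p} b_j \, \frac{x_j - \bar{x}_j}{s_j} + e^{*} \\
y &= \underbrace{\bar{y} - \sum_{j=1}^{p} \beta_j \bar{x}_j}_{\text{intercept } \beta_0}
     + \sum_{j=1}^{p} \beta_j x_j + s_y e^{*},
\qquad \beta_j = b_j \, \frac{s_y}{s_j}
\end{align*}
```

So the model on the original scale has intercept $\beta_0 = \bar{y} - \sum_j \beta_j \bar{x}_j$, which is zero only in special cases.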
Paige Miller
@adrfinance wrote:
Yes I came across some models where they standardize all the variables and then they run regressions without intercepts. I assume they do these standardizations (0 mean 1 std) in order to be able to run regressions without intercepts in the first place....
My assumption would be that they do the standardizing so that regression coefficients can be compared to one another. I doubt that wanting to run a regression without an intercept is a valid reason for standardizing.
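A rough sketch of how such a standardized fit might be set up, assuming a hypothetical data set `mydata` with variables `y`, `x1`, and `x2` and no missing values (all names here are placeholders, not taken from the thread). It fits the standardized data both with and without an intercept so the two can be compared.

```sas
/* Standardize every model variable to mean 0 and standard deviation 1 */
proc standard data=mydata mean=0 std=1 out=mydata_std;
   var y x1 x2;
run;

/* Fit both versions on the standardized data */
proc reg data=mydata_std;
   with_int: model y = x1 x2;           /* intercept estimate should be essentially 0 */
   no_int:   model y = x1 x2 / noint;   /* same slopes, no intercept term */
run;
quit;
```

When all variables have mean zero, the two models give the same slope estimates and the residuals sum to zero (up to rounding) in both. The standard errors can differ slightly because the error degrees of freedom differ by one.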
Paige Miller
@adrfinance wrote:
Yes, you might be right, but I see many models from them without intercepts, so I just assumed that. Maybe I am wrong though. I always thought that running a model without an intercept is wrong; I didn't know that zero-mean variables can be used in a regression without a constant term.
There are fields of study that have their own practices for whatever historical reason. So that might be what you are seeing. Sometimes these things can become so embedded in the "culture" that the reasons are lost in the mists of history.
I have done OLS regression by hand, meaning pencil and paper. (Admittedly not a lot of records but still...) I can see anything that reduces the steps being considered a "good thing". The process may have survived into the time when computers and software are more flexible just because that was the way it was done years ago.
The "LS" in OLS is least squares. That is, the OLS algorithm minimizes the sum of squared residuals, which need not even minimize the sum of unsquared residuals, much less set that sum to zero.
Oops, as @PaigeMiller and @adrfinance pointed out, I need to review my recollection of OLS. Thanks to both. I've edited this post just in case someone doesn't notice the responses.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
Unless you have removed the intercept from the model, the math forces the sum of un-squared residuals to be zero.
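The reason can be sketched from the first-order conditions of least squares. The notation is introduced here for illustration: $e_i$ is the residual for observation $i$, and the model has an intercept $\beta_0$ and $p$ predictors.

```latex
\begin{align*}
S(\beta) &= \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2},
\qquad e_i = y_i - \hat{\beta}_0 - \sum_{j=1}^{p} \hat{\beta}_j x_{ij} \\
\frac{\partial S}{\partial \beta_0} = 0 &\;\Longrightarrow\; \sum_{i=1}^{n} e_i = 0 \\
\frac{\partial S}{\partial \beta_j} = 0 &\;\Longrightarrow\; \sum_{i=1}^{n} x_{ij}\, e_i = 0,
\qquad j = 1, \dots, p
\end{align*}
```

The condition $\sum_i e_i = 0$ comes only from the derivative with respect to $\beta_0$. If the intercept is dropped, that equation is no longer part of the system, so nothing forces the residuals to sum to zero.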
Paige Miller
I think you're right.