I have data in the form:
Sales Is_Discounted
20 1
10 0
25 1
9 0
18 0
I want to fit a linear regression that predicts sales, such that only errors whose absolute value is greater than 5 are used to build the model. In other words, any error in the range [-5, 5] is treated as zero error. Could you please provide code or a reference for doing this? Thanks!
Do you really mean observations whose residuals are GREATER THAN 5? This is backwards from the usual robust regression technique in which large residuals are downweighted and small residuals are kept. I am not aware of any papers that recommend excluding observations that have small residuals.
IF YOU MEANT LESS THAN 5:
This analysis is a type of robust regression. In SAS, the ROBUSTREG procedure supports four different robust regression algorithms.
The process of downweighting large residuals is known as iteratively reweighted least squares, and it is supported by the M estimation method in PROC ROBUSTREG. The main idea is to fit a weighted least squares estimate to the data, initially with each observation receiving equal weight. After the initial fit, observations that have large residuals are downweighted by applying a weight function. PROC ROBUSTREG supports 10 weight functions. I suggest you start with the default, but if you really want a "hard" cutoff (all-or-nothing weights), then you can choose the WEIGHTFUNCTION=TALWORTH(C=5) option, which applies zero weight to residuals whose magnitudes are greater than 5 units.
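A minimal sketch of such a call follows, assuming the data from the question is stored in a data set named sales (a name chosen here only for illustration). The cutoff is written as C= inside the weight function. Note that M estimation generally applies the cutoff to residuals divided by a scale estimate rather than to raw residuals, so it is worth checking the PROC ROBUSTREG documentation before assuming that 5 means 5 sales units.

```sas
/* Illustrative data set; the name "sales" is an assumption */
data sales;
   input Sales Is_Discounted;
   datalines;
20 1
10 0
25 1
9 0
18 0
;
run;

/* M estimation with the Talworth ("hard" cutoff) weight function */
proc robustreg data=sales method=m(wf=talworth(c=5));
   model Sales = Is_Discounted;
run;
```

Omitting the WF= suboption gives the default weight function, which is the suggested starting point.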
IF YOU MEANT GREATER THAN 5:
You can use PROC NLIN in SAS to construct any iteratively reweighted least squares algorithm. There is an example in the doc that you can follow. You would modify the statement that sets the weight. It might look something like this:
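/* resid = observed - predicted, computed earlier in the PROC NLIN step */
if abs(resid)<=5 then _weight_=0;   /* ignore small residuals */
else _weight_=abs(resid);           /* keep the weight nonnegative */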
I have to be frank: it's not clear to me that this model will converge. I think I can construct data for which this process will alternate between two solutions and never converge.
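For reference, here is a rough sketch of how that weight statement might sit inside a complete PROC NLIN step, modeled on the iteratively reweighted least squares example in the documentation. The data set name sales, the parameter names b0 and b1, and the starting values are placeholders, not recommendations. The sketch uses abs(resid) so that the weight is never negative. Because many weights are set to zero, the fit can easily become singular or fail to converge, as cautioned above.

```sas
/* Sketch only: data set name, parameter names, and starting values are assumptions */
proc nlin data=sales nohalve;
   parms b0=10 b1=5;                      /* placeholder starting values */
   model Sales = b0 + b1*Is_Discounted;   /* linear model written in NLIN form */
   resid = Sales - model.Sales;           /* residual at the current estimates */
   if abs(resid) <= 5 then _weight_ = 0;  /* drop observations with small residuals */
   else _weight_ = abs(resid);            /* nonnegative weight for the rest */
run;
```

The NOHALVE option is used, as in the documentation example, because the weights change at every iteration and the objective is therefore not guaranteed to decrease from one step to the next.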
In your data set, change numbers between -5 and +5 to zero.
Thanks! If I set sales between -5 and +5 to zero in my training data would it be the same as finding the best fitting line such that minimal errors are not counted? Please explain.
@roushankumar wrote:
Thanks! If I set sales between -5 and +5 to zero in my training data would it be the same as finding the best fitting line such that minimal errors are not counted? Please explain.
I didn't read your original problem statement as carefully as I should have.
I agree with @Reeza, you'd have to create your own regression algorithm to do this.
They want residuals to be between -5 and 5. I'm not sure how that affects the regression, except that it changes what is being minimized, which means fiddling with the algorithms under the hood, so to speak. I've never heard of this type of regression, but it's technically possible. Statistically, I'm not sure it's valid.
@PaigeMiller wrote:
In your data set, change numbers between -5 and +5 to zero.
I have done this several times in open source platforms but I am new to SAS and haven't been able to find a way here. Such customized models have helped me model cases where 'you want to stabilize your model further' or 'you would rather have your predictions greater than actuals rather than lower'.
If you can figure out in your head the proper algorithm, then you can use PROC IML to perform the calculations.
Well, you can call R from SAS/IML, which may be a good alternative. For the minimization itself, you could try PROC OPTMODEL, which is an optimization procedure, i.e., you would essentially fit a nonlinear model. I don't know enough about either to say whether that would work, but it's a suggestion on where to start looking.
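As a starting point for the PROC OPTMODEL route, here is a hedged sketch of one possible reading of the original request: errors inside [-5, 5] cost nothing, and only the amount by which an error exceeds 5 is penalized. With an absolute-value penalty this becomes a linear program. The data set name sales, the names b0, b1, and excess, and the choice of an absolute rather than squared penalty are all assumptions made for illustration. PROC OPTMODEL also requires the appropriate optimization license (for example, SAS/OR).

```sas
/* Sketch only: fits Sales = b0 + b1*Is_Discounted, ignoring errors within +/-5 */
proc optmodel;
   set OBS;
   num y {OBS};
   num x {OBS};
   /* one index per row of the (assumed) data set "sales" */
   read data sales into OBS=[_N_] y=Sales x=Is_Discounted;

   var b0;
   var b1;
   var excess {OBS} >= 0;   /* amount by which |error| exceeds 5 */

   con UpperBand {i in OBS}: y[i] - (b0 + b1*x[i]) <= 5 + excess[i];
   con LowerBand {i in OBS}: (b0 + b1*x[i]) - y[i] <= 5 + excess[i];

   /* errors inside the band contribute nothing to the objective */
   min TotalExcess = sum {i in OBS} excess[i];

   solve;
   print b0 b1;
quit;
```

When every observation can be fitted to within 5 units, the objective is zero and many different lines are equally good, so the reported coefficients need not be unique.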
Perhaps @Rick_SAS knows of a better alternative.
Hi Rick,
Could you please help me with the issue here
Thanks