Solved: Linear regression that takes into account only error greater than a specific threshold

roushankumar · Posted 09-10-2018 03:24 PM

I have data in the form:

Sales  Is_Discounted
20     1
10     0
25     1
9      0
18     0

I want to create a linear regression such that only errors whose absolute value is greater than 5 are used to build the model that predicts sales. So any error in the range [-5, 5] is treated as zero error. Could you please provide the code or a reference to do this? Thanks!

Rick_SAS · Posted 09-11-2018 08:20 AM

Do you really mean observations whose residuals are GREATER THAN 5? This is backwards from the usual robust regression technique in which large residuals are downweighted and small residuals are kept. I am not aware of any papers that recommend excluding observations that have small residuals.

IF YOU MEANT LESS THAN 5:

This analysis is a type of robust regression. In SAS, the ROBUSTREG procedure supports four different robust regression algorithms.

The process of downweighting large residuals is known as iteratively reweighted least squares, and it is supported by the M estimation method in PROC ROBUSTREG. The main idea is to fit a weighted least squares estimate to the data, initially with each observation receiving equal weight. After the initial fit, observations that have large residuals are downweighted by applying a weight function. PROC ROBUSTREG supports 10 weight functions. I suggest you start with the default, but if you really want a "hard" cutoff (all-or-nothing weights) then you can choose the WEIGHTFUNCTION=TALWORTH(5) option, which applies zero weights to residuals whose magnitudes are greater than 5 units.
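
As a rough sketch of the hard-cutoff option, the call might look like the following. It assumes the five rows from the question are loaded into a data set named sales (a name chosen here only for illustration). To my understanding, the weight function is specified inside the METHOD=M(...) option and the cutoff is passed as C=, and the cutoff is applied to residuals scaled by the robust scale estimate rather than to raw residuals, so please check the PROC ROBUSTREG documentation before assuming that C=5 means 5 sales units.

```sas
/* Hypothetical data set name; the rows are the ones posted in the question */
data sales;
   input Sales Is_Discounted;
   datalines;
20 1
10 0
25 1
9 0
18 0
;
run;

/* M estimation with the Talworth (all-or-nothing) weight function.       */
/* C=5 is the cutoff from this thread; it is assumed to act on SCALED     */
/* residuals, so it may not correspond exactly to 5 units of Sales.       */
proc robustreg data=sales method=m(wf=talworth(c=5));
   model Sales = Is_Discounted;
   output out=fit p=Predicted r=Residual;   /* fitted values and residuals */
run;
```

With only five observations this is purely illustrative; the output data set fit (also a made-up name) lets you inspect which residuals ended up large.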

IF YOU MEANT GREATER THAN 5:

You can use PROC NLIN in SAS to construct any iteratively reweighted least squares algorithm. There is an example in the doc that you can follow. You would modify the statement that sets the weight. It might look something like this:

   if abs(resid)<=5 then _weight_=0;
   else _weight_=resid;

I have to be frank: it's not clear to me that this model will converge. I think I can construct data for which this process will alternate between two solutions and never converge.
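
For the greater-than-5 case, a minimal sketch modeled on the iteratively reweighted least squares pattern in the PROC NLIN documentation might look like this. The data set name sales, the parameter names b0 and b1, and the starting values are all assumptions made for illustration. The sketch also uses an all-or-nothing weight of 1 outside the band, which is a simplification chosen for this example and not the only possible weight.

```sas
/* Hypothetical data set name; the rows are the ones posted in the question */
data sales;
   input Sales Is_Discounted;
   datalines;
20 1
10 0
25 1
9 0
18 0
;
run;

proc nlin data=sales nohalve;
   parms b0=0 b1=0;                        /* assumed starting values       */
   model Sales = b0 + b1*Is_Discounted;    /* linear model written for NLIN */
   resid = Sales - model.Sales;            /* residual at current estimates */
   /* Ignore observations whose residual lies within [-5, 5]; */
   /* give every other observation equal weight.              */
   if abs(resid) <= 5 then _weight_ = 0;
   else _weight_ = 1;
run;
```

Because the weights are recomputed at every iteration and can switch observations on and off, the fit may fail to converge or may end up with too few weighted observations to estimate both parameters, which is exactly the risk described above. Treat the result as exploratory.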

PaigeMiller · Posted 09-10-2018 03:37 PM

In your data set, change numbers between -5 and +5 to zero.

--
Paige Miller

roushankumar · Posted 09-10-2018 03:57 PM

Thanks! If I set sales between -5 and +5 to zero in my training data would it be the same as finding the best fitting line such that minimal errors are not counted? Please explain.

PaigeMiller · Posted 09-10-2018 04:06 PM

@roushankumar wrote:

Thanks! If I set sales between -5 and +5 to zero in my training data would it be the same as finding the best fitting line such that minimal errors are not counted? Please explain.

I didn't read your original problem statement as carefully as I should have.

I agree with @Reeza, you'd have to create your own regression algorithm to do this.

--
Paige Miller

Reeza · Posted 09-10-2018 04:02 PM

They want residuals to be between -5 and 5. Not sure how that affects the regression, except in terms of minimization which means fiddling with the algorithms under the hood, so to speak. I've never heard of this type of regression but it's possible technically. Statistically not sure if it's valid.

@PaigeMiller wrote:

In your data set, change numbers between -5 and +5 to zero.

roushankumar · Posted 09-10-2018 04:07 PM

I have done this several times in open source platforms but I am new to SAS and haven't been able to find a way here. Such customized models have helped me model cases where 'you want to stabilize your model further' or 'you would rather have your predictions greater than actuals rather than lower'.

PaigeMiller · Posted 09-10-2018 04:15 PM

If you can figure out in your head the proper algorithm, then you can use PROC IML to perform the calculations.

--
Paige Miller

Reeza · Posted 09-10-2018 04:18 PM

Well, you can call R from SAS/IML, which may be a good alternative. For the minimization, you can try PROC OPTMODEL with an optimization algorithm, i.e., essentially fit a nonlinear model. I don't know enough about those to say whether that would work, but it's a suggestion on where to start looking.

Perhaps @Rick_SAS knows of a better alternative.

roushankumar · Posted 04-11-2019 01:47 PM

Hi Rick,

Could you please help me with the issue here:

https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/proc-hpfengine-fails-to-make-predict...

Thanks