roushankumar
Fluorite | Level 6

I have data in the following form:

Sales    Is_Discounted
   20    1
   10    0
   25    1
    9    0
   18    0


I want to fit a linear regression that predicts Sales, such that only errors whose absolute value is greater than 5 are used to build the model. Any error in the range [-5, 5] is treated as zero error. Could you please provide code or a reference for doing this? Thanks!

ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

Do you really mean observations whose residuals are GREATER THAN 5? This is backwards from the usual robust regression technique in which large residuals are downweighted and small residuals are kept. I am not aware of any papers that recommend excluding observations that have small residuals.

 

IF YOU MEANT LESS THAN 5:

This analysis is a type of robust regression. In SAS, the ROBUSTREG procedure supports four different robust regression algorithms.

The process of downweighting large residuals is known as iteratively reweighted least squares, and it is supported by the M estimation method in PROC ROBUSTREG. The main idea is to fit a weighted least squares estimate to the data, initially with each observation receiving equal weight. After the initial fit, observations that have large residuals are downweighted by applying a weight function. PROC ROBUSTREG supports 10 weight functions. I suggest you start with the default, but if you really want a "hard" cutoff (all-or-nothing weights), then you can choose the WEIGHTFUNCTION=TALWORTH(5) option, which applies zero weight to residuals whose magnitudes are greater than 5 units.
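To make the reweighting loop concrete, here is a minimal sketch in plain Python rather than SAS. It is not a reimplementation of PROC ROBUSTREG, which may scale the residuals before applying the weight function; this version assumes the cutoff is applied to raw residuals. The function names `weighted_fit` and `talwar_irls` are made up for illustration.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    if sw == 0:
        raise ValueError("all weights are zero")
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx if sxx > 0 else 0.0  # no spread in x: fall back to a flat line
    return ybar - b1 * xbar, b1


def talwar_irls(x, y, cutoff=5.0, max_iter=50, tol=1e-9):
    """IRLS with all-or-nothing weights: weight 0 when |residual| > cutoff."""
    w = [1.0] * len(x)                 # start with equal weights
    b0, b1 = weighted_fit(x, y, w)
    for _ in range(max_iter):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        w = [1.0 if abs(r) <= cutoff else 0.0 for r in resid]
        new_b0, new_b1 = weighted_fit(x, y, w)
        if abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol:
            return new_b0, new_b1, w   # coefficients stopped changing
        b0, b1 = new_b0, new_b1
    return b0, b1, w                   # iteration cap reached


# The five rows from the question: Is_Discounted as x, Sales as y.
x = [1, 0, 1, 0, 0]
y = [20, 10, 25, 9, 18]
print(talwar_irls(x, y))
```

On these five rows, the observation with Sales = 18 ends up with zero weight, because its residual stays above the cutoff after the first fit.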

 

IF YOU MEANT GREATER THAN 5:

You can use PROC NLIN in SAS to construct any iteratively reweighted least squares algorithm. There is an example in the doc that you can follow. You would modify the statement that sets the weight. It might look something like this:

 

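if abs(resid)<=5 then _weight_=0;   /* ignore residuals inside [-5, 5] */

else _weight_=1;                    /* full weight outside the band; a weight of resid could be negative */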

 

I have to be frank: it's not clear to me that this model will converge. I think I can construct data for which this process will alternate between two solutions and never converge.
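The convergence concern can be explored with a small sketch of the reversed scheme, again in plain Python rather than PROC NLIN. It assumes all-or-nothing weights (0 inside the band, 1 outside) and stops as soon as a weight pattern repeats, since the fit depends only on the weights. The names `reversed_irls` and the `status` labels are hypothetical.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx if sxx > 0 else 0.0  # no spread in x: fall back to a flat line
    return ybar - b1 * xbar, b1


def reversed_irls(x, y, band=5.0, max_iter=50):
    """Reweighting that DROPS small residuals (|r| <= band) and keeps large ones."""
    w = [1.0] * len(x)
    seen = []                # weight patterns from earlier iterations
    b0 = b1 = None
    for _ in range(max_iter):
        if sum(w) == 0:      # every residual fell inside the band: nothing left to fit
            return {"status": "all_zero", "coef": (b0, b1), "weights": w}
        b0, b1 = weighted_fit(x, y, w)
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        new_w = [0.0 if abs(r) <= band else 1.0 for r in resid]
        if new_w == w:       # weights are stable, so the fit is stable too
            return {"status": "converged", "coef": (b0, b1), "weights": w}
        if new_w in seen:    # back at an earlier pattern: the process is alternating
            return {"status": "cycling", "coef": (b0, b1), "weights": new_w}
        seen.append(w)
        w = new_w
    return {"status": "max_iter", "coef": (b0, b1), "weights": w}


# Data that lie exactly on a line: the first fit is perfect, so every weight becomes 0.
print(reversed_irls([0, 1, 2], [1, 3, 5])["status"])
```

The perfect-fit case shows one way the scheme degenerates: a good initial fit leaves no observations to refit on. Residuals that land exactly on the band edge are sensitive to floating-point rounding, so results on small integer data sets should be read with care.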

9 REPLIES
PaigeMiller
Diamond | Level 26

In your data set, change numbers between -5 and +5 to zero.

--
Paige Miller
roushankumar
Fluorite | Level 6

Thanks! If I set sales between -5 and +5 to zero in my training data would it be the same as finding the best fitting line such that minimal errors are not counted? Please explain. 

PaigeMiller
Diamond | Level 26

@roushankumar wrote:

Thanks! If I set sales between -5 and +5 to zero in my training data would it be the same as finding the best fitting line such that minimal errors are not counted? Please explain. 


I didn't read your original problem statement as carefully as I should have.

 

I agree with @Reeza, you'd have to create your own regression algorithm to do this.

--
Paige Miller
Reeza
Super User

They want residuals to be between -5 and 5. Not sure how that affects the regression, except in terms of minimization which means fiddling with the algorithms under the hood, so to speak. I've never heard of this type of regression but it's possible technically. Statistically not sure if it's valid. 

 


@PaigeMiller wrote:

In your data set, change numbers between -5 and +5 to zero.


 

roushankumar
Fluorite | Level 6

I have done this several times in open source platforms but I am new to SAS and haven't been able to find a way here. Such customized models have helped me model cases where 'you want to stabilize your model further' or 'you would rather have your predictions greater than actuals rather than lower'.

PaigeMiller
Diamond | Level 26

If you can figure out in your head the proper algorithm, then you can use PROC IML to perform the calculations.

--
Paige Miller
Reeza
Super User

Well, you can call R from SAS/IML, which may be a good alternative. Or, for the minimization itself, you can try PROC OPTMODEL as the optimization algorithm, i.e., essentially fit a nonlinear model. I don't know enough about those to say whether that would work, but it's a suggestion on where to start looking.
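For the optimization route, the first thing to pin down is the objective. One possible reading of the request, which is an assumption and not something stated in the thread, is a squared "dead-zone" loss: a residual inside [-5, 5] costs nothing, and only the excess beyond the band is squared. The Python sketch below evaluates that loss and uses a crude integer grid search as a stand-in for a real optimizer; `dead_zone_loss` and `grid_search` are hypothetical names.

```python
def dead_zone_loss(b0, b1, x, y, band=5.0):
    """Sum of squared excess residuals; residuals within +/- band cost zero."""
    total = 0.0
    for xi, yi in zip(x, y):
        excess = max(abs(yi - (b0 + b1 * xi)) - band, 0.0)
        total += excess ** 2
    return total


def grid_search(x, y, band=5.0, lo=0, hi=30):
    """Crude stand-in for an optimizer: try every integer (b0, b1) in [lo, hi]."""
    candidates = ((b0, b1) for b0 in range(lo, hi + 1) for b1 in range(lo, hi + 1))
    return min(candidates, key=lambda c: dead_zone_loss(c[0], c[1], x, y, band))


# The five rows from the question: Is_Discounted as x, Sales as y.
x = [1, 0, 1, 0, 0]
y = [20, 10, 25, 9, 18]
best = grid_search(x, y)
print(best, dead_zone_loss(best[0], best[1], x, y))
```

With a band this wide relative to the data, many coefficient pairs reach a loss of exactly zero, so the minimizer is not unique. That is worth keeping in mind when interpreting whatever a real optimizer returns.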

 

Perhaps @Rick_SAS knows of a better alternative. 

 

 
