
Regression Assumptions Not Met? Maybe Try Quantile Regression!

Started 03-10-2025, modified 03-10-2025

Suppose you are responsible for establishing the warranty for a product, and you want fewer than 5% of the products to result in a warranty claim. You know the target variable is related to several possible predictors that you can use to estimate how often a claim may occur. How do you handle such a situation? You could build a regression model for the target variable, but regression models predict the mean. If the regression assumptions hold, you can still estimate the 5th percentile from the fitted mean and the assumed error distribution to determine the right conditions for the warranty. But what if those assumptions do not hold? How can you estimate that 5th percentile? One possible approach is quantile regression. This post shows how to perform quantile regression and highlights some of its advantages and disadvantages.

 

To understand a situation where quantile regression might help, let’s create some data with a linear trend but non-constant variance, which violates one of the assumptions of linear regression. The following code generates 50 observations with a random input variable and a target (response) variable that comes from a linear model whose error standard deviation is proportional to the level of the input variable.

 

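/* Simulate 50 observations with a linear trend and non-constant variance. */
data qtrreg;
   do i=1 to 50;
      /* rannor(1): standard normal draw; the fixed seed makes the data reproducible */
      predictor_x = 3 + rannor(1);
      /* the noise standard deviation is 2.2 * predictor_x */
      target_y = 10 + 8 * predictor_x + rannor(1) * predictor_x * 2.2;
      output;
   end;
run;

/* Scatter plot with the fitted least squares line. */
proc sgplot data=qtrreg;
   reg x=predictor_x y=target_y;
run;

/* Ordinary least squares fit with the default diagnostic plots. */
proc reg data=qtrreg;
   model target_y=predictor_x;
run;
quit;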

 

The graph shows the relationship between predictor_x and target_y. You can also see that the variance appears to be increasing as the value of predictor_x increases.

 

01_daober-QuantReg_1.png


 

We can still fit the linear regression model to this data, but the residual plot shows the variance increasing with predictor_x.

 

02_daober-QuantReg_2.png

 

03_daober-QuantReg_3-1.png
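The visual impression from the residual plot can be backed up with a formal check. The following is a minimal sketch, not part of the original analysis, that uses the SPEC option of the PROC REG MODEL statement. SPEC requests a test of whether the first and second moments of the model are correctly specified, and a small p-value is consistent with heteroscedasticity.

```sas
/* Sketch: add the SPEC option to the same least squares model. */
/* A small p-value suggests non-constant variance (or another   */
/* misspecification); with only 50 rows the test has limited    */
/* power, so read it alongside the residual plot.               */
proc reg data=qtrreg;
   model target_y = predictor_x / spec;
run;
quit;
```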

 

Suppose we wanted to predict the 10th percentile of target_y when predictor_x is 4. The regression line predicts only the mean, so we would need to assume normally distributed errors with constant variance to turn that mean into a 10th percentile. Unfortunately, because the variance changes with predictor_x, the constant-variance assumption is violated.
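To make the normal-theory calculation concrete, here is a minimal sketch of what it would look like if the assumptions did hold. The data set names ols_est and ols_p10 and the variable names are illustrative choices, not from the original post. The sketch treats the root mean squared error as the constant error standard deviation and ignores the uncertainty in the estimated coefficients.

```sas
/* Save the least squares estimates; OUTEST= stores the coefficients */
/* under the regressor names and the root MSE in _RMSE_.             */
proc reg data=qtrreg outest=ols_est noprint;
   model target_y = predictor_x;
run;
quit;

data ols_p10;
   set ols_est(rename=(predictor_x=slope));
   x0 = 4;                                   /* point of interest        */
   mean_at_x0 = Intercept + slope * x0;      /* predicted mean           */
   z10 = quantile('NORMAL', 0.10);           /* about -1.2816            */
   p10_at_x0 = mean_at_x0 + z10 * _RMSE_;    /* assumes constant variance */
   keep x0 mean_at_x0 z10 p10_at_x0;
run;

proc print data=ols_p10;
run;
```

Because _RMSE_ is a single pooled value, this estimate uses the same spread at every value of predictor_x, which is exactly the assumption the data violate.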

 

There are several ways to handle this nonconstant variance. One of the most common approaches is to transform the target variable. But because we want to predict a specific quantile, we would need to be comfortable with the normal distribution assumption on the transformed data, obtain the prediction for the mean, determine the quantile estimate on the transformed scale, and transform the predicted quantile back into the original units. A different approach is to use quantile regression.
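The transformation steps listed above can be sketched as follows. The post does not say which transformation to use, so the log transform here is an assumption chosen only for illustration, and the data set and variable names are hypothetical.

```sas
/* Step 1: transform the target (assumed log transform). */
data qtrreg_log;
   set qtrreg;
   if target_y > 0 then log_y = log(target_y);   /* log needs positive values */
run;

/* Step 2: fit the mean model on the transformed scale. */
proc reg data=qtrreg_log outest=log_est noprint;
   model log_y = predictor_x;
run;
quit;

/* Steps 3 and 4: quantile on the log scale, then back-transform. */
data log_p10;
   set log_est(rename=(predictor_x=slope));
   x0 = 4;
   p10_log   = Intercept + slope * x0 + quantile('NORMAL', 0.10) * _RMSE_;
   p10_at_x0 = exp(p10_log);   /* quantiles survive a monotone back-transform */
   keep x0 p10_log p10_at_x0;
run;

proc print data=log_p10;
run;
```

The back-transform works for quantiles because the exponential function is monotone. The result is still only as good as the normality and constant-variance assumptions on the log scale.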

 

To illustrate how to perform quantile regression, we will estimate three quantiles: the 10th, 50th, and 90th percentiles (quantile levels 0.1, 0.5, and 0.9). We will use PROC QUANTREG. If you wish to use a SAS Viya procedure, it would be PROC QTRSELECT. The syntax for our example is this:

 

proc quantreg data=qtrreg;
   model target_y=predictor_x / quantile=0.1  0.5  0.9;
run;

 

By specifying three quantiles, we are actually fitting three different models. As a result, we get three sets of parameter estimates.
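For readers who want the underlying criterion, the standard formulation of quantile regression fits each quantile level by minimizing an asymmetrically weighted sum of absolute residuals. The notation below is general textbook notation rather than anything from this post, with x standing for predictor_x and y for target_y.

```latex
\hat{\beta}(\tau) = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n}
  \rho_\tau\!\left(y_i - \beta_0 - \beta_1 x_i\right),
\qquad
\rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right)
```

With τ = 0.5 the loss is proportional to |u|, so the 50th percentile model is median (least absolute deviations) regression. With τ = 0.1, positive residuals get weight 0.1 and negative residuals get weight 0.9, which pulls the fitted line toward the lower part of the data.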

 

05_daober-QuantReg_4.png

 

06_daober-QuantReg_5.png

 

07_daober-QuantReg_6-1.png

 

With three models there are three different intercepts and three different slope estimates. An advantage of quantile regression is that we do not need to make any distributional assumptions. A model is fit for each quantile, regardless of the distribution. Another advantage of quantile regression is that each model can have a different slope. You can see this on the graph.
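Because this data set is simulated, the true quantile lines can be written down for comparison with the estimates. The short derivation below assumes predictor_x is positive, which holds for almost all draws of 3 plus a standard normal, and writes z_τ for the standard normal quantile at level τ.

```latex
\begin{aligned}
Q_\tau(y \mid x) &= 10 + 8x + 2.2\,x\,z_\tau = 10 + (8 + 2.2\,z_\tau)\,x \\
\tau = 0.1:\quad & z_{0.1} \approx -1.2816, \quad \text{slope} \approx 8 - 2.82 = 5.18 \\
\tau = 0.5:\quad & z_{0.5} = 0, \quad \text{slope} = 8 \\
\tau = 0.9:\quad & z_{0.9} \approx 1.2816, \quad \text{slope} \approx 8 + 2.82 = 10.82
\end{aligned}
```

So the true intercept is 10 at every quantile level and only the slope changes. For example, the true 10th percentile at x = 4 is about 10 + 5.18(4) = 30.72. This is the pattern the three fitted models are trying to recover from 50 observations.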

 

08_daober-QuantReg_7.png

 

As you can see, quantile regression has advantages beyond handling heteroscedasticity (nonconstant variance). It makes no distributional assumptions about the errors, is robust to outliers in the Y-space, and gives a more complete picture of the predictor effects because a separate model is fit for each quantile.
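One way to check whether the predictor effect really differs across quantiles is a formal test of equal slopes. The sketch below is not from the original post. It assumes the TEST statement of PROC QUANTREG with the QINTERACT option, which, as I understand the PROC QUANTREG documentation, tests whether the coefficients of the listed effects are the same across the quantiles in the MODEL statement.

```sas
/* Sketch: test whether the predictor_x slope is equal at the   */
/* 10th, 50th, and 90th percentiles. A small p-value indicates  */
/* that the slopes differ, which is also evidence of            */
/* heteroscedasticity.                                          */
proc quantreg data=qtrreg;
   model target_y = predictor_x / quantile=0.1 0.5 0.9;
   test predictor_x / qinteract;
run;
```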

 

But like any statistical procedure, quantile regression has drawbacks. Because you are estimating quantiles, you need enough data to support them. Our example has 50 observations. What if we wanted the 1st percentile? There is not enough data to estimate at that level of precision. You can get a sense of this from the graph of our example. The two extreme quantile models (10th and 90th percentiles) both pass exactly through the left-most data point. Also notice that at low values of X the median line is NOT between the 10th and 90th percentile lines, because the fitted lines cross. There is not enough data at low values of X to model that region adequately.
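The crossing described above can also be checked numerically. The following is a minimal sketch, with hypothetical data set and variable names, that fits each quantile in its own PROC QUANTREG step so that each OUTPUT variable has a known name, and then flags observations where the fitted quantiles are out of order.

```sas
/* Fit each quantile separately and save the fitted values. */
proc quantreg data=qtrreg;
   model target_y = predictor_x / quantile=0.1;
   output out=pred10 predicted=q10;
run;

proc quantreg data=qtrreg;
   model target_y = predictor_x / quantile=0.5;
   output out=pred50 predicted=q50;
run;

proc quantreg data=qtrreg;
   model target_y = predictor_x / quantile=0.9;
   output out=pred90 predicted=q90;
run;

/* Combine by the observation counter i (already in sorted order) */
/* and flag rows where the fitted quantiles are not ordered.      */
data crossing_check;
   merge pred10(keep=i q10)
         pred50(keep=i q50)
         pred90(keep=i predictor_x target_y q90);
   by i;
   crossed = (q10 > q50) or (q50 > q90);
run;

proc print data=crossing_check;
   where crossed = 1;
   var i predictor_x q10 q50 q90;
run;
```

This only checks the observed values of predictor_x. Straight lines with different slopes always cross somewhere, so the practical question is whether they cross inside the range of the data.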

 

Another drawback is that quantile regression, while robust to outliers in the Y-space, is not robust to outliers in the X-space, because those points have high leverage. Finally, as you might expect, quantile regression is more computationally intensive than ordinary least squares, because each quantile is fit by iterative optimization rather than a closed-form solution. Hopefully you can see that quantile regression is quite easy to perform and has several advantages. As long as you keep these drawbacks in mind, quantile regression can be a very useful tool to have in your arsenal of analysis techniques.

 

 

