BruceBrad
Lapis Lazuli | Level 10

My dependent variable is an achievement score that is continuous but has uneven density: narrow regions of high density alternate with regions of low density. If I do a quantile regression with a single binary RHS variable, I find that there might be a significant gap (i.e., a nonzero coefficient) at the 10th percentile (say), followed by no gap at the 15th, then a significant gap at the 20th, and so on. These variations come from the uneven density. The heaping of the density is (probably) not particularly meaningful; it merely reflects limitations in the way the dependent variable is constructed. Is there a way to smooth the quantile regression results? For example, could I fit a cubic function of percentile to the coefficients across the various quantiles?

SteveDenham
Jade | Level 19

I am having a hard time visualizing what might be needed here.  It appears that you want percentile as both an independent and a dependent variable, so I am sure I am misinterpreting what you are asking.  While QUANTREG can fit splines as a basis on the independent variables, I don't think you want that but I could well be wrong.  So, what are the independent variables of interest, and perhaps we can attack from that angle.

Steve Denham

BruceBrad
Lapis Lazuli | Level 10

The independent variable is a binary variable 'income' (1='poor', 0=otherwise). The dependent variable is an achievement score. I want to calculate the impact of being poor on the quantiles of achievement. Because of clumping in the achievement score, I find that the effect is quite variable. For example, the 10th and 20th percentiles of achievement for poor children are below those for the non-poor, but the 15th percentiles are equal. I would like to smooth the results across the quantiles.
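One way to picture the smoothing being asked for is to save the estimated gap at each quantile and then regress those estimates on a cubic in the quantile level. The following is only a minimal sketch under stated assumptions, not a recommended method. The data set name `yourdata` and the variables `response` and `poor` (numeric, coded 1/0) are placeholders. The sketch also assumes the ODS table `ParameterEstimates` carries `Quantile`, `Parameter`, and `Estimate` columns when several quantiles are requested. The cubic fit is unweighted and purely descriptive: it ignores the differing standard errors of the estimates and the correlation between neighboring quantiles.

```sas
/* Sketch only: yourdata, response, and poor are hypothetical names. */
/* poor is assumed numeric (1 = poor, 0 = otherwise), so no CLASS    */
/* statement is used and the coefficient is the poor-minus-non-poor  */
/* gap at each quantile.                                             */
ods output ParameterEstimates=pe;
proc quantreg data=yourdata;
   model response = poor / quantile = 0.05 to 0.95 by 0.05;
run;

/* Keep one row per quantile: the coefficient on poor.               */
/* Column names Quantile, Parameter, Estimate are assumed here.      */
data gap;
   set pe;
   where upcase(Parameter) = 'POOR';
   q2 = Quantile**2;
   q3 = Quantile**3;
run;

/* Descriptive cubic smooth of the gap across quantile levels.       */
proc reg data=gap;
   model Estimate = Quantile q2 q3;
   output out=smoothed p=SmoothedGap;
run;
quit;
```

The data set `smoothed` then holds the raw estimate and the fitted value `SmoothedGap` for each quantile level, so the two can be plotted against `Quantile` to see how much of the jumpiness the cubic removes.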

SteveDenham
Jade | Level 19

In order to smooth things, you will need some sort of additional independent variable that distinguishes the parts of the percentile curve--a step function might be the first to try.  The other thing to look at is heteroscedasticity in your simple model--for that QUANTREG has an option.

But what I think is the real problem is that quantile regression really, really doesn't like categorical variables as the only independent variable, for just the reasons that you are finding.  There might be a better way, using TRANSREG where you can transform both sides, but that would not give you the quantiles you are looking for.

Steve Denham

Rick_SAS
SAS Super FREQ

My 2 cents: By using QUANTREG you are already smoothing the data, which is why you sometimes see the line connecting to a location where there is no data.

As you know, there is a big difference between quantile regression and connecting the sample quantiles for income=0 and income=1. See

Quantile regression: Better than connecting the sample quantiles of binned data - The DO Loop

For quantile regression, the procedure gives predicted values for the quantiles, conditioned on income.  Just as in ordinary least squares regression, the predicted value might not fall near any observed data point.

BruceBrad
Lapis Lazuli | Level 10

Rick, I think you are wrong here. Your linked post is all about the case where the explanatory variable is continuous. Here it is discrete. In this case, I think quantile regression _is_ the same as connecting the sample quantiles. I'll try to mock up some example data soon and come back to this.
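The claim can be checked directly by computing the within-group sample percentiles and comparing them with the quantile regression fit. This is a hedged sketch: `yourdata`, `response`, and `poor` (numeric, 1/0) are placeholder names, and the three percentiles are simply the ones mentioned earlier in the thread. With an intercept and one binary regressor the model is saturated, so the intercept should track the non-poor percentile and the intercept plus the `poor` coefficient should track the poor percentile. Small differences can still appear because PROC UNIVARIATE's default percentile definition may average two adjacent observations, whereas the regression solution typically sits on an observed value.

```sas
/* Sketch only: yourdata, response, and poor are hypothetical names. */

/* Sample percentiles within each group; the output data set has     */
/* one row per level of poor with variables P10, P15, and P20.       */
proc univariate data=yourdata noprint;
   class poor;
   var response;
   output out=grpq pctlpts=10 15 20 pctlpre=P;
run;

proc print data=grpq;
run;

/* Quantile regression at the same three levels. Compare:            */
/*   Intercept              with the percentile for poor = 0         */
/*   Intercept + poor coef  with the percentile for poor = 1         */
proc quantreg data=yourdata;
   model response = poor / quantile = 0.10 0.15 0.20;
run;
```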

Rick_SAS
SAS Super FREQ

Upon re-reading your description of the problem, I'm not sure why I linked to that post. As you say, you have a discrete regressor, whereas my blog post discusses a continuous regressor.  Sorry for the confusion.

SteveDenham
Jade | Level 19

What happens with the following:

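ods graphics on;

proc quantreg data=yourdata;
   class poor;
   /* QUANTILE= and PLOT= are both MODEL statement options, so they
      follow the slash and share a single terminating semicolon */
   model response = poor / quantile = 0.05 to 0.95 by 0.05
                           plot = quantplot;
run;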

Not saying it is interpretable, but it seems like the code should run without throwing errors or warnings.

Steve Denham
