03-07-2014 12:10 PM
I have a dependent variable, an achievement score, which is continuous but has uneven density: small areas of high density are followed by areas of low density. If I run a quantile regression with a single binary RHS variable, I might find a significant gap (i.e., a significant coefficient) at the 10th percentile, no gap at the 15th, a significant gap again at the 20th, and so on. These variations come from the uneven density. The heaping in the density is (probably) not particularly meaningful; it merely reflects limitations in the way the dependent variable is constructed. Is there a way to smooth the quantile regression results? For example, could I fit a cubic function of percentile across the various quantiles?
03-10-2014 02:01 PM
I am having a hard time visualizing what might be needed here. It appears that you want percentile as both an independent and a dependent variable, so I am sure I am misinterpreting what you are asking. While QUANTREG can fit splines as a basis on the independent variables, I don't think you want that, but I could well be wrong. So, what are the independent variables of interest? Perhaps we can attack the problem from that angle.
03-10-2014 02:30 PM
The independent variable is a binary variable 'income' (1='poor', 0=otherwise). The dependent variable is an achievement score. I want to calculate the impact of being poor on the quantiles of achievement. Because of clumping in the achievement score, I find that the estimated effect varies a lot from one quantile to the next. For example, the 10th and 20th percentiles of achievement for poor children are below the 10th and 20th percentiles for the non-poor, but the 15th percentiles are equal. I would like to smooth the results across the quantiles.
03-10-2014 02:48 PM
In order to smooth things, you will need some sort of additional independent variable that distinguishes the parts of the percentile curve; a step function might be the first thing to try. The other thing to look at is heteroscedasticity in your simple model. QUANTREG has an option for that.
But what I think is the real problem is that quantile regression really, really doesn't like categorical variables as the only independent variable, for just the reasons that you are finding. There might be a better way, using TRANSREG where you can transform both sides, but that would not give you the quantiles you are looking for.
03-13-2014 10:22 AM
My 2 cents: By using QUANTREG you are already smoothing the data, which is why you sometimes see the line connecting to a location where there is no data.
As you know, there is a big difference between quantile regression and connecting the sample quantiles for income=0 and income=1. See
For quantile regression, the procedure gives predicted values for the quantiles, conditioned on income. Just as in ordinary least squares regression, the predicted value might not fall near any observed data point.
06-19-2014 10:49 AM
Rick, I think you are wrong here. Your linked post is all about the case where the explanatory variable is continuous. Here it is discrete. In this case, I think quantile regression _is_ the same as collecting the sample quantiles. I'll try and mock up some example data soon and come back to this.
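A minimal sketch of the kind of mock-up mentioned above, offered as an illustration rather than as the poster's actual example. The dataset name sim, the variable name score, the sample size, and every simulation parameter are assumptions made up for the demonstration. The idea: with an intercept and a single 0/1 regressor the model is saturated, so the check-loss minimization splits into one problem per group. The intercept should then match the sample quantile of score for income=0, and intercept + income coefficient should match the sample quantile for income=1, up to the tie-breaking convention each procedure uses.

```sas
/* Simulated data: all names and parameters here are hypothetical. */
data sim;
   call streaminit(2014);
   do i = 1 to 2000;
      income = rand('Bernoulli', 0.3);                        /* 1 = poor */
      /* rounding to multiples of 5 mimics a heaped achievement score */
      score  = round(50 + 10*rand('Normal') - 5*income, 5);
      output;
   end;
run;

/* Sample quantiles of score within each income group */
proc means data=sim p10 p25 median p75 p90;
   class income;
   var score;
run;

/* Quantile regression on the binary regressor at the same quantiles */
proc quantreg data=sim;
   model score = income / quantile = 0.1 0.25 0.5 0.75 0.9;
run;
```

Comparing the two outputs quantile by quantile should show where the jumps in the income coefficient come from: with a heaped score, each group's sample quantile can only land on a few distinct values, so the difference between groups moves in steps.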
06-19-2014 11:38 AM
Upon re-reading your description of the problem, I'm not sure why I linked to that post. As you say, you have a discrete regressor, whereas my blog post discusses a continuous regressor. Sorry for the confusion.
06-19-2014 01:07 PM
What happens with the following:
proc quantreg data=yourdata;
   /* "score" is a placeholder name for the achievement variable */
   model score = income / quantile = 0.05 to 0.95 by 0.05;
run;
Not saying it is interpretable, but it seems like the code should run without throwing errors or warnings.
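For the cubic smoothing asked about in the original post, one possible sketch is to capture the income coefficient at each quantile and then regress those estimates on a cubic in the quantile level. This is an assumption about what "smoothing" should mean here, not an established QUANTREG feature. The names yourdata, score, pe, gap, smoothed and SmoothedGap are placeholders, and the column names in the ODS table (Quantile, Parameter, Estimate) should be confirmed with PROC CONTENTS on your own output before relying on them.

```sas
/* Capture the coefficient table for all requested quantiles. */
ods output ParameterEstimates=pe;
proc quantreg data=yourdata;
   model score = income / quantile = 0.05 to 0.95 by 0.05;
run;

/* Keep only the income rows and build the cubic terms. */
data gap;
   set pe;
   where upcase(Parameter) = 'INCOME';
   q  = Quantile;
   q2 = q**2;
   q3 = q**3;
run;

/* Fit a cubic in the quantile level to the estimated gaps. */
proc reg data=gap;
   model Estimate = q q2 q3;
   output out=smoothed p=SmoothedGap;
run;
quit;
```

The fitted values in SmoothedGap give a smoothed gap curve, but the standard errors PROC REG reports should not be used for inference, because the coefficient estimates at neighboring quantiles come from the same data and are correlated.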