08-15-2015 10:18 AM
PROC QUANTREG is not robust to high leverage points. Running:
proc quantreg data = sashelp.baseball plots = all;
model salary = YrMajor / quantile = (0.1, 0.5, 0.9);
run;
This gives a warning in the log that there are high leverage points and suggests using the WEIGHT statement. However, no details are given. I have seen some articles suggesting weighting by the inverse of leverage, but there isn't much material on how to do it.
How should this be done?
08-17-2015 08:25 AM
I poked at this a little bit, using the inverse of the robust distance as a weight. It appeared to reduce the number of outliers, but... The log says the leverage option still uses the unweighted values to generate the next step. I was hoping that some sort of iterative process could be applied, but no luck.
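A minimal sketch of that kind of inverse-robust-distance weighting, for anyone who wants to try it. The data set name diag, the variable names rd and w, and the 1/rd weight are illustrative assumptions, not a documented recommendation; the ROBDIST= keyword on the OUTPUT statement requires the LEVERAGE option on the MODEL statement.

```sas
/* Step 1: request leverage diagnostics and save the robust distance */
proc quantreg data=sashelp.baseball;
   model logSalary = YrMajor / quantile=0.5 leverage;
   output out=diag robdist=rd;
run;

/* Step 2: build a weight that shrinks as the robust distance grows */
data diag;
   set diag;
   if rd > 0 then w = 1 / rd;   /* illustrative choice of weight, not an official formula */
   else w = 1;                  /* fallback when rd is zero or missing */
run;

/* Step 3: refit using the weight */
proc quantreg data=diag;
   weight w;
   model logSalary = YrMajor / quantile=0.5;
run;
```

A single quantile is used here only to keep the output data set simple; the same pattern should carry over to a list of quantiles.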
08-17-2015 11:44 AM
Interesting! But that's surely a problem?
I know that QUANTREG is sensitive to high leverage points. But the extent of the sensitivity (and whether it is problematic) surely doesn't depend on whether one requests particular plots!
08-17-2015 12:54 PM
SAS is efficient. If a statistic is not needed by an analysis, it is usually not computed. If you use the LEVERAGE option on the MODEL statement, you get the warning. If you specify PLOT=RDPLOT (which is part of PLOT=ALL), the proc needs to compute the leverage values, so it helps you out by specifying the option for you.
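To see this in a small example, the two calls below request the leverage diagnostics in the two ways described: explicitly through the LEVERAGE option, and implicitly through the robust-distance plot. This is only a sketch; based on the explanation above, both runs would be expected to write the high-leverage warning to the log, while a run with neither request would not.

```sas
ods graphics on;

/* Explicit request: LEVERAGE on the MODEL statement, no plots */
proc quantreg data=sashelp.baseball;
   model salary = YrMajor / quantile=(0.1 0.5 0.9) leverage;
run;

/* Implicit request: the robust-distance plot needs the leverage values */
proc quantreg data=sashelp.baseball plots=rdplot;
   model salary = YrMajor / quantile=(0.1 0.5 0.9);
run;
```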
08-17-2015 01:07 PM
But why does the leverage estimation not change when a WEIGHT variable is specified? To answer Peter's question, I would think that if you correctly specified weights for the observations, then the leverage of points with small weights would go down. I was hoping to set up a sort of iterative process that minimized the number of leverage points, but the output data set from a weighted analysis doesn't look any different from the one from an unweighted analysis. I suppose that is the result of the optimization method being used.
08-17-2015 02:23 PM
I'll be the first to agree that the word "weight" is used in a confusing fashion in the ROBUSTREG and QUANTREG docs. To me, it seems like the word is used in at least three different ways.
I think you need to look at the ROBUSTREG documentation, not the QUANTREG doc, to see how the WEIGHT statement is used to detect leverage points and outliers. The short answer is that the WEIGHT variable is ignored for that computation. Notice that the robust distances are documented as not being affected by the WEIGHT statement. Notice also that in ROBUSTREG, the "final weighted least squares estimates" refer to "the least squares estimates after the detected outliers are deleted." In other words, the OUTLIER indicator variable (0/1) is used as a weight to exclude observations.
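One way to check that claim for QUANTREG is to save the robust distances from an unweighted run and a weighted run and compare them. The sketch below assumes an arbitrary illustrative weight (w = 1/(1 + YrMajor)) and hypothetical data set names (unw, wts, wtd). If the WEIGHT variable really is ignored in the leverage computation, PROC COMPARE should find no differences in rd.

```sas
/* Unweighted run: save the robust distances */
proc quantreg data=sashelp.baseball;
   model logSalary = YrMajor / quantile=0.5 leverage;
   output out=unw robdist=rd;
run;

/* Add an arbitrary weight, for illustration only */
data wts;
   set sashelp.baseball;
   w = 1 / (1 + YrMajor);
run;

/* Weighted run: save the robust distances again */
proc quantreg data=wts;
   weight w;
   model logSalary = YrMajor / quantile=0.5 leverage;
   output out=wtd robdist=rd;
run;

/* Compare the two sets of robust distances, observation by observation */
proc compare base=unw compare=wtd;
   var rd;
run;
```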
Based on what I've just said, I don't really understand the warning message. You might want to take this up with Tech Support unless Steve has additional ideas. Maybe you are supposed to run the PROC once to get the leverage values (and the warning) and then use that 0/1 variable to exclude the high-leverage points? Something like:
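proc quantreg data = sashelp.baseball;
model logsalary = YrMajor / quantile=(0.1, 0.5, 0.9) leverage;
output out=out leverage=leverage; /* indicator of flagged leverage points */
run;
data out;
set out;
w = (leverage = 0); /* assumes leverage=1 flags a high-leverage point; check your output */
run;
proc quantreg data = out;
weight w; /* weight 0 excludes the high-leverage points from the first run */
model logsalary = YrMajor / quantile=(0.1, 0.5, 0.9);
run;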
Blah, blah, I'm not an expert, blah, blah, I don't speak for SAS, etc.