plf515
Lapis Lazuli | Level 10

PROC QUANTREG is not robust to high-leverage points. Running:

proc quantreg data = sashelp.baseball plots = all;
   model salary = YrMajor / quantile = (0.1, 0.5, 0.9);
run;

This produces a warning in the log that there are high-leverage points, and the warning suggests using the WEIGHT statement. However, the log gives no details on how to do that. I have seen some articles suggesting weighting by the inverse of the leverage, but there isn't a lot of material on it.

How should this be done?

Peter
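One way to try the inverse-leverage weighting mentioned above is sketched below. It is only a sketch under assumptions: it uses the classical hat-matrix leverage from PROC REG (not the MCD-based robust distances that QUANTREG uses to flag leverage points), the 1/h weighting scheme is the one from the articles Peter mentions rather than anything SAS documents for this warning, and the data set and variable names (`lev`, `h`, `w`) are hypothetical.

```sas
/* Step 1: classical leverage (hat-matrix diagonal) from an OLS fit.  */
/* h depends only on the covariates of the rows used in the fit.      */
proc reg data=sashelp.baseball noprint;
   model salary = YrMajor;
   output out=lev h=h;          /* h = leverage of each observation */
run;
quit;

/* Step 2: inverse-leverage weights (assumed weighting scheme).       */
/* With an intercept in the model, 0 < h <= 1, so the division is     */
/* safe and w >= 1. Rows with missing h keep a missing weight.        */
data lev;
   set lev;
   if not missing(h) then w = 1 / h;
run;

/* Step 3: weighted quantile regression */
proc quantreg data=lev;
   weight w;
   model salary = YrMajor / quantile=(0.1, 0.5, 0.9);
run;
```

Based on the later replies in this thread, the leverage diagnostics in QUANTREG ignore the WEIGHT variable, so this changes the fitted quantiles but would not be expected to make the leverage warning disappear.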

SteveDenham
Jade | Level 19

Hi Peter,

I poked at this a little bit, using the inverse of the robust distance as a weight. It appeared to reduce the number of outliers, but the log says that the leverage option still uses the unweighted values to generate the next step. I was hoping to apply some sort of iterative process, but no luck.

Steve Denham
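A minimal sketch of the inverse-robust-distance weighting described above follows. Assumptions: the robust distances are taken from PROC ROBUSTREG (whose OUTPUT statement provides ROBDIST= and LEVERAGE= when the LEVERAGE option is on the MODEL statement), the floor of 1 on the distance is my own choice to avoid dividing by zero, and the names `rdout`, `robd`, `lev`, and `w` are hypothetical. This is not necessarily what Steve ran.

```sas
/* Pass 1: MCD-based robust distances of the covariate(s) */
proc robustreg data=sashelp.baseball;
   model logSalary = YrMajor / leverage;
   output out=rdout robdist=robd leverage=lev;   /* lev: 1 = leverage point */
run;

/* Pass 2: inverse robust distance as a weight.                      */
/* Assumption: floor the distance at 1, so points near the robust    */
/* center get weight 1 and there is never a division by zero.        */
data rdout;
   set rdout;
   if not missing(robd) then w = 1 / max(robd, 1);
run;

/* Pass 3: weighted quantile regression. Keeping LEVERAGE here lets  */
/* you see whether the leverage diagnostics change with the weights. */
proc quantreg data=rdout;
   weight w;
   model logSalary = YrMajor / quantile=(0.1, 0.5, 0.9) leverage;
run;
```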

plf515
Lapis Lazuli | Level 10

Thanks Steve.  Hmm.  Maybe the developer of QUANTREG will chime in.

Rick_SAS
SAS Super FREQ

Use a log transformation on the response:

model logsalary = YrMajor /quantile = (0.1, 0.5, 0.9);

plf515
Lapis Lazuli | Level 10

Hi Rick

That does not solve the problem. There are still 26 leverage points and the warning is still given.

Peter
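Why the log transformation leaves the count of leverage points unchanged: assuming QUANTREG flags leverage points the way the ROBUSTREG documentation describes (MCD-based robust distances), the criterion involves only the covariates:

$$
\mathrm{RD}(\mathbf{x}_i) = \sqrt{(\mathbf{x}_i - \hat{\mathbf{t}})^{\top}\,\hat{\mathbf{C}}^{-1}\,(\mathbf{x}_i - \hat{\mathbf{t}})},
\qquad
\text{observation } i \text{ is a leverage point if } \mathrm{RD}(\mathbf{x}_i) > \sqrt{\chi^2_{p;\,0.975}}
$$

Here $\mathbf{x}_i$ is the covariate vector for observation $i$ (just YrMajor, so $p = 1$), and $\hat{\mathbf{t}}$ and $\hat{\mathbf{C}}$ are the MCD estimates of location and scatter. The chi-square cutoff shown is the default as I understand it from the documentation. Because the response does not appear anywhere, modeling log(Salary) instead of Salary cannot change which observations are flagged.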

Rick_SAS
SAS Super FREQ

The WARNING comes from one of the plots. If you delete PLOTS=ALL, there is no warning.

plf515
Lapis Lazuli | Level 10

Interesting!  But that's surely a problem?

I know that QUANTREG is sensitive to high leverage points.  But the extent of the sensitivity (and whether it is problematic) surely doesn't depend on whether one requests particular plots!

Rick_SAS
SAS Super FREQ

SAS is efficient: if a statistic is not needed by an analysis, it is usually not computed. If you use the LEVERAGE option on the MODEL statement, you get the warning. If you specify PLOTS=RDPLOT (which is part of PLOTS=ALL), the procedure needs to compute the leverage values, so it turns on the LEVERAGE option for you.
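To check this behaviour yourself, you could run the three variants below and compare the logs. Going by the explanation above, only the second and third runs should produce the leverage warning; that expectation comes from this thread, not from output reproduced here.

```sas
ods graphics on;

/* 1. No LEVERAGE option and no robust-distance plot:               */
/*    leverage is not computed, so no leverage warning is expected. */
proc quantreg data=sashelp.baseball;
   model logSalary = YrMajor / quantile=(0.1, 0.5, 0.9);
run;

/* 2. LEVERAGE requested explicitly on the MODEL statement. */
proc quantreg data=sashelp.baseball;
   model logSalary = YrMajor / quantile=(0.1, 0.5, 0.9) leverage;
run;

/* 3. Only the robust-distance plot: the procedure needs leverage   */
/*    values to draw it, so it computes them on its own.            */
proc quantreg data=sashelp.baseball plots=rdplot;
   model logSalary = YrMajor / quantile=(0.1, 0.5, 0.9);
run;
```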

SteveDenham
Jade | Level 19

But why doesn't the leverage estimation change when a WEIGHT variable is specified? To answer Peter's question, I would think that if you correctly specified weights for the observations, then the leverage of points with small weights would go down. I was hoping to set up an iterative process that minimized the number of leverage points, but the output data set from a weighted analysis doesn't look any different from that of an unweighted analysis. I suppose that is a result of the optimization method being used.

Steve Denham
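Steve's observation can be checked with a sketch like this one: flag leverage points in an unweighted run, refit with hypothetical weights that shrink the flagged points to 0.1, and cross-tabulate the two sets of flags. If the leverage computation ignores the weights, as explained later in the thread, the table should show no disagreements. The weights and the names `unw`, `wtd`, `lev`, `lev_w`, and `w` are illustrative assumptions; a single quantile is used so that the output data set has one row per input observation.

```sas
/* Unweighted fit: save the leverage indicator (1 = leverage point) */
proc quantreg data=sashelp.baseball;
   model logSalary = YrMajor / quantile=0.5 leverage;
   output out=unw leverage=lev;
run;

/* Hypothetical weights: 0.1 for flagged points, 1 otherwise */
data unw;
   set unw;
   w = ifn(lev = 1, 0.1, 1);
run;

/* Weighted fit on the same data; the input variables, including    */
/* lev, are carried into the new output data set                    */
proc quantreg data=unw;
   weight w;
   model logSalary = YrMajor / quantile=0.5 leverage;
   output out=wtd leverage=lev_w;
run;

/* Agreement between unweighted and weighted leverage flags */
proc freq data=wtd;
   tables lev*lev_w / norow nocol nopercent;
run;
```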

Rick_SAS
SAS Super FREQ

I'll be the first to agree that the word "weight" is used in a confusing fashion in the ROBUSTREG and QUANTREG documentation. To me, it seems like the word is used in at least three different ways.

I think you need to look at the ROBUSTREG documentation, not the QUANTREG doc, to see how the WEIGHT statement is used to detect leverage points and outliers. The short answer is that the WEIGHT variable is ignored for that computation. Notice that the robust distances are documented as not being affected by the WEIGHT statement. Notice also that in ROBUSTREG, the "final weighted least squares estimates" refer to "the least squares estimates after the detected outliers are deleted." In other words, the OUTLIER indicator variable (0/1) is used as a weight to exclude observations.

Based on what I've just said, I don't really understand the warning message. You might want to take this up with Technical Support unless Steve has additional ideas. Perhaps you are supposed to run the PROC once to get the leverage indicator (and the warning), and then use that 0/1 variable to exclude the high-leverage points:


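/* Pass 1: flag high-leverage points. Leverage depends only on the
   covariates, so one quantile is enough for this step. */
proc quantreg data = sashelp.baseball;
model logsalary = YrMajor / quantile=0.5 leverage;
output out=out leverage=leverage;   /* 1 = leverage point, 0 = not */
run;

/* LEVERAGE=1 marks the high-leverage points, so weight by 1 - leverage */
data out;
set out;
w = 1 - leverage;
run;

proc quantreg data = out;
weight w;    /* weight 0 excludes the high-leverage points from the first run */
model logsalary = YrMajor / quantile=(0.1, 0.5, 0.9);
run;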

Blah, blah, I'm not an expert, blah, blah, I don't speak for SAS, etc.

plf515
Lapis Lazuli | Level 10

OK, time to talk to Tech Support.

I'll report back on what they say.
