In SAS PROC ROBUSTREG you can set K1, which affects the efficiency of the procedure. But I didn't see anything in the documentation about exactly what "efficiency" means or about the advantages of changing K1 from its default value.
Any insights would be appreciated.
1. K1 does not affect the efficiency of the procedure; it affects the efficiency of the estimator.
We know that under the usual assumptions of linear regression, the least squares estimates of the betas are BLUE (best linear unbiased estimators). The ROBUSTREG doc seems to be saying that the efficiency of the M estimator is a certain percentage of that of the OLS estimates when the scaling parameter k1 is properly chosen. Efficiency here is relative: it is the ratio of the variance of the OLS estimates to the variance of the M estimates when the errors really are normal. In other words, the M estimates for the betas have more variance (they have to), but not too much more.
2. The k1 parameter simply scales the function used to penalize large residuals. For OLS, the penalty function is the quadratic function, and we minimize the sum of the SQUARES of the residuals. For M estimation, we replace the quadratic function with a different function that caps the penalty given to extreme residuals. The Tukey and Yohai functions are two choices. You minimize the sum of the "Tukey function" (or "Yohai function") of the residuals. A larger k1 widens the range of residuals that are treated in a roughly quadratic way, which raises the efficiency under normal errors but lets large outliers have more influence. The following program plots the Tukey and Yohai functions alongside the quadratic function. For large residuals (large values of |s|), the penalty from Tukey or Yohai is much less than for the quadratic function that OLS uses.
data Rho;
   /* polynomial coefficients for the middle piece of the Yohai function */
   b0 = 1.792; b1 = -0.972; b2 = 0.432; b3 = -0.052; b4 = 0.002;
   do s = -5 to 5 by 0.1;
      /* Tukey function, evaluated with k1 = 3.440 */
      k1 = 3.440;
      t = s / k1;
      if abs(s) <= k1 then
         Tukey = 3*t**2 - 3*t**4 + t**6;
      else Tukey = 1;

      /* Yohai function, evaluated with k1 = 0.868 (k1 is reused here) */
      k1 = 0.868;
      t = s / k1;
      if abs(s) <= 2*k1 then
         Yohai = s**2/2;
      else if 2*k1 < abs(s) and abs(s) <= 3*k1 then
         Yohai = k1**2 * (b0 + b1*t**2 + b2*t**4 + b3*t**6 + b4*t**8);
      else Yohai = 3.25*k1**2;

      /* quadratic penalty used by OLS */
      Quadratic = s**2;
      if Quadratic > 3 then Quadratic = .; /* cap the height of the quadratic function */
      output;
   end;
run;

proc sgplot data=Rho;
   series x=s y=Tukey / curvelabel;
   series x=s y=Yohai / curvelabel;
   series x=s y=Quadratic / curvelabel;
   xaxis label="Size of Residual";
   yaxis label="Value of Penalty Function";
run;
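To put a number on the efficiency described in point 1, here is a rough sketch that approximates the asymptotic efficiency of the Tukey M estimator relative to OLS for a few values of k1. It assumes standard normal errors and uses the textbook formula eff = (E[psi'])**2 / E[psi**2], where psi is the derivative of the Tukey function above. The data set name TukeyEff, the variable names, the grid size, and the k1 values other than 3.440 are my own choices for illustration, not anything taken from the ROBUSTREG documentation.

```sas
/* Sketch: asymptotic efficiency of the Tukey M estimator relative to OLS, */
/* assuming standard normal errors. psi = rho' is zero outside [-k1, k1],  */
/* so both expectations are integrals over that interval only.             */
data TukeyEff;
   n = 4000;                               /* midpoint-rule cells (arbitrary choice) */
   do k1 = 2.5, 3.440, 3.88, 4.685, 6;     /* illustrative k1 values */
      h = 2*k1 / n;                        /* cell width on [-k1, k1] */
      EPsiDeriv = 0;                       /* accumulates E[psi'] up to a constant */
      EPsiSq    = 0;                       /* accumulates E[psi**2] up to a constant */
      do i = 1 to n;
         s = -k1 + (i - 0.5)*h;            /* cell midpoint */
         t = s / k1;
         phi = pdf('NORMAL', s);           /* standard normal density */
         /* the common factor 36/k1**4 cancels in the ratio, so it is dropped */
         EPsiDeriv = EPsiDeriv + (1 - t**2)*(1 - 5*t**2) * phi * h;
         EPsiSq    = EPsiSq + s**2 * (1 - t**2)**4 * phi * h;
      end;
      Efficiency = EPsiDeriv**2 / EPsiSq;
      output;
   end;
   keep k1 Efficiency;
run;

proc print data=TukeyEff noobs;
   format Efficiency 6.3;
run;
```

The efficiency should rise toward 1 as k1 grows, because the Tukey function then behaves like the quadratic over almost all of the normal distribution. That is the trade-off in choosing k1: more efficiency when the data are clean, less protection when they are not. If I remember the documentation correctly, METHOD=MM also takes an EFF= suboption that sets the target efficiency and determines k1 for you, but please check the syntax for your release.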