The bandwidth in my case is fixed across the full sample, and while this example is a 1-D regression, I will reuse the code for higher dimensions later. Thanks to your blog, Rick, I wrote the following code, which computes the predicted value for the 1st point:
proc iml;
use sashelp.class;
read all var {weight} into Y;
read all var {height} into X;
close sashelp.class;
/******************* 1) Choose a smoothing parameter *******************/
n = nrow(Y); *Sample size;
h = 0.25; *Bandwidth defined arbitrarily for now;
k = floor(n*h); *Nb of nearest points to x0;
/**************** 2) Find the k nearest neighbors to x0 ****************/
/* Module to compute indices (row numbers) of k nearest neighbors. */
/* INPUT: X an (N x p) data matrix */
/* k specifies the number of nearest neighbors (k>=1) */
/* OUTPUT: idx an (N x k) matrix of row numbers. idx[,j] contains */
/* the row numbers (in X) of the j_th closest neighbors */
/* dist an (N x k) matrix. dist[,j] contains the distances */
/* between X and the j_th closest neighbors */
start NearestNbr(idx, dist, X, k=1);
   N = nrow(X);  p = ncol(X);
   idx = j(N, k, .);      /* j_th col contains obs numbers for j_th closest obs */
   dist = j(N, k, .);     /* j_th col contains distance for j_th closest obs */
   D = distance(X);       /* N x N distance matrix */
   D[do(1, N*N, N+1)] = .;     /* set diagonal elements to missing */
   do j = 1 to k;
      dist[,j] = D[ ,><];      /* smallest distance in each row */
      idx[,j] = D[ ,>:<];      /* column of smallest distance in each row */
      if j < k then do;        /* prepare for next closest neighbors */
         ndx = sub2ndx(dimension(D), T(1:N)||idx[,j]);
         D[ndx] = .;           /* set elements to missing */
      end;
   end;
finish;
run NearestNbr(nbrIdx, dist, X, k); /* Call the module */
/* Corresponding X for k = 4 */
Near_X1 = X[nbrIdx[,1], ]; /* 1st nearest neighbors */
Near_X2 = X[nbrIdx[,2], ]; /* 2nd nearest neighbors */
Near_X3 = X[nbrIdx[,3], ]; /* 3rd nearest neighbors */
Near_X4 = X[nbrIdx[,4], ]; /* 4th nearest neighbors */
Near_X = X || Near_X1 || Near_X2 || Near_X3 || Near_X4;
i = 1; /*do i = 1 to n later */
Xi = Near_X[i,]`;
/* Corresponding Y for k = 4 */
Near_Y1 = Y[nbrIdx[,1], ];
Near_Y2 = Y[nbrIdx[,2], ];
Near_Y3 = Y[nbrIdx[,3], ];
Near_Y4 = Y[nbrIdx[,4], ];
Near_Y = Y || Near_Y1 || Near_Y2 || Near_Y3 || Near_Y4;
Yi = Near_Y[i,]`;
/* Normalize the x-axis for each window */
/* Subtract the Xi value to center the window around it, then divide by the largest */
/* distance in the neighborhood to rescale. Xi is then 0 and the furthest value is 1. */
di = Xi - Xi[1,]; *Signed distance from Xi;
D = max(abs(di));
Xi_norm = di / D;
/************* 3) Assign weights to the nearest neighbors **************/
Wi = (2 * constant("pi"))##(-1/2) * exp(-1/2*Xi_Norm##2); *Gaussian kernel;
/**************** 4) Perform local weighted regression *****************/
Xw = (sqrt(Wi) # (J(nrow(Xi),1,1) || Xi))`;
Yw = (sqrt(Wi)#Yi)`;
coeff = solve(Xw * Xw`, Xw * Yw`); *Solve the weighted normal equations (more stable than inv);
Y_fit = coeff[1,] + coeff[2,] * Xi[1,]; *Fitted value at Xi;
/* unweighted: a = -241.87 b = 5.36 ; weighted: a = -206.19 b = 4.83 */
/* The slope of the weighted regression line is smaller than the slope of the unweighted line, so this looks fine */
quit;

No matter how ugly and unoptimized this first draft is, I hope it delivers the correct value (please let me know if something looks suspicious). So let's suppose that I improve it and create a loop to run it for every point. Even if it's now a bit clearer how this works, I'm afraid I still don't have the full picture. In the paper I must perform:

coeff_Kernel = K(inv(X * X`)) * K((X * Y))

As I'm not familiar with kernel notation, is that what I have done? And how do I apply, for example, K(Z * Z`) when Z is an (n x m) matrix? Concerning the results of this example, the difference between OLS and kernel-weighted LS is that I get 1 set of coefficients in the first case and N = 19 sets of coefficients in the second. Is this statement correct, or am I missing something?
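In case it helps to cross-check the numbers, here is a quick NumPy sketch of steps 2-4 for a single point. This is only my own translation of the IML logic (the function name, synthetic data, and helper names are mine, not from the paper or from SAS):

```python
import numpy as np

def local_fit(x, y, x0, k):
    """Kernel-weighted local linear fit at x0; returns [intercept, slope].
       Mirrors the IML steps: k-NN window, normalize, Gaussian weights,
       weighted least squares. Illustrative sketch only."""
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k + 1]               # x0 itself plus its k nearest neighbors
    xi, yi = x[idx], y[idx]
    u = (xi - x0) / np.max(np.abs(xi - x0))   # normalize window: x0 -> 0, furthest -> 1
    w = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel weights
    X = np.column_stack([np.ones_like(xi), xi])    # design matrix [1, x]
    Xw = np.sqrt(w)[:, None] * X                   # premultiply rows by sqrt(w)
    yw = np.sqrt(w) * yi
    return np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)   # (Xw'Xw) b = Xw'yw
```

On exactly linear data the weighted fit recovers the global line, which is a handy sanity check before running it on sashelp.class.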
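Regarding the last question, my understanding is that the looped version does produce one (intercept, slope) pair per observation, i.e. n = 19 sets of coefficients, where OLS gives a single pair. A self-contained NumPy sketch of that loop (again with my own names, under the assumption that h = 0.25 and k = floor(n*h) as in the IML code):

```python
import numpy as np

def kernel_ls_coeffs(x, y, h=0.25):
    """One kernel-weighted (intercept, slope) pair per observation,
       versus a single pair for ordinary least squares. Sketch only."""
    n = x.size
    k = int(np.floor(n * h))                  # nb of nearest points, as in the IML code
    coeffs = np.empty((n, 2))
    for i in range(n):
        idx = np.argsort(np.abs(x - x[i]))[:k + 1]
        xi, yi = x[idx], y[idx]
        u = (xi - x[i]) / np.max(np.abs(xi - x[i]))   # normalized window
        w = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel weights
        X = np.column_stack([np.ones_like(xi), xi])
        Xw = np.sqrt(w)[:, None] * X
        coeffs[i] = np.linalg.solve(Xw.T @ Xw, Xw.T @ (np.sqrt(w) * yi))
    return coeffs
```

For n = 19 this returns a 19 x 2 matrix, one local line per observation; the fitted curve is then read off by evaluating each local line at its own point.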