Alain38
Quartz | Level 8

Hello,

I would like to reproduce the computations in Ang & Kristensen (2012) "Testing conditional factor models" (enclosed, eq. 10) and I'm totally stuck as soon as kernels come into play.

Let's consider the following regression. How can I find the estimators using kernel-weighted least squares?

With OLS, it's simple:

proc IML;
    use sashelp.class;
    read all var {weight} into Y;   /* response */
    read all var {height} into X;   /* regressor */

    X = (J(nrow(X),1,1) || X)`;     /* prepend an intercept column, then transpose */
    Y = Y`;
    coeff_OLS = inv(X * X`) * (X * Y`);   /* OLS normal equations */
quit;


But how can I find the coefficients using, for example, a Gaussian kernel with a bandwidth of 0.4?

/*    coeff_Kernel = K(inv(X * X`)) * K((X * Y));
    with
       - f_h(x) = 1/(nrow(X)*h) * sum(K((x - xi)/h));
       - K(x) = (2 * constant("pi"))**(-1/2) * exp(-x**2/2);
       - h = 0.4;    */

Should I use, for example, PROC LOESS to output the results, instead of trying to make tedious computations directly in IML?


Ideally I would like to stick to IML, but any hint is most welcome!


11 REPLIES
Rick_SAS
SAS Super FREQ

1. Read about how to compute weighted regression. It's just like OLS except that you replace X with sqrt(W)*X and Y with sqrt(W)*Y, where W is the weight matrix. (A minimal sketch appears after this list.)

2. I haven't read the article, but usually "kernel regression" refers to using a fixed-width kernel whereas loess refers to using a variable-width kernel.

3. Is this 1-D regression? If you are doing a fixed-width kernel regression, the predicted value at x0 is computed by using the distances X - x0 (scaled by the bandwidth) in the kernel function to obtain the weights. Repeat this for a sequence of x0 values.

4. If you want to do loess regression (which is harder because of the variable-width kernels), see "Loess regression in SAS/IML" and the articles (linked in that article) that show how to compute nearest neighbors.
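To make point 1 concrete, here is a minimal sketch of the sqrt(W) trick on sashelp.class; the weight vector w is made up purely for illustration (nothing here comes from the paper):

proc iml;
	use sashelp.class;
		read all var {weight} into Y;
		read all var {height} into X;
	close sashelp.class;
	X = j(nrow(X),1,1) || X;         /* design matrix with an intercept column */
	w = T(1:nrow(X)) / nrow(X);      /* arbitrary positive weights, for illustration only */
	Xw = sqrt(w) # X;                /* scale each row of X by the sqrt of its weight */
	Yw = sqrt(w) # Y;                /* scale Y the same way */
	b = inv(Xw` * Xw) * (Xw` * Yw);  /* ordinary LS on the scaled data = weighted LS */
	print b;
quit;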


Hopefully, these hints will get you started. The fixed-width kernel regression is straightforward, but there are details that you have to work out in your mind. I suggest first writing code that uses kernel regression to predict a value at a fixed x0. You can then iterate over a sequence of x0 values.
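And a sketch of that first step - kernel-weighted prediction at a single x0 - assuming a Gaussian kernel; x0 and the bandwidth h are chosen arbitrarily (h = 5 rather than 0.4, because with the sparse class data a 0.4 bandwidth leaves essentially one point with non-negligible weight):

proc iml;
	use sashelp.class;
		read all var {weight} into Y;
		read all var {height} into X;
	close sashelp.class;
	h = 5;                                    /* fixed bandwidth (arbitrary) */
	x0 = 60;                                  /* point at which to predict (arbitrary) */
	u = (X - x0) / h;                         /* scaled distances from each observation to x0 */
	w = exp(-0.5 * u##2) / sqrt(2*constant("pi"));   /* Gaussian kernel weights */
	Xd = j(nrow(X),1,1) || X;                 /* local-linear design matrix */
	b = inv(Xd` * (w # Xd)) * (Xd` * (w # Y));       /* kernel-weighted least squares */
	pred = b[1] + b[2]*x0;                    /* fitted value at x0 */
	print pred;
quit;

Iterating this over a grid of x0 values traces out the kernel-regression curve.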

Alain38
Quartz | Level 8

Thank you Rick for your valuable help as usual!

1. I'm going to read and practice thanks to that.

2. I'm not sure whether this is a fixed- or variable-width kernel; I will try to find out. In the article the authors propose a plug-in method to compute the bandwidths, but I have not reached that point in the computations yet. Thank you for pointing this out.

I now have something to work on; I will come back when I see things a little more clearly!

Alain38
Quartz | Level 8

So in my case the bandwidth is fixed across the full sample, and while this example is a 1-D regression, I will use the code for higher dimensions later.

Thanks to your blog, Rick, I wrote the following code, which computes the predicted value for the 1st point:

proc iml;
	use sashelp.class;
		read all var {weight} into Y;
		read all var {height} into X;
	close sashelp.class;

/******************* 1) Choose a smoothing parameter *******************/
	n = nrow(Y); *Sample size;
	h = 0.25; *Bandwidth defined arbitrarily for now;
	k = floor(n*h); *Nb of nearest points to x0;

/**************** 2) Find the k nearest neighbors to x0 ****************/
/* Module to compute indices (row numbers) of k nearest neighbors.     */
/*   INPUT:  X     an (N x p) data matrix                               */
/*           k     specifies the number of nearest neighbors (k>=1)     */
/*   OUTPUT: idx   an (N x k) matrix of row numbers. idx[,j] contains   */
/*                 the row numbers (in X) of the j_th closest neighbors */
/*           dist  an (N x k) matrix. dist[,j] contains the distances   */
/*                 between X and the j_th closest neighbors             */
start NearestNbr(idx, dist, X, k=1);
   N = nrow(X);  p = ncol(X);
   idx = j(N, k, .);            /* j_th col contains obs numbers for j_th closest obs */
   dist = j(N, k, .);           /* j_th col contains distance for j_th closest obs */
   D = distance(X);				/* N x N distance matrix */
   D[do(1, N*N, N+1)] = .;      /* set diagonal elements to missing */
   do j = 1 to k;
      dist[,j] = D[ ,><];       /* smallest distance in each row */
      idx[,j] = D[ ,>:<];       /* column of smallest distance in each row */
      if j < k then do;         /* prepare for next closest neighbors */
         ndx = sub2ndx(dimension(D), T(1:N)||idx[,j]);
         D[ndx] = .;            /* set elements to missing */
      end;      
   end;
finish;

	run NearestNbr(nbrIdx, dist, X, k); /* Call the module */

/* Corresponding X for k = 4 */
	Near_X1 = X[nbrIdx[,1], ];  /* 1st nearest neighbors */
	Near_X2 = X[nbrIdx[,2], ];  /* 2nd nearest neighbors */
	Near_X3 = X[nbrIdx[,3], ];  /* 3rd nearest neighbors */
	Near_X4 = X[nbrIdx[,4], ];  /* 4th nearest neighbors */
	Near_X = X || Near_X1 || Near_X2 || Near_X3 || Near_X4;
i = 1; /*do i = 1 to n later */
	Xi = Near_X[i,]`;

/* Corresponding Y for k = 4 */
	Near_Y1 = Y[nbrIdx[,1], ];
	Near_Y2 = Y[nbrIdx[,2], ];
	Near_Y3 = Y[nbrIdx[,3], ];
	Near_Y4 = Y[nbrIdx[,4], ];
	Near_Y = Y || Near_Y1 || Near_Y2 || Near_Y3 || Near_Y4;
	Yi = Near_Y[i,]`;

/* Normalize the x-axis for each window.                                      */
/* Subtract the Xi value to center around it, then divide by the largest      */
/* distance in the neighborhood. Xi is then 0 and the furthest value is 1.    */
	di = (Xi - Xi[1,]); *Distance from Xi;
	D = max(abs(di));
	Xi_norm = di / D;

/************* 3) Assign weights to the nearest neighbors **************/
	Wi = (2 * constant("pi"))##(-1/2) * exp(-1/2*Xi_Norm##2); *Gaussian kernel;

/**************** 4) Perform local weighted regression *****************/
	Xw = (sqrt(Wi) # (J(nrow(Xi),1,1) || Xi))`;
	Yw = (sqrt(Wi)#Yi)`;
	coeff = inv(Xw * Xw`) * (Xw * Yw`);
	Y_fit = coeff[1,] + coeff[2,] * Xi[1,]; *Fitted value of Y at the focal point;
/* unweighted: a = -241.87 b = 5.36 ; weighted: a = -206.19 b = 4.83 */
/* Slope of the weighted regression line is smaller than the slope of the unweighted line so looks fine */
quit;


No matter how ugly and unoptimized this first draft is, I hope it delivers the correct value (please let me know if something looks suspicious). So let's suppose that I improve it and create a loop to run it for every point.

Even if it's a bit clearer now how this works, I'm afraid I still don't have the full picture. So, in the paper I must perform:
    coeff_Kernel = K(inv(X * X`)) * K((X * Y))
As I'm not familiar with kernel notation, is that what I have done? And how do I compute, for example, K(Z * Z`), Z being an (n x m) matrix?

And concerning the results of this example: the difference between OLS and kernel-weighted LS is that I get 1 set of coefficients in the first case and N = 19 sets of coefficients in the second. Is this statement correct, or am I missing something?

Rick_SAS
SAS Super FREQ

A "fixed bandwidth" means that at each point x0 you use the distance from the data points to x0 to determine the weights for the regression at that point. You do NOT use nearest neighbors, as you are attempting to do in your code. Nearest neighbors are for the varying bandwidth case.


For the kernel computations, look at the article "How to visualize a kernel density estimate." The topic of that article is kernel density estimation instead of kernel regression, but maybe some of the ideas will be helpful.
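For reference, the estimate in that article has the same form as the formula in the original comment block, f_h(x) = 1/(n*h) * sum( K((x - xi)/h) ). A sketch that evaluates it on a grid (the grid and bandwidth are made up for illustration):

proc iml;
	use sashelp.class;
		read all var {height} into X;
	close sashelp.class;
	n = nrow(X);
	h = 2;                                    /* illustrative bandwidth */
	t = T( do(50, 75, 0.5) );                 /* grid of evaluation points */
	f = j(nrow(t), 1, 0);
	do i = 1 to nrow(t);
		u = (t[i] - X) / h;                   /* scaled distances from grid point to data */
		f[i] = sum( exp(-0.5*u##2)/sqrt(2*constant("pi")) ) / (n*h);
	end;
	print (t || f)[colname={"t" "f_hat"}];
quit;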

Alain38
Quartz | Level 8

Ah OK, sorry - I got confused by the wording.

The idea - if I understood correctly what I have to do to follow this paper - is that the width of the window is constant (the 4 points nearest the focal point in this basic example) but the weights are Gaussian, i.e. the closer a point is to the focal point, the more it matters, so the more weight it gets. Note that I smooth only along the x-axis (there is no second step where I smooth along the y-axis). Therefore I hope this code is correct.

Could you please enlighten me about the notation and the general idea, so that I know whether I am on the right path to compute, for example, K(X * Y), or whether I am completely mistaken?

Rick_SAS
SAS Super FREQ

Unfortunately, I don't have the time to read the paper and discuss the notation and general ideas. However, if you have specific coding questions, feel free to post. Good luck on your project.

Alain38
Quartz | Level 8

Sure, I understand. Thank you. I will keep working based on the links you provided and hope I will be able to handle it in the end!

Alain38
Quartz | Level 8

OK, so thanks to Bruce Hansen's book "Econometrics", I think I finally understood (and realized I have indeed been mistaken in my previous posts, notably about the distinction between fixed and variable bandwidths).

I put the code below in case anybody is interested. It computes a local linear approximation; the kernel is Gaussian and the bandwidth is fixed.

I would appreciate it if someone more expert than I am could have a quick look and tell me whether it seems fine!

data Sample;
	set sashelp.class (rename=(weight=y height=x));
run;

proc sort data=Sample;
	by x;
run;

proc iml;
	use Sample;
		read all var {y} into Y;
		read all var {x} into X;
	close Sample;
	n = nrow(X);                             /* sample size */
	h = 5;	*Arbitrarily defined here: a high value because the data points in this sample are sparse;
	Y_fit = j(n, 1, .);                      /* fitted value at each observation */

	do i = 1 to n;
		/* observations whose x lies within h of the focal point */
		window = loc(X >= X[i]-h & X <= X[i]+h);
		Xi = X[min(window):max(window), ];   /* X is sorted, so the window is contiguous */
		Yi = Y[min(window):max(window), ];
		di = Xi - X[i];                      /* distances to the focal point */
		Xi_norm = di / h;                    /* scaled by the bandwidth */
		W = exp(-0.5 * Xi_norm##2) / sqrt(2*constant("pi"));   /* Gaussian kernel weights */
		Xw = sqrt(W) # (j(nrow(Xi),1,1) || Xi);   /* weighted local-linear design matrix */
		Yw = sqrt(W) # Yi;
		coeff = inv(Xw` * Xw) * (Xw` * Yw);  /* kernel-weighted least squares */
		Y_fit[i] = coeff[1] + coeff[2] * X[i];
	end;

	create Y_fit from Y_fit;                 /* writes the fitted values to column COL1 */
	append from Y_fit;
	close Y_fit;
quit;

data Sample;
	merge Sample Y_fit (rename=(COL1 = Y_fit));
run;

proc sgplot data=Sample;
	scatter x=x y=y;
	series x=x y=Y_fit /lineattrs=(color=red);
run;


Rick_SAS
SAS Super FREQ

Thanks for the update. A quick scan indicates that you have the right idea.


There are some improvements that you can make regarding vectorizing the code. I am actually planning a new blog post on this topic, to be published next week. The SAS code is written, but I haven't finished the text. I will post back next week (probably Wednesday) with a link. 
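In the meantime, one possible direction - a sketch under the same Gaussian/fixed-bandwidth assumptions, not necessarily what the blog post will do - is to build all pairwise differences at once and use the closed-form local-linear solution (the explicit window is dropped here, since the Gaussian weights already make distant points negligible):

proc iml;
	use Sample;
		read all var {y} into Y;
		read all var {x} into X;
	close Sample;
	n = nrow(X);
	h = 5;
	D = X * j(1,n,1) - j(n,1,1) * X`;        /* D[i,j] = X[i] - X[j] */
	W = exp(-0.5 * (D/h)##2) / sqrt(2*constant("pi"));   /* all kernel weights at once */
	S0 = W[+,]`;                             /* weighted moment sums, one per point */
	S1 = (W # D)[+,]`;
	S2 = (W # D##2)[+,]`;
	T0 = (W # Y)[+,]`;
	T1 = (W # D # Y)[+,]`;
	Y_fit = (S2#T0 - S1#T1) / (S0#S2 - S1##2);   /* closed-form local-linear fit */
	print Y_fit;
quit;

On this sample it should match the loop's fitted values closely, differing only where the loop's hard window truncates small but nonzero weights.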

Alain38
Quartz | Level 8

Thank you, Rick, for taking the time to look at it! Glad there are no longer any fundamental mistakes.

I will now work on improving this code (notably making it work for higher dimensions), and I look forward to reading your next blog post!
