Re: Kernels in Support Vector Machines in SAS Data Science

Kernels in Support Vector Machines

SlutskyFan — Wed, 10 Nov 2010 19:23:52 GMT

This is more theoretical than SAS-related:

I continuously see kernel functions in SVMs written as k(xi,xj), k(x,y), or k(x,z).

Are kernel functions in SVMs functions (dot products) of the independent variables (x's) only, or functions of both the independent and dependent variables (x,y)?

This also confuses my understanding of the relationship between kernel
functions and kernel matrices.

Any help?

I currently have SAS EM 6.2. As far as I know, SVMs aren't possible with this version. Any news on their implementation in future releases?

-thanks

Re: Kernels in Support Vector Machines

WayneThompson — Wed, 10 Nov 2010 22:31:04 GMT

An experimental linear and nonlinear kernel SVM node is planned for EM 7.1 (SAS 9.3) in the middle of next year. Thanks for your use of the software. There are quite a few other classification and prediction tools you will want to try; some users have reported good Gradient Boosting results. Will keep you and the forum up to date.

Re: Kernels in Support Vector Machines

DavidR_Hardoon — Tue, 16 Nov 2010 03:53:36 GMT

Hi Slutsky,

To answer your query: the notation you come across in the kernel literature depends on the user group. From a notation perspective, x and y (or z) refer to vectors, whereas xi and xj refer to the i-th and j-th elements (samples) of the data set, i.e. rows i and j of the data matrix X in the pseudocode below.

A kernel matrix is usually denoted with a capital K, whereas lowercase k is the kernel function (dot product). Hence, whereas K is a matrix, k(x,y) is only a single scalar entry of K.

To answer your second question, regarding dependence and independence: the assumption is that the data are iid (independent and identically distributed).

Perhaps this pseudocode will help with the notation:
X - matrix of size n x m (n samples, m features)
K - matrix of size n x n

for i=1 to n
  for j=1 to n
    K[i,j] = X[i,:]*X[j,:]' % a linear dot product of rows i and j
  endfor
endfor
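For anyone who wants to run the loop above, here is a minimal pure-Python sketch of the same computation. The function names `linear_kernel` and `gram_matrix` and the data values are made up for illustration; each entry K[i][j] is the scalar k(x_i, x_j), where x_i is row i of X.

```python
def linear_kernel(x, y):
    """k(x, y): the ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def gram_matrix(X, k):
    """Return the n x n kernel matrix K with K[i][j] = k(X[i], X[j])."""
    n = len(X)
    return [[k(X[i], X[j]) for j in range(n)] for i in range(n)]

# n = 3 samples (rows), m = 2 features (columns); illustrative values only
X = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]

K = gram_matrix(X, linear_kernel)
for row in K:
    print(row)
```

The result is symmetric (K[i][j] equals K[j][i]), and any single k(x, y) is just one scalar entry of it.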

A good book is 'An Introduction to Support Vector Machines and Other Kernel-based Learning Methods' by Cristianini and Shawe-Taylor.

Re: Kernels in Support Vector Machines

SlutskyFan — Wed, 17 Nov 2010 16:07:10 GMT

Thanks, this really helps. I think I'm getting a better picture. You said:

"where as K is a matrix k(x,y) will be only an entry scalar in matrix K."

I think I understand, but based on what you said, is the following interpretation correct?

#1 k(x,y) is a kernel function (which produces a scalar that becomes an entry in matrix K)

If that is true, then does k(x,y) produce a dot product only between xi and xj (elements of the matrix X), or are there also dot products between x and y?

Given your pseudocode, I'm thinking the entries in the kernel matrix are only dot products of xi and xj, and y is just a 'label'.

But I've also seen 'kernel functions' depicted in two different ways:

Gaussian kernel: k(x,y) = exp(-||x-y||^2 / sigma^2)

Gaussian kernel: k(xi,xj) = exp(-||xi-xj||^2 / sigma^2)

So I'm still confused by the notation: what are the 'inputs' into the kernel function? Are they only elements of some matrix X, or can they also contain elements of Y?
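As an illustration (a plain-Python sketch; `gaussian_kernel` and the data are hypothetical), the two Gaussian formulas describe the same function with different argument names. Both arguments are feature vectors, e.g. two rows of X. In a standard SVM the class labels are not kernel arguments; they are used separately when the SVM is trained.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / sigma^2).

    x and y are two feature vectors of equal length (e.g. two rows of X).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)

# Made-up data: 2 samples with 2 features each, plus their class labels.
X = [[1.0, 2.0],
     [0.0, 1.0]]
labels = [+1, -1]  # never passed to the kernel

# Notation k(x, y): any two feature vectors x and y.
x, y = X[0], X[1]
print(gaussian_kernel(x, y))        # exp(-2), about 0.1353

# Notation k(xi, xj): the same function applied to samples i and j,
# which gives the entry K[i][j] of the kernel matrix.
i, j = 0, 1
print(gaussian_kernel(X[i], X[j]))  # same value as above
```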

Thanks.

Re: Kernels in Support Vector Machines

SlutskyFan — Wed, 17 Nov 2010 16:14:55 GMT

Thanks so much. Looking forward to it. I've actually got my hands full dealing with everything that is already available in EM, but I like to stay ahead of the curve.

Re: Kernels in Support Vector Machines

VictorZurkowski — Wed, 16 Feb 2011 20:34:30 GMT

SlutskyFan:
The Gaussian kernel that you mentioned fits into the scheme set out by David as follows:

First, rewrite David's pseudo-code as follows:
X[1,.],..., X[n,.] - n elements of a (Hilbert) space with inner product <,>
K - matrix of size nxn

for i=1 to n
  for j=1 to n
    K[i,j] = <X[i,.],X[j,.]> % the inner product of the space
  endfor
endfor

To make this work for a sample of n feature vectors c_1, c_2, ..., c_n, each with m features, apply the pseudocode to the result of transforming the feature vectors by a function f, i.e. apply it to X[1,.] = f(c_1), X[2,.] = f(c_2), ..., X[n,.] = f(c_n).

Here is the function: let c be an m-dimensional vector. To c we assign an element of an infinite-dimensional space, namely a space of functions defined on m-dimensional vectors. f(c) is a function of another variable h, defined as:
f(c)(h) = exp( - (||c - h||^2)/(2*sigma^2) )

Here is the definition of the inner product (all technicalities aside):
if A and B are (sufficiently nice) functions of h in R^m:
<A,B> = integral over R^m of A(h)B(h) dh

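Now, it is a long exercise to verify that, for m-dimensional vectors x and c,
<f(x),f(c)> = (pi*sigma^2)^(m/2) * exp( - (||x - c||^2)/(4*sigma^2) )
It is the same computation that verifies that the sum of two independent normally distributed random variables is also normal. Up to the constant factor (pi*sigma^2)^(m/2), this is the Gaussian kernel exp(-||x-y||^2 / s^2) with s = 2*sigma, so choosing the width of the functions f(c) appropriately gives any Gaussian kernel.

In other words:
<X[1,.],X[2,.]> = <f(c_1),f(c_2)> = (pi*sigma^2)^(m/2) * exp( - (||c_1 - c_2||^2)/(4*sigma^2) )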
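A quick numerical sanity check of this construction, sketched in plain Python for the one-dimensional case m = 1 (the values of sigma, x and c, the integration range, and the helper names are arbitrary choices for illustration). It approximates <f(x),f(c)> = integral of f(x)(h) f(c)(h) dh with a midpoint sum and compares it with sqrt(pi*sigma^2) * exp(-(x - c)^2/(4*sigma^2)), which is what that integral evaluates to for f(c)(h) = exp(-(c - h)^2/(2*sigma^2)).

```python
import math

def feature_map(c, sigma):
    """f(c): the function h -> exp(-(c - h)^2 / (2 sigma^2)), for m = 1."""
    return lambda h: math.exp(-(c - h) ** 2 / (2 * sigma ** 2))

def inner_product(A, B, lo=-20.0, hi=20.0, steps=20_000):
    """<A, B> = integral of A(h) * B(h) dh, approximated by a midpoint sum on [lo, hi].

    The Gaussians used here are negligible outside [lo, hi].
    """
    width = (hi - lo) / steps
    total = 0.0
    for t in range(steps):
        h = lo + (t + 0.5) * width
        total += A(h) * B(h)
    return width * total

sigma = 1.5
x, c = 0.3, 2.0
numeric = inner_product(feature_map(x, sigma), feature_map(c, sigma))
closed_form = math.sqrt(math.pi * sigma ** 2) * math.exp(-(x - c) ** 2 / (4 * sigma ** 2))
print(numeric, closed_form)  # the two numbers agree to many decimal places
```

The inner product depends on x and c only through their distance, which is what makes it a Gaussian (radial) kernel.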

Different functions f give rise to different kernels. Different mappings of the feature vectors into larger spaces give rise to different point configurations.
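To illustrate the last point with a finite-dimensional case (a different kernel from the Gaussian one, the homogeneous polynomial kernel of degree 2, chosen only because its map can be written out explicitly): for 2-dimensional vectors, phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) maps into R^3, and <phi(x),phi(y)> = (x . y)^2. A minimal plain-Python sketch with made-up vectors; the helper names are hypothetical.

```python
import math

def dot(u, v):
    """Ordinary inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map R^2 -> R^3 for the degree-2 polynomial kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly2_kernel(x, y):
    """k(x, y) = (x . y)^2, computed without forming phi(x) or phi(y)."""
    return dot(x, y) ** 2

x, y = [1.0, 2.0], [3.0, -1.0]
# Inner product in the larger space vs. the kernel evaluated directly:
# the two values agree up to floating-point rounding.
print(dot(phi(x), phi(y)), poly2_kernel(x, y))
```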