SlutskyFan:
The Gaussian kernel that you mentioned fits in the scheme set by David thus:
First, rewrite David's pseudo-code as follows:
X[1,.],..., X[n,.] - n elements of a (Hilbert) space with inner product <,>
K - matrix of size nxn
for i=1 to n
for j=1 to n
K[i,j] = <X[i,.], X[j,.]>   % a linear dot product
endfor
endfor
To make this work for a sample of n feature vectors c_1, c_2, ..., c_n, each with m features, apply the pseudo-code to the transformed feature vectors, i.e. set X[1,.] = f(c_1), X[2,.] = f(c_2), ..., X[n,.] = f(c_n) for some function f.
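Here is a minimal sketch of that pseudo-code in Python with NumPy (the name gram_matrix is mine, not from David's post):

    import numpy as np

    def gram_matrix(X):
        # rows X[i, :] play the role of X[i,.]; K[i, j] = <X[i,.], X[j,.]>
        n = X.shape[0]
        K = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                K[i, j] = np.dot(X[i], X[j])  # a linear dot product
        return K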
Here is the function: let c be an m-dimensional vector. To c we assign an element of an infinite-dimensional space, namely a space of functions defined on m-dimensional vectors. f(c) is a function of another variable h, defined as:
f(c)(h) = exp( - (||c - h||^2)/(2*sigma^2) )
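In code, f is a higher-order function: given c it returns a new function of h. A minimal sketch (the name bump and the default sigma are my own choices):

    import numpy as np

    def f(c, sigma=1.0):
        c = np.asarray(c, dtype=float)
        def bump(h):
            # value of the function f(c) at the point h
            return np.exp(-np.sum((c - np.asarray(h, dtype=float)) ** 2) / (2 * sigma ** 2))
        return bump

    # f([0.0, 0.0]) is itself a function; f([0.0, 0.0])([1.0, 1.0]) is a number.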
Here is the definition of the inner product (all technicalities aside):
if A and B are (sufficiently nice) functions of h in R^m:
<A, B> = integral over R^m of A(h)B(h) dh
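For m = 1 this integral is easy to approximate numerically; a sketch assuming SciPy is available (inner is my name for it):

    import numpy as np
    from scipy.integrate import quad

    def inner(A, B):
        # <A, B> = integral over R of A(h) * B(h) dh  (m = 1 only)
        value, _err = quad(lambda h: A(h) * B(h), -np.inf, np.inf)
        return value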
Now, it is a long exercise to verify that:
<f(x), f(c)> = (pi*sigma^2)^(m/2) * exp( - (||x - c||^2)/(4*sigma^2) )
(it is the same computation that verifies that the sum of two independent normally distributed random variables is again normal: the variance doubles, which is where the 4*sigma^2 in the denominator comes from).
In other words:
K[1,2] = <X[1,.], X[2,.]> = <f(c_1), f(c_2)> = (pi*sigma^2)^(m/2) * exp( - (||c_1 - c_2||^2)/(4*sigma^2) )
Up to the constant factor in front (which can be absorbed by normalizing f), this is exactly the Gaussian kernel.
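A quick numerical check of this identity for m = 1 (sigma and the two points are arbitrary values I chose):

    import numpy as np
    from scipy.integrate import quad

    sigma = 0.7
    c1, c2 = 0.3, 1.1  # two scalar "feature vectors" (m = 1)

    f = lambda c: (lambda h: np.exp(-(c - h) ** 2 / (2 * sigma ** 2)))

    # left-hand side: <f(c1), f(c2)> computed as an integral over R
    lhs, _ = quad(lambda h: f(c1)(h) * f(c2)(h), -np.inf, np.inf)

    # right-hand side: (pi*sigma^2)^(1/2) * exp(-(c1 - c2)^2 / (4*sigma^2))
    rhs = np.sqrt(np.pi * sigma ** 2) * np.exp(-(c1 - c2) ** 2 / (4 * sigma ** 2))

    print(lhs, rhs)  # the two numbers agree to quadrature precision

Note that in practice one never touches the integral: K[i,j] is filled in directly from the closed-form right-hand side, which is the whole point of the kernel trick.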
Different functions f give rise to different kernels. Different mappings of the feature vectors into larger spaces give rise to different point configurations.