
Sunday, May 1, 2011

Kernel trick: feature, matrix, feature

Here is an idea for a kernel trick that might come in handy later at some point. It seems best to transfer it to this blog, instead of simply keeping it on paper, so that I can find it with a quick search later. Here is a summary of the trick (apologies if the notation is confusing).

Suppose we have some data x_1, x_2, \ldots, x_n and a matrix \Psi \in \mathbb{R}^{D \times D}, where D is the dimension of the feature space. We want to write a kernelized version of the form
\Phi(a)^T \Psi \Phi(b), where \Phi is a non-linear feature map and a, b are arbitrary data points.
Let \Psi = \sum_{ij} \alpha_{ij} \Phi(x_i)\Phi(x_j)^T. The kernel trick is as follows:
\begin{align}
\Phi(a)^T \Psi \Phi(b) &= \Phi(a)^T \sum_{ij} \alpha_{ij} \Phi(x_i)\Phi(x_j)^T \Phi(b)\\
&= \sum_{ij} \alpha_{ij} \Phi(a)^T \Phi(x_i)\Phi(x_j)^T \Phi(b)\\
&= \sum_{ij} \alpha_{ij} K(a,x_i)K(x_j,b)
\end{align}
This is part of a derivation to kernelize an optimization formulation. Note the assumption made about \Psi: it lies in the span of the outer products \Phi(x_i)\Phi(x_j)^T of our mapped data, which is what lets every appearance of \Phi collapse into kernel evaluations.
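To convince myself the algebra checks out, here is a small numerical sanity check (a sketch of my own, not part of the original derivation). It uses the degree-2 homogeneous polynomial kernel K(x,y) = (x^T y)^2, whose explicit feature map is \Phi(x) = \mathrm{vec}(x x^T); it builds \Psi explicitly in feature space and compares \Phi(a)^T \Psi \Phi(b) against the kernel-only expression \sum_{ij} \alpha_{ij} K(a,x_i) K(x_j,b). The kernel choice is just for illustration; the trick itself works for any kernel with feature map \Phi.

import numpy as np

# Check: Phi(a)^T Psi Phi(b) == sum_ij alpha_ij K(a, x_i) K(x_j, b)
# for the degree-2 polynomial kernel K(x, y) = (x^T y)^2 with
# explicit feature map Phi(x) = vec(x x^T) in R^{d^2}.

rng = np.random.default_rng(0)

d, n = 3, 5                           # input dimension, number of data points
X = rng.standard_normal((n, d))       # data x_1, ..., x_n
a = rng.standard_normal(d)
b = rng.standard_normal(d)
alpha = rng.standard_normal((n, n))   # coefficients alpha_ij

def phi(x):
    # Explicit feature map for K(x, y) = (x^T y)^2.
    return np.outer(x, x).ravel()

def K(x, y):
    return (x @ y) ** 2

# Left-hand side: build Psi explicitly in feature space (D = d^2 here).
Psi = sum(alpha[i, j] * np.outer(phi(X[i]), phi(X[j]))
          for i in range(n) for j in range(n))
lhs = phi(a) @ Psi @ phi(b)

# Right-hand side: kernel evaluations only, no explicit feature map needed.
Ka = np.array([K(a, X[i]) for i in range(n)])   # K(a, x_i)
Kb = np.array([K(X[j], b) for j in range(n)])   # K(x_j, b)
rhs = Ka @ alpha @ Kb

print(lhs, rhs)
assert np.allclose(lhs, rhs)

The right-hand side never touches the D-dimensional feature space, which is the whole point: once \Psi is parameterized by the \alpha_{ij}, the quadratic form is computable from kernel values alone.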
