Here is an idea for a kernel trick that might come in handy later at some point. It seems best to transfer it to this blog, instead of simply keeping it in paper, so that at a later time I can do a quick search. Here is the summary of the trick (apologies if the notation is confusing).

Suppose we have some data $x_1,x_2,\ldots,x_n$. Assume $\Psi \in \mathbb{R}^{D \times D}$. We want to write a kernel version for the form

\[ \Phi(a)^T \Psi \Phi(b) \] where $\Phi$ is a non-linear feature map for an arbitrary data point $a$.

Let $\Psi = \sum_{ij} \alpha_{ij} \Phi(x_i)\Phi(x_j)^T$. The kernel trick is as follows:

\[

\begin{align}

\Phi(a)^T \Psi \Phi(b) &= \Phi(a)^T \sum_{ij} \alpha_{ij} \Phi(x_i)\Phi(x_j)^T \Phi(b)\\

&= \sum_{ij} \alpha_{ij} \Phi(a)^T \Phi(x_i)\Phi(x_j)^T \Phi(b)\\

&= \sum_{ij} \alpha_{ij} K(a,x_i)K(x_j,b)

\end{align}

\]

This is part of a derivation to kernelize an optimization formulation. There is also an assumption here about the $\Psi$, that is, it lies in the span of our data.

## No comments:

## Post a Comment