
I am learning about unsupervised machine learning and am a bit confused about the different rules for updating weights. I understand that both Oja's rule and BCM can be used.

In Oja's rule:

dw/dt = k*x*y - w*y^2

Here x is the activity of the input neuron, y is the activity of the output neuron, and w is the connection strength between the two. The idea is that the decay term prevents the weights from growing without bound.
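
To make sure I have the mechanics right, here is how I would code one step of that rule (a minimal NumPy sketch; the function name and learning rate value are just illustrative, and I apply k to both terms, as the rule is usually written):

```python
import numpy as np

def oja_update(w, x, k=0.01):
    # Linear output of the single unit.
    y = np.dot(w, x)
    # Hebbian growth term plus Oja's decay term, which keeps ||w|| bounded.
    return w + k * (y * x - y**2 * w)
```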

In BCM:

dw/dt = k*(y-theta)*x

Here the idea is that unless the postsynaptic activity exceeds a threshold theta, I don't want the connection to be strengthened.
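
Again, my attempt at a minimal sketch of one such update (with theta passed in as a fixed value; as far as I understand, the full BCM rule also multiplies by y and lets theta slide with the recent average of y^2, so treat this as a simplification):

```python
import numpy as np

def bcm_update(w, x, theta, k=0.01):
    # Postsynaptic activity of the linear unit.
    y = np.dot(w, x)
    # The weight grows only when y exceeds the threshold theta,
    # and shrinks otherwise.
    return w + k * (y - theta) * x
```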

Studying competitive learning, which is yet another type of unsupervised learning, I came across another rule:

dw/dt = n*(x-y)

In this case, however, x is the full input vector and y is the vector representation of the output. The idea is that we move the prototype that responded most strongly to a given input closer to that input, making the two more similar.
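
If I read the rule in the usual winner-take-all way (i.e. the vector being pulled toward x is the winning unit's prototype/weight vector — this is my reading, not necessarily what was intended), a minimal sketch would look like:

```python
import numpy as np

def competitive_update(W, x, eta=0.1):
    # W holds one prototype (weight vector) per row.
    # Pick the prototype closest to the input x (the "winner").
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    W = W.copy()
    # Move only the winning prototype toward x.
    W[winner] += eta * (x - W[winner])
    return W
```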

However, I don't understand when I should use which rule. For example, why couldn't I use a rule that combines Oja's and BCM, only increasing connection weights when the output exceeds a given threshold while also preventing the weights from growing without bound?

MrD
Often questions such as this are studied empirically w.r.t. a particular dataset to see which outperforms the other; it can depend on context. – vzn May 23 '15 at 18:36

1 Answer


Oja's learning rule and the BCM rule share the same underlying generative model, a linear perceptron: $$y = w^T x,$$ where $w$ and $x$ are vectors of the same dimension.

But they differ in their goals:

Oja's rule (run with a sufficiently small learning rate) extracts the first principal component of the covariance matrix of the data, $$C = \langle xx^T \rangle_{p(x)},$$ i.e. the weight vector converges to $w_1$ with $C w_1 = \lambda_1 w_1$, where $\lambda_1$ is the largest eigenvalue.
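
A quick numerical illustration of this claim (a sketch only; the data, learning rate and number of passes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Zero-mean data with an anisotropic covariance.
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.0], [1.0, 1.0]],
                            size=5000)

w = rng.normal(size=2)
eta = 1e-3
for _ in range(3):                    # a few passes with a small learning rate
    for x in X:
        y = w @ x
        w += eta * y * (x - y * w)    # Oja's rule

# Leading eigenvector of the sample covariance C = <x x^T>.
C = X.T @ X / len(X)
v1 = np.linalg.eigh(C)[1][:, -1]

print(abs(w @ v1))   # close to 1: w aligns (up to sign) with the 1st PC
```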

The BCM rule maximizes the neuron's input selectivity, defined as $$ s(w) = 1 - \frac{\langle w^T x^{(i)} \rangle_i}{ \max_i w^T x^{(i)} }, $$ where $\langle \cdot \rangle_i$ denotes the average over the input patterns. The selectivity is maximal when the output neuron responds strongly to one of the input patterns (e.g. the $k$-th stimulus $x^{(k)}$) but barely responds to all of the others.
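
In code, the selectivity could be computed like this (a small sketch; X is a hypothetical array holding one input pattern per row, and the responses are assumed non-negative):

```python
import numpy as np

def selectivity(w, X):
    # X holds one input pattern x^(i) per row; responses are the
    # linear outputs w^T x^(i).
    responses = X @ w
    # 1 - mean response / max response: close to 1 if only one pattern
    # drives the neuron, 0 if all patterns drive it equally.
    return 1.0 - responses.mean() / responses.max()
```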

The third rule doesn't make sense to me (I'll use explicit vector notation now). Assuming that $n$ is a scalar (e.g. a learning rate), $\vec{x}$, $\vec{y}$ and $\vec{w}$ must all have the same dimension. Then what is your generative model? It can neither be $y = \vec{w}^T \vec{x}$ (because $\vec{y}$ is a vector) nor $\vec{y} = W \vec{x}$ (because your weights form a vector, not a matrix).

Jannes
  • Would you mind clarifying the equation for input selectivity? Is the numerator a vector? Is the index i over time or over the input dimension? – Blade May 22 '22 at 20:16