Barlow Twins: a new super-simple self-supervised method to train joint-embedding architectures (aka Siamese nets) non-contrastively.
Basic idea: maximize the normalized correlation between a variable in the left branch and the same var in the right branch, while making the normalized cross-correlation between one var in the left branch and all other vars in the right branch as close to zero as possible.
2/N
In short: the loss tries to make the normalized cross-correlation matrix between the embedding vectors coming out of the left branch and the right branch as close to the identity matrix as possible.
3/N
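To make 2/ and 3/ concrete, here is a minimal PyTorch-style sketch of that objective (the function name, the epsilon term, and the lambd trade-off value are my own illustrative choices, not taken from the official code):

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lambd: float = 5e-3) -> torch.Tensor:
    # z_a, z_b: (batch, dim) embeddings from the left and right branches.
    n = z_a.shape[0]
    # Normalize each embedding dimension across the batch (zero mean, unit std).
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-6)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-6)
    # Normalized cross-correlation matrix between the two branches: (dim, dim).
    c = (z_a.T @ z_b) / n
    # Diagonal terms pushed toward 1: each variable should agree across branches.
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Off-diagonal terms pushed toward 0: different variables should be decorrelated.
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag
```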
The 2 branches are always fed with differently-distorted versions of the same image, and there is no need for dissimilar training pairs.
The objective makes the embedding vectors of the two branches as similar as possible, while maximizing their information content.
4/N
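A sketch of what a training step in 4/ could look like, assuming the loss above; the augmentation pipeline and the encoder/projector/training_step names are hypothetical placeholders:

```python
import torch
import torchvision.transforms as T

# Illustrative distortion pipeline (assumed, not the paper's exact augmentations).
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.2, 0.1),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def training_step(encoder, projector, images, optimizer):
    # Two differently-distorted views of the same images; both branches share weights.
    view_a = torch.stack([augment(img) for img in images])
    view_b = torch.stack([augment(img) for img in images])
    z_a = projector(encoder(view_a))
    z_b = projector(encoder(view_b))
    # Only "similar" pairs are used; no negative/dissimilar pairs are needed.
    loss = barlow_twins_loss(z_a, z_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```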
No contrastive samples, no huge batch size (optimal is 128), no predictor network, no moving-average weights, no vector quantization, and no cutting of gradients in one of the branches.
5/N