I’m excited to share our new paper on HyperTransformers, a novel architecture for few-shot learning that generates the weights of a CNN directly from a given support set. 🧵👇

📜: https://t.co/vcm67G6P6t with Andrey Zhmoginov and Mark Sandler.

2) We train a transformer model to `convert` a few-shot task description into a small CNN network specialized in solving it on new images.
3) This effectively decouples a high-capacity transformer generator from a much smaller inference model. This differs from most existing methods, e.g. MAML, where the generator and the executing model share the same architecture.
4) CNN weights are generated layer-by-layer from a combination of a layer embedding (features from the previously generated layer) and image + class embeddings (features computed directly from the data). The final weights are extracted from the output of self-attention (similar to [CLS] tokens).
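To make the idea concrete, here is a minimal toy sketch (not the paper's implementation) of generating one layer's weights with self-attention: support-set tokens are concatenated with placeholder "weight tokens" seeded from a layer embedding, and the generated weights are read off the weight-token outputs. All names, shapes, and the random projections standing in for a trained transformer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x, wq, wk, wv):
    # single-head scaled dot-product self-attention (toy, no trained params)
    q, k, v = x @ wq, x @ wk, x @ wv
    a = q @ k.T / np.sqrt(k.shape[-1])
    a = np.exp(a - a.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)
    return a @ v

d = 16                      # shared token dimension (assumed)
n_support, n_weights = 5, 8  # 5 support tokens, 8 weight slots per layer

# random projections standing in for the trained transformer's parameters
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def generate_layer_weights(support_tokens, layer_embedding):
    # placeholder weight tokens (akin to [CLS]) seeded with the layer embedding
    weight_tokens = np.tile(layer_embedding, (n_weights, 1))
    tokens = np.vstack([support_tokens, weight_tokens])
    out = self_attention(tokens, wq, wk, wv)
    # the generated layer weights are the outputs at the weight-token positions
    return out[n_support:]

support = rng.standard_normal((n_support, d))  # image + class embeddings
layer_emb = rng.standard_normal(d)             # embedding of previous layer

w_layer = generate_layer_weights(support, layer_emb)
print(w_layer.shape)  # one (n_weights, d) slab of generated weights
```

In the actual model this runs once per CNN layer, with each layer's generated weights feeding the embedding used to generate the next.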
5) What is cool is that we can also add unlabeled samples from the support set into the mix, effectively allowing for semi-supervised few-shot learning!
6) HyperTransformers are comparable in performance to many competing methods on the miniImageNet and tieredImageNet datasets.
7) But our method especially shines for small target CNN architectures, where the large capacity of the transformer model is most useful. For the 8-channel model we see a 5-10% improvement over MAML++!
8) As it turns out, for small target models, where every neuron matters, it is important to generate the whole network from the given support set. For larger target models, generating only the final logits layer appears to be sufficient.
9) We are really excited about the direction of using Transformers to guide the construction and performance of smaller specialized models e.g. in low-power settings. This has a lot of applications in the areas where high-performance compact personalized networks are being used.
