Momentum Contrast for Unsupervised Visual Representation Learning.

Summary

This paper introduces a new contrastive learning approach, Momentum Contrast (MoCo). The key ideas are: 1) implement the dictionary as a queue that stores a large number of keys; 2) update the key encoder with a momentum update rule. Compared with previous contrastive learning methods based on memory banks or end-to-end learning, MoCo not only supports a large number of negative samples but also keeps the key encodings consistent. Through an ablation study, the authors verify the effectiveness of the proposed queue dictionary and momentum encoder update. When a linear classifier is trained on features extracted by MoCo, the classification accuracy is competitive with state-of-the-art methods. Moreover, MoCo does not require customized architectures for the pretext task, which makes it easier to transfer features to downstream tasks. The authors further test the transferability of the features learned by MoCo; experimental results show that MoCo outperforms its supervised pre-training counterpart on seven detection and segmentation tasks. The authors claim that MoCo has largely closed the gap between unsupervised and supervised learning in terms of learning useful features.
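To make the two key ideas concrete, here is a minimal PyTorch-style sketch of the momentum update and the queue-based contrastive loss. This is not the authors' implementation: the names encoder_q, encoder_k, and queue, and the values of the momentum m, temperature t, and queue size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

m, t = 0.999, 0.07  # assumed momentum and temperature values

@torch.no_grad()
def momentum_update(encoder_q, encoder_k):
    # Key encoder parameters are a slow-moving average of the query encoder's;
    # this keeps the keys stored in the queue consistent with each other.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data = m * p_k.data + (1.0 - m) * p_q.data

def moco_loss(q, k, queue):
    # q, k: (N, C) L2-normalized query/key features; queue: (C, K) stored keys.
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)  # positive logits (N, 1)
    l_neg = torch.einsum("nc,ck->nk", q, queue)           # negative logits (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / t
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive is index 0
    return F.cross_entropy(logits, labels)
```

After each step, the current mini-batch of keys would be enqueued and the oldest batch dequeued, so the dictionary size is decoupled from the mini-batch size.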

Strengths

This paper reports very encouraging progress on unsupervised learning, and it will likely be very influential. The key contributions of this paper enable contrastive learning to learn useful representations for many downstream visual tasks. The experiments are extensive and convincing, and the authors’ claims are well supported by the experimental results. The PyTorch-style pseudocode is my favorite part; it delivers the key idea of the paper precisely and concisely. Such contrastive learning methods could be applied to many other domains, such as NLP and RL. A recent ICML paper has shown that contrastively learned features can boost policy learning significantly [1].

Shortcomings

Overall, I feel this is a very high-quality paper, and I didn’t spot any significant weaknesses. However, one inherent issue of contrastive learning approaches is that they are very computation- and memory-intensive because they require a large number of negative pairs. Improving their memory and computation efficiency would be important to allow broader adoption in other applications.

Reference

  1. Laskin, Michael, Aravind Srinivas, and Pieter Abbeel. “CURL: Contrastive Unsupervised Representations for Reinforcement Learning.” In Proceedings of the International Conference on Machine Learning (ICML), 2020.
