Designing Network Design Spaces.

Summary

This paper introduces a method for designing design spaces for deep neural networks. The core idea is to sample network designs from a design space and assess the quality of that space using the error empirical distribution function (EDF). A good design space contains a high concentration of good models. The authors start from a simple yet generic architecture template, based on the standard residual bottleneck block, as the unconstrained design space. Through an extensive set of experiments, they identify four constraints that narrow the design space to a high-quality region. Specifically, all residual blocks share the same bottleneck ratio and the same group width. In addition, stage depths are constrained to be non-decreasing, and stage widths are constrained to follow a quantized linear function of the block index. Overall, these constraints reduce the size of the design space from $O(10^{18})$ to $O(10^8)$. Experiments show that the resulting design space yields good performance across a range of FLOP regimes. Using the refined design space, the authors find a network architecture that performs comparably to EfficientNet trained with the same procedure while being up to 5 times faster on GPUs.
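To make the two central ingredients concrete, here is a minimal Python sketch based on my reading of the paper, not the authors' code. The function names (`error_edf`, `quantized_linear_widths`), the parameter names (`w_0`, `w_a`, `w_m`), and the rounding of widths to a multiple of `q` are assumptions made for illustration. The first function computes the fraction of sampled models whose error falls below each threshold. The second generates per-block widths that grow linearly and then snaps them to a small set of values.

```python
import numpy as np


def error_edf(errors, thresholds):
    """Fraction of sampled models with error strictly below each threshold."""
    errors = np.asarray(errors, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Compare every threshold against every model error, then average per threshold.
    return (errors[None, :] < thresholds[:, None]).mean(axis=1)


def quantized_linear_widths(depth, w_0, w_a, w_m, q=8):
    """Per-block widths from a quantized linear rule (illustrative parameterization).

    depth: number of blocks; w_0: initial width; w_a: slope;
    w_m: width multiplier used for quantization; q: assumed rounding granularity.
    """
    j = np.arange(depth)
    u = w_0 + w_a * j                      # linear width per block index
    # Snap each width to the nearest power of w_m (in log space) times w_0.
    s = np.round(np.log(u / w_0) / np.log(w_m))
    w = w_0 * np.power(w_m, s)
    # Round to a multiple of q (an assumption, not stated in this review).
    return (np.round(w / q) * q).astype(int)


if __name__ == "__main__":
    sampled_errors = [40.0, 45.0, 50.0, 60.0]   # made-up top-1 errors
    print(error_edf(sampled_errors, [42.0, 55.0, 70.0]))
    print(quantized_linear_widths(depth=4, w_0=24, w_a=24, w_m=2))
```

Under this sketch, a design space whose EDF rises earlier (more models below a given error) would be judged better, and blocks that receive the same quantized width naturally group into stages.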

Strengths

Overall, this paper proposes a simple and effective method for constructing a high-quality design space in which to search for neural network architectures. Compared to neural architecture search (NAS), which can only discover a single network instance for a specific task, the design-space design method generalizes to different training regimes and offers more insight into general design principles for neural networks. The findings from the discovered design space may help us better understand neural network architectures. In terms of writing, the paper is well organized, and the authors deliver the key messages precisely. The extensive experiments performed by the authors are also very impressive. I think the idea of designing network design spaces can be extended to many other application domains, such as NLP and RL.

Shortcomings

From Table 4, we observe that the top-1 error of RegNetY is higher than that of EfficientNet trained with enhanced training schedules. It is not clear how the RegNet design space choices would affect the results of training-time enhancements. If the RegNet design space limits the benefit of training-time enhancements, the proposed method might be less practical unless we can come up with training-time enhancement methods that are effective for the RegNet model family.

Haozhu Wang

2020
