Dropout
Notes
- The convergence properties of dropout can be understood in terms of stochastic gradient descent.[1]
- The weights of the network will be larger than normal because of dropout.[2]
- Therefore, before finalizing the network, the weights are first scaled down by the probability of retaining a unit (one minus the dropout rate); equivalently, the outputs can be scaled down by the same factor at test time.[2]
- Alternatively, the kept activations can be scaled up during training. This is sometimes called “inverse dropout” and requires no rescaling of the weights at test time; a sketch of both conventions follows these notes.[2]
- If you just wanted an overview of dropout in neural networks, the above two sections of that article would be sufficient.[3]
- Now that we know a little about dropout and the motivation, let’s go into some detail.[3]
- In dropout, we randomly shut down some fraction of a layer’s neurons at each training step by zeroing out the neuron values.[4]
- The fraction of neurons to be zeroed out is known as the dropout rate.[4]
- The two images in that article show dropout applied to a layer of 6 units at multiple training steps.[4]
- The dropout rate is 1/3, and the 4 remaining neurons at each training step have their values scaled up by 1.5, i.e. by 1/(1 − 1/3), so that the layer’s expected output is unchanged; a worked check follows these notes.[4]
- In this paper we conduct an empirical study to investigate the effect of dropout and batch normalization on training deep learning models.[5]
- Section 3 systematically describes the depth calculation model based on adaptive dropout proposed in this paper.[6]
- Finally, the value of the dropout rate for each layer needs to be in the interval (0, 1); one way to enforce such a constraint is sketched after these notes.[6]
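
The notes from [2] describe two equivalent scaling conventions. Below is a minimal NumPy sketch of both, assuming a plain array of activations; the function name, shapes, and dropout rate are illustrative choices, not taken from the cited article.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, rate, training, inverted=True):
    """Dropout on activations x; `rate` is the probability of zeroing a unit."""
    if not training:
        # Standard dropout compensates at test time by scaling down with the
        # retention probability; inverted dropout needs no change here.
        return x if inverted else x * (1.0 - rate)
    mask = rng.random(x.shape) >= rate        # keep each unit with probability 1 - rate
    kept = x * mask
    # Inverted dropout rescales the surviving units during training instead.
    return kept / (1.0 - rate) if inverted else kept

x = rng.normal(size=(4, 6))                   # a toy batch: 4 examples, 6 units
train_out = dropout_forward(x, rate=1/3, training=True)
test_out = dropout_forward(x, rate=1/3, training=False)  # identity under inverted dropout
```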
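
As a quick check of the 6-unit example from [4], the snippet below simulates many training steps with a dropout rate of 1/3 and a 1.5× scale on the surviving units; the toy values are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

units = np.ones(6)           # a toy layer of 6 activations, all equal to 1.0
rate = 1 / 3                 # on average 2 of the 6 units are zeroed per step
scale = 1.0 / (1.0 - rate)   # = 1.5, applied to the surviving units

# Average the layer's summed output over many simulated training steps.
sums = []
for _ in range(10_000):
    mask = rng.random(units.shape) >= rate
    sums.append((units * mask * scale).sum())

print(np.mean(sums))         # ≈ 6.0, matching the layer's sum without dropout
```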
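
The note from [6] only states that each layer’s dropout rate must lie in (0, 1). One common way to enforce such a constraint is to pass an unconstrained parameter through a sigmoid, sketched below; this is an illustrative assumption, not the depth-calculation model actually proposed in that paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical unconstrained per-layer parameters; the adaptive-dropout model
# in [6] computes its rates differently -- this only shows how a per-layer
# dropout rate can be kept strictly inside the open interval (0, 1).
raw_params = np.array([-1.2, 0.0, 0.8, 2.5])
layer_rates = sigmoid(raw_params)   # every value lies strictly between 0 and 1
print(layer_rates)                  # approximately [0.23, 0.50, 0.69, 0.92]
```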
Sources
- [1] The dropout learning algorithm
- [2] A Gentle Introduction to Dropout for Regularizing Deep Neural Networks
- [3] Dropout in (Deep) Machine learning
- [4] Dropout in Neural Networks
- [5] Dropout vs. batch normalization: an empirical study of their impact to deep learning
- [6] Medical Image Segmentation Algorithm Based on Optimized Convolutional Neural Network-Adaptive Dropout Depth Calculation