Dropout

Notes

  • Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent.[1]
  • The weights of the network will be larger than normal because of dropout.[2]
  • Therefore, before finalizing the network, the weights are first scaled down by the retention probability (one minus the chosen dropout rate).[2]
  • The rescaling can instead be performed during training; this is sometimes called “inverse dropout” and requires no modification of the weights at test time.[2]
  • With standard (non-inverse) dropout, the output is scaled down at test time by the retention probability.[2]
  • In dropout, we randomly shut down some fraction of a layer’s neurons at each training step by zeroing out the neuron values.[4]
  • The fraction of neurons to be zeroed out is known as the dropout rate.[4]
  • The two images represent dropout applied to a layer of 6 units, shown at multiple training steps.[4]
  • The dropout rate is 1/3, and the remaining 4 neurons at each training step have their values scaled by ×1.5, i.e. 1/(1 − 1/3), so that the expected activation is preserved (see the sketch after this list).[4]
  • In this paper we conduct an empirical study to investigate the effect of dropout and batch normalization on training deep learning models.[5]
  • Section 3 systematically describes the depth calculation model based on adaptive dropout proposed in this paper.[6]
  • Finally, the value of the dropout rate for each layer needs to be in the interval (0, 1).[6]
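
The note items above can be condensed into a minimal NumPy sketch of dropout with the “inverse” (training-time) rescaling; the function name `dropout_forward`, the fixed random seed, and the layer of six all-ones activations are illustrative assumptions rather than anything taken from the cited sources.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, rate=1/3, training=True):
    """Inverse (inverted) dropout: during training, zero a fraction `rate`
    of the activations and scale the survivors by 1 / (1 - rate); at test
    time, return the activations unchanged."""
    if not training:
        return x                                # no rescaling needed at test time
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob      # keep each unit with probability 1 - rate
    return x * mask / keep_prob                 # rate = 1/3 -> surviving values become 1.5x

# A layer of 6 units, as in the notes: with rate = 1/3, on average 2 units are
# zeroed per training step and the rest are multiplied by 1.5, so the expected
# sum of activations is preserved.
activations = np.ones(6)
print(dropout_forward(activations, training=True))   # survivors are 1.5, dropped units are 0.0
print(dropout_forward(activations, training=False))  # [1. 1. 1. 1. 1. 1.]
```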

Sources