21. Neural Networks and Deep Learning (3)

Written 2026. 6. 12. · Updated 2026. 6. 12.

Training of Feedforward Nets

A Graphical Illustration on Softmax

[Figure: graphical illustration of softmax]

Cross-Entropy and Maximum Likelihood Estimation

  • Maximum Likelihood Estimation (MLE)
    • Examples $\{\mathbf{x}_n\}$: drawn independently from $P_{\text{true}}(\mathbf{x})$ (unknown)
    • Model: $P_{\text{model}}(\mathbf{x};~\mathbf{w})$ (parameters $\mathbf{w}$ or $\theta$)
    • Goal: find the $\mathbf{w}$ for which $P_{\text{model}}(\mathbf{x};~\mathbf{w})$ is most similar to $P_{\text{true}}(\mathbf{x})$ (strictly, $P_{\text{empirical}}(\mathbf{x})$)
    • MLE principle

    $$\mathbf{w}_{\text{MLE}} = \arg\max_{\mathbf{w}} P_{\text{model}}(\mathbf{x};~\mathbf{w})$$

    • Independence (Likelihood): $\mathbf{w}_{\text{MLE}} = \arg\max_{\mathbf{w}} \prod_{n=1}^{N} P_{\text{model}}(\mathbf{x}_n;~\mathbf{w})$
    • Log likelihood (Computational stability; the log is monotonic, so the argmax is unchanged): $\mathbf{w}_{\text{MLE}} = \arg\max_{\mathbf{w}} \sum_{n=1}^{N} \log P_{\text{model}}(\mathbf{x}_n;~\mathbf{w})$
  • Relationship between cross-entropy and MLE

$$\mathbf{w}_{\text{MLE}} = \arg\max_{\mathbf{w}} \sum_{n=1}^{N} \log P_{\text{model}}(\mathbf{x}_n;~\mathbf{w})$$

  • Scaling (the argmax is invariant to multiplication by a positive constant):

$$\mathbf{w}_{\text{MLE}} = \arg\max_{\mathbf{w}} \frac{1}{N} \sum_{n=1}^{N} \log P_{\text{model}}(\mathbf{x}_n;~\mathbf{w})$$

$$= \arg\max_{\mathbf{w}} \mathbb{E}_{\mathbf{x} \sim P_{\text{empirical}}(\mathbf{x})}[\log P_{\text{model}}(\mathbf{x};~\mathbf{w})]$$

  • Substitute $P(\mathbf{x}) = P_{\text{empirical}}(\mathbf{x})$ and $Q(\mathbf{x}) = P_{\text{model}}(\mathbf{x};~\mathbf{w})$

$$\mathbf{w}_{\text{MLE}} = \arg\max_{\mathbf{w}} \mathbb{E}_{P(\mathbf{x})}[\log Q(\mathbf{x})]$$

$$= \arg\min_{\mathbf{w}} - \mathbb{E}_{P(\mathbf{x})}[\log Q(\mathbf{x})] = \arg\min_{\mathbf{w}} H(P, Q)$$

  • Conclusion
    • Obtaining $\mathbf{w}_{\text{MLE}}$ by MLE is the same problem as minimizing the cross-entropy between $P_{\text{empirical}}(\mathbf{x})$ and $P_{\text{model}}(\mathbf{x};~\mathbf{w})$
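The equivalence above can be checked numerically. The following is a minimal sketch, assuming a categorical model over a made-up toy dataset (the data, the names `p_emp`, `q_mle`, `q_uniform`, and both helper functions are illustrative, not from the notes). It compares the average negative log-likelihood with the cross-entropy $H(P, Q)$ for two candidate models.

```python
import math
from collections import Counter

# Hypothetical toy dataset of categorical observations (illustration only).
data = ["a", "b", "a", "c", "a", "b", "a", "a"]
N = len(data)

# Empirical distribution P: relative frequency of each outcome.
counts = Counter(data)
p_emp = {k: c / N for k, c in counts.items()}

def avg_neg_log_likelihood(q):
    """-(1/N) * sum_n log Q(x_n), summed over the samples."""
    return -sum(math.log(q[x]) for x in data) / N

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x), summed over the outcomes."""
    return -sum(p[x] * math.log(q[x]) for x in p)

# Two candidate models Q: the empirical frequencies and a uniform guess.
q_mle = dict(p_emp)
q_uniform = {k: 1.0 / len(p_emp) for k in p_emp}

for name, q in [("mle", q_mle), ("uniform", q_uniform)]:
    print(name, avg_neg_log_likelihood(q), cross_entropy(p_emp, q))
```

For each model the two printed numbers agree up to rounding, and the model that matches the empirical frequencies gives the smaller value, which is what the argmax/argmin rewriting predicts.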

Parameter Optimization

  • Task: find the weight vector $\mathbf{w}$ that minimizes the chosen error function $E(\mathbf{w})$
  • Global minimum: the minimum corresponding to the smallest value of the error function over all weight vectors
  • Local minima: all other minima, which correspond to higher values of the error function
  • Note: finding the global minimum may not be essential for applying a neural network successfully, but several local minima may need to be compared to find a sufficiently good solution
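As an illustration of comparing local minima, here is a minimal sketch. It assumes plain gradient descent (the notes do not name an optimizer at this point) and a made-up one-dimensional error function `E` with two minima; the starting points, learning rate, and step count are arbitrary choices.

```python
def E(w):
    # Hypothetical 1-D error surface with one local and one global minimum.
    return w**4 - 3 * w**2 + w

def dE(w):
    # Analytic derivative of E.
    return 4 * w**3 - 6 * w + 1

def descend(w, lr=0.01, steps=2000):
    """Plain gradient descent from a given starting weight."""
    for _ in range(steps):
        w -= lr * dE(w)
    return w

# Run from several starting points and keep the minimum with the lowest error.
starts = [-2.0, -0.5, 0.5, 2.0]
minima = [descend(w0) for w0 in starts]
best = min(minima, key=E)

for w0, w in zip(starts, minima):
    print(f"start {w0:+.1f} -> w = {w:+.4f}, E(w) = {E(w):+.4f}")
print("best:", best)
```

Starts on the left settle in the deeper minimum near $w \approx -1.30$, while starts on the right settle in the shallower one near $w \approx 1.13$, so the result depends on initialization.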


Forward Propagation & Backpropagation

  • Error backpropagation (or simply backprop)
    • An efficient technique for evaluating the gradient $\nabla E(\mathbf{w})$ of the error function $E(\mathbf{w})$ of a feed-forward neural network
    • An algorithm that evaluates the chain rule in a specific, highly efficient order of operations
  • Our example
    • A simple layered network (sigmoidal hidden units, sum-of-squares error)
  • In a general feed-forward network,
    • Each unit computes a weighted sum of its inputs

    $$a_j = \sum_i w_{ji} z_i$$

    • $z_i$: the activation of a unit (or input) that sends a connection to unit $j$
    • $w_{ji}$: the weight associated with that connection
    • $a_j$ is transformed by a nonlinear activation function $h(\cdot)$ to give the activation $z_j$ of unit $j$

    $$z_j = h(a_j)$$

Forward Propagation

  • For each pattern in the training set
    • Apply the input vector to the network
    • Compute the activations of all hidden and output units by successively applying the equations for $a_j$ and $z_j$
    • This process is called forward propagation (a forward flow of information)
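The forward pass described above can be sketched in a few lines. This is a minimal illustration, assuming a made-up network with 2 inputs, 2 sigmoid hidden units, and 1 identity output unit; the weights `W1`, `W2` and the helper `layer` are hypothetical, and biases are omitted to match the formula $a_j = \sum_i w_{ji} z_i$.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(weights, z_in, h):
    """Compute a_j = sum_i w_ji * z_i and z_j = h(a_j) for every unit j."""
    a = [sum(w_ji * z_i for w_ji, z_i in zip(row, z_in)) for row in weights]
    return a, [h(a_j) for a_j in a]

# Hypothetical weights: row j holds the weights w_ji into unit j.
W1 = [[0.5, -0.5], [1.0, 1.0]]   # input -> hidden
W2 = [[1.0, -1.0]]               # hidden -> output
x = [1.0, 1.0]                   # input vector

a_hidden, z_hidden = layer(W1, x, sigmoid)            # hidden activations
a_out, y = layer(W2, z_hidden, lambda a: a)           # identity output unit

print("hidden a:", a_hidden)
print("hidden z:", z_hidden)
print("output y:", y)
```

Each layer only needs the activations of the layer before it, which is why the computation flows strictly forward.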


Chain Rule of Calculus

  • Chain rule for scalars
    • If $y = g(x)$ and $z = f(g(x)) = f(y)$, then

      $$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$$

  • Chain rule for vectors
    • If $\mathbf{x} \in \mathbb{R}^m,~\mathbf{y} \in \mathbb{R}^n$, $\mathbf{y} = g(\mathbf{x})$ and $z = f(\mathbf{y})$, then

      $$\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_i}$$

    • Vector notation

      $$\nabla_{\mathbf{x}} z = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^T \nabla_{\mathbf{y}} z$$

    • $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ is the $n \times m$ Jacobian matrix of $g$
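The vector form of the chain rule can be verified on a small example. The sketch below assumes made-up functions `g` (from $\mathbb{R}^2$ to $\mathbb{R}^3$) and `f` (from $\mathbb{R}^3$ to $\mathbb{R}$), chosen only for illustration. It computes $\nabla_{\mathbf{x}} z$ as the transposed Jacobian times $\nabla_{\mathbf{y}} z$ and compares it with a central finite-difference estimate.

```python
def g(x):
    # Hypothetical g: R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

def f(y):
    # Hypothetical f: R^3 -> R
    return y[0] * y[1] + y[2]

def jacobian_g(x):
    # n x m = 3 x 2 matrix with entries dy_j / dx_i
    return [[x[1], x[0]], [1.0, 1.0], [2 * x[0], 0.0]]

def grad_f(y):
    # Gradient of z with respect to y
    return [y[1], y[0], 1.0]

def chain_rule_grad(x):
    """grad_x z = J^T grad_y z, i.e. dz/dx_i = sum_j dz/dy_j * dy_j/dx_i."""
    J = jacobian_g(x)
    gy = grad_f(g(x))
    return [sum(J[j][i] * gy[j] for j in range(len(gy))) for i in range(len(x))]

def numeric_grad(x, eps=1e-6):
    """Central finite differences on the composite z = f(g(x))."""
    grads = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grads.append((f(g(xp)) - f(g(xm))) / (2 * eps))
    return grads

x = [1.0, 2.0]
analytic = chain_rule_grad(x)
numeric = numeric_grad(x)
print("chain rule:", analytic)   # [10.0, 5.0]
print("numeric:   ", numeric)
```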

Backpropagation

  • Evaluating the derivative of $E_n$ with respect to $w_{ji}$
    • $E_n$ depends on $w_{ji}$ only through the summed input $a_j$
    • Chain rule: $\frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial a_j} \cdot \frac{\partial a_j}{\partial w_{ji}}$
    • Notation (errors $\delta$): $\delta_j \equiv \frac{\partial E_n}{\partial a_j}$
    • $\frac{\partial a_j}{\partial w_{ji}} = z_i$
    • Result: $\frac{\partial E_n}{\partial w_{ji}} = \delta_j \cdot z_i$
  • $\frac{\partial E_n}{\partial w_{ji}} = \delta_j \cdot z_i$
    • Required derivative: obtained by multiplying the $\delta$ of the unit at the output end of the weight ($j$) by the $z_i$ of the unit at the input end of the weight ($i$)
  • Evaluating the derivatives: $\delta_j$ must be computed for every hidden and output unit in the network
  • For the output units (with L2 loss, assuming identity activation $y_j = a_j$)
    • $E_n(\mathbf{w}) = \frac{1}{2} \sum_k (y_k - t_k)^2$
    • $\delta_j = y_j - t_j$
  • To evaluate the $\delta$'s for hidden units,
    • Chain rule: $\delta_j \equiv \frac{\partial E_n}{\partial a_j} = \sum_k \frac{\partial E_n}{\partial a_k} \cdot \frac{\partial a_k}{\partial a_j}$ (sum over units $k$ to which unit $j$ sends connections)
    • $\frac{\partial a_k}{\partial a_j} = \frac{\partial a_k}{\partial z_j} \cdot \frac{\partial z_j}{\partial a_j} = w_{kj} \cdot h'(a_j)$
  • Backpropagation formula
    • $\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$
    • $\delta$ of a hidden unit: obtained by propagating the $\delta$'s backward from the units higher up in the network
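The formulas above can be put together for the example network (sigmoidal hidden units, identity outputs, sum-of-squares error). The following is a minimal sketch under those assumptions; the weights, input, target, and function names are made up for illustration, and the result is checked against a finite-difference estimate for one weight.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(W1, W2, x):
    a1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    z1 = [sigmoid(a) for a in a1]                               # hidden activations
    y = [sum(w * zj for w, zj in zip(row, z1)) for row in W2]   # identity outputs
    return z1, y

def error(W1, W2, x, t):
    _, y = forward(W1, W2, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop(W1, W2, x, t):
    z1, y = forward(W1, W2, x)
    # Output units: delta_k = y_k - t_k
    delta_out = [yk - tk for yk, tk in zip(y, t)]
    # Hidden units: delta_j = h'(a_j) * sum_k w_kj * delta_k,
    # where h'(a_j) = z_j * (1 - z_j) for the sigmoid.
    delta_hid = [
        z1[j] * (1.0 - z1[j]) * sum(W2[k][j] * delta_out[k] for k in range(len(W2)))
        for j in range(len(W1))
    ]
    # dE/dw_ji = delta_j * z_i (z_i is the input x_i for the first layer)
    dW2 = [[dk * zj for zj in z1] for dk in delta_out]
    dW1 = [[dj * xi for xi in x] for dj in delta_hid]
    return dW1, dW2

# Hypothetical weights, input, and target.
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.5, -0.6]]
x, t = [1.0, 2.0], [1.0]
dW1, dW2 = backprop(W1, W2, x, t)

# Finite-difference check on the single weight W1[0][1].
eps = 1e-6
W1p = [row[:] for row in W1]
W1m = [row[:] for row in W1]
W1p[0][1] += eps
W1m[0][1] -= eps
numeric = (error(W1p, W2, x, t) - error(W1m, W2, x, t)) / (2 * eps)
print("backprop:", dW1[0][1], "numeric:", numeric)
```

Backpropagation yields all weight derivatives from one forward and one backward pass, whereas the finite-difference check needs two extra forward passes per weight.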

Automatic Differentiation

  • Gradient computation: can be derived automatically from the symbolic expression of the forward propagation
  • Modern DL frameworks (TensorFlow, PyTorch, etc.): perform backpropagation automatically
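To give a feel for what such frameworks automate, here is a toy reverse-mode differentiation sketch. It is not how TensorFlow or PyTorch are implemented; the `Value` class and its attributes are hypothetical, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs of the operation
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the graph so each node is processed after everything that uses it.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in zip(v._parents, v._local_grads):
                p.grad += local * v.grad   # chain rule: local * upstream

a = Value(2.0)
b = Value(3.0)
c = a * b + a          # forward pass builds the graph
c.backward()           # backward pass fills in .grad
print(c.data, a.grad, b.grad)   # 8.0 4.0 2.0
```

The user writes only the forward expression; the derivatives $\partial c/\partial a = b + 1$ and $\partial c/\partial b = a$ come from the recorded graph.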

A Simple Example of Backprop

  • Function: $f(x,~y,~z) = (x+y)z$
  • Intermediate variable: $q = x+y$, $f = qz$
  • Example values: $x = -2,~y = 5,~z = -4$
  • Forward propagation
    • $q = x + y \quad \rightarrow \quad q = -2 + 5 = 3$
    • $f = qz \quad \rightarrow \quad f = 3 \times (-4) = -12$
  • Backpropagation (computing gradients)
    • $\frac{\partial f}{\partial f} = 1$
    • $\frac{\partial f}{\partial z} = q = 3$ (Local gradient $\frac{\partial f}{\partial z} = q$)
    • $\frac{\partial f}{\partial q} = z = -4$ (Local gradient $\frac{\partial f}{\partial q} = z$)
    • $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y} = (-4) \cdot (1) = -4$ (Upstream $\frac{\partial f}{\partial q}$, Local $\frac{\partial q}{\partial y} = 1$)
    • $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial x} = (-4) \cdot (1) = -4$ (Upstream $\frac{\partial f}{\partial q}$, Local $\frac{\partial q}{\partial x} = 1$)
  • Pattern: each gate (operation) computes its local gradient and multiplies it by the upstream gradient to obtain the downstream gradient
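The same computation, written out gate by gate as a short script. The variable names (`df_dq` and so on) are chosen here for illustration; the values are the ones used in the example.

```python
# Example values from the notes
x, y, z = -2.0, 5.0, -4.0

# Forward propagation
q = x + y          # add gate
f = q * z          # multiply gate

# Backpropagation: start from df/df = 1 and walk the gates in reverse.
df_df = 1.0
df_dz = q * df_df        # multiply gate: local gradient w.r.t. z is q
df_dq = z * df_df        # multiply gate: local gradient w.r.t. q is z
df_dx = 1.0 * df_dq      # add gate: local gradient 1, upstream df/dq
df_dy = 1.0 * df_dq      # add gate: local gradient 1, upstream df/dq

print("f =", f)                                  # f = -12.0
print("df/dx, df/dy, df/dz =", df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```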
Last modified: 26. 6. 12. 3:28 PM
Contributors: kmbzn, Claude Sonnet 4.6

© 2026 kmbzn · MIT License