20. Neural Networks and Deep Learning (2)

Written 2026. 6. 12. · Modified 2026. 6. 12.

More Details on Feedforward Networks

Computation of Feedforward Networks (Review)

  • Each node in the network is called a unit (or perceptron).
  • Unit computation: (1) compute the weighted sum of the inputs from the preceding nodes, (2) apply a nonlinear function to produce the output.
  • $a_j$ is the output of unit $j$; $w_{ij}$ is the weight from unit $i$ to unit $j$.
  • Formula: $a_j = g_j\left(\sum_i w_{ij} a_i\right) \equiv g_j(in_j)$
  • $g_j$: the nonlinear activation function associated with unit $j$
  • $in_j$: the weighted sum of the inputs to unit $j$
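The unit computation above can be sketched in a few lines of Python. This is only an illustration: the function name `unit_output`, the choice of `tanh` as $g_j$, and the numeric values are assumptions, not part of the notes.

```python
import math

def unit_output(inputs, weights, g=math.tanh):
    """Compute a_j = g(in_j), where in_j = sum_i w_ij * a_i."""
    in_j = sum(w * a for w, a in zip(weights, inputs))  # step (1): weighted sum
    return g(in_j)                                      # step (2): nonlinearity

# Hypothetical values: three upstream outputs a_i and their weights w_ij.
a_prev = [1.0, 0.5, -2.0]
w_to_j = [0.2, -0.4, 0.1]
a_j = unit_output(a_prev, w_to_j)  # in_j = 0.2 - 0.2 - 0.2 = -0.2
```

Any other nonlinear function can be passed as `g`; `tanh` is used here only because it is available in the standard library.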

Feedforward Networks with Another Notation

  • Form $M$ linear combinations of the input variables $x_1, \dots, x_D$.
  • $a_j = \sum_{i=1}^D w_{ji}^{(1)} x_i + w_{j0}^{(1)}$, where $j = 1, \dots, M$ and the superscript $(1)$ denotes the first layer
  • $w_{ji}^{(1)}$: weights, $w_{j0}^{(1)}$: biases
  • $a_j$: activations
  • Each activation $a_j$ is transformed by a differentiable nonlinear activation function $h(\cdot)$.
  • $z_j = h(a_j)$, where the $z_j$ are the hidden units
  • The hidden units $z_j$ are linearly combined again to produce the output unit activations $a_k$.
  • $a_k = \sum_{j=1}^M w_{kj}^{(2)} z_j + w_{k0}^{(2)}$, where $k = 1, \dots, K$ and $K$ is the total number of outputs
  • Each $a_k$ is passed through a suitable activation function to give the network output $y_k$.
  • Standard regression: Identity ($y_k = a_k$)
  • Multiple binary classification: Logistic sigmoid function ($y_k = \sigma(a_k)$)
  • Multiclass classification: Softmax activation function ($y_k = \exp(a_k) / \sum_l \exp(a_l)$)
  • Overall network function (example with sigmoid outputs):

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left(\sum_{j=1}^M w_{kj}^{(2)} h\left(\sum_{i=1}^D w_{ji}^{(1)} x_i + w_{j0}^{(1)}\right) + w_{k0}^{(2)}\right)$$

  • $\mathbf{w}$: the vector that groups all weight and bias parameters
  • This process can be interpreted as forward propagation of information.
  • Neural network model: a nonlinear function from the inputs $\mathbf{x} = \{x_i\}$ to the outputs $\{y_k\}$
  • The bias parameters can be absorbed into the weight parameters by defining an additional input variable $x_0$ fixed at $x_0 = 1$:

$$a_j = \sum_{i=0}^D w_{ji}^{(1)} x_i$$

  • Simplified overall network function (with the second-layer bias absorbed in the same way):

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left(\sum_{j=0}^M w_{kj}^{(2)} h\left(\sum_{i=0}^D w_{ji}^{(1)} x_i\right)\right)$$

  • Further simplification using vectors and matrices:

$$\mathbf{y}(\mathbf{x}, \mathbf{w}) = \sigma\left(\mathbf{W}^{(2)} h\left(\mathbf{W}^{(1)} \mathbf{x}\right)\right)$$

  • Feedforward nets are generally fully connected.
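The forward propagation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`sigmoid`, `matvec`, `forward`), the use of `tanh` for $h$, the layer sizes, and all weight values are made up for the example. The constant units $x_0 = 1$ and $z_0 = 1$ are prepended explicitly so that each weight matrix row starts with its bias.

```python
import math

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(x, W1, W2, h=math.tanh):
    """Compute y = sigma(W2 h(W1 x)) with biases absorbed into W1 and W2."""
    x_ext = [1.0] + list(x)                   # prepend x_0 = 1
    a_hidden = matvec(W1, x_ext)              # a_j for j = 1..M
    z_ext = [1.0] + [h(a) for a in a_hidden]  # prepend z_0 = 1, then z_j = h(a_j)
    a_out = matvec(W2, z_ext)                 # a_k for k = 1..K
    return [sigmoid(a) for a in a_out]        # y_k = sigma(a_k)

# Hypothetical sizes: D = 2 inputs, M = 3 hidden units, K = 2 outputs.
W1 = [[0.1, 0.4, -0.3],        # each row: [w_j0, w_j1, w_j2]
      [0.0, -0.2, 0.5],
      [0.3, 0.1, 0.1]]
W2 = [[0.2, 0.5, -0.4, 0.3],   # each row: [w_k0, w_k1, w_k2, w_k3]
      [-0.1, 0.2, 0.6, -0.5]]
y = forward([1.0, 2.0], W1, W2)
```

For regression or multiclass classification, the final `sigmoid` would be swapped for the identity or a softmax, matching the output activations listed earlier.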

How to Count the Number of Layers in NNs

  • The terminology for counting the layers of a neural network is inconsistent; the same network can be given three different names.
  • 3-layer network: counts the layers of units (treating the inputs as units)
  • Single-hidden-layer network: counts the layers of hidden units
  • Two-layer network: counts the layers of adaptive weights, which are what determine the network's properties

Training of Feedforward Nets

Different Loss (Error) Functions for Different Problems

  • Regression
    • Output activation function: Identity ($y_k = a_k$)
    • Loss (error) function: Sum-of-squares (L2 or squared) error function

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^N \| \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \|^2$$

  • Binary classification
    • Single target variable $t$ ($t = 1$ denotes class $C_1$, $t = 0$ denotes class $C_2$)
    • Output activation function: Sigmoid $y = \sigma(a)$
    • Loss (error) function: Negative log likelihood, also called the cross-entropy error function

$$E(\mathbf{w}) = - \sum_{n=1}^N \{ t_n \log y_n + (1 - t_n) \log(1 - y_n) \}$$

$$y_n = y(\mathbf{x}_n, \mathbf{w})$$

  • Multiclass classification
    • Each input is assigned to one of $K$ mutually exclusive classes.
    • The targets $t_k \in \{0, 1\}$ use a 1-of-K coding scheme.
    • Interpretation of the network outputs: $y_k(\mathbf{x}, \mathbf{w}) = P(t_k = 1 \mid \mathbf{x})$
    • Output activation function: Softmax $y_k = \exp(a_k) / \sum_l \exp(a_l)$
    • Loss (error) function: Negative log likelihood, also called the cross-entropy error function

$$E(\mathbf{w}) = - \sum_{n=1}^N \sum_{k=1}^K t_{nk} \log y_k(\mathbf{x}_n, \mathbf{w})$$

  • Summary
    • For each problem type there is a natural choice of output unit activation function and matching error function.
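The three error functions above can be written out directly, as in the sketch below. The function names and the toy inputs are assumptions chosen for illustration. The softmax subtracts the maximum activation before exponentiating, a common numerical-stability step that does not change the result and is not mentioned in the notes.

```python
import math

def sum_of_squares(Y, T):
    """E = 1/2 * sum_n ||y_n - t_n||^2 (regression, identity outputs)."""
    return 0.5 * sum((y - t) ** 2
                     for yn, tn in zip(Y, T) for y, t in zip(yn, tn))

def binary_cross_entropy(y, t):
    """E = -sum_n [t_n log y_n + (1 - t_n) log(1 - y_n)] (sigmoid output)."""
    return -sum(tn * math.log(yn) + (1 - tn) * math.log(1 - yn)
                for yn, tn in zip(y, t))

def softmax(a):
    """y_k = exp(a_k) / sum_l exp(a_l), shifted by max(a) for stability."""
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]

def multiclass_cross_entropy(Y, T):
    """E = -sum_n sum_k t_nk log y_nk (softmax outputs, 1-of-K targets)."""
    # Terms with t_nk = 0 contribute nothing, so they are skipped.
    return -sum(t * math.log(y)
                for yn, tn in zip(Y, T) for y, t in zip(yn, tn) if t)

# Toy data (hypothetical): one sample for regression and multiclass, two for binary.
E_reg = sum_of_squares([[1.0, 2.0]], [[0.0, 4.0]])          # 0.5 * (1 + 4)
E_bin = binary_cross_entropy([0.5, 0.5], [1, 0])            # 2 * log 2
E_multi = multiclass_cross_entropy([softmax([0.0, 0.0, 0.0])],
                                   [[0, 1, 0]])             # log 3
```

Each function takes network outputs that have already passed through the matching output activation, so the pairing in the summary is visible in how the functions are called.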

Information Theory (Review)

  • Entropy
    • A measure of the uncertainty of a random variable
    • $H(X) = -\mathbb{E}_{x \sim P(X)} [\log P(x)] = \mathbb{E}_{x \sim P(X)} \left[\log \frac{1}{P(x)}\right]$
  • Kullback-Leibler (KL) Divergence
    • Measures how different two probability distributions $P(X)$ and $Q(X)$ over the same random variable $X$ are
    • $D_{KL}(P \| Q) = \mathbb{E}_{x \sim P(X)}\left[\log \frac{P(x)}{Q(x)}\right] = \mathbb{E}_{x \sim P(X)}[\log P(x) - \log Q(x)]$
  • Cross-entropy
    • $H(P, Q) = -\mathbb{E}_{x \sim P(X)}[\log Q(x)] = H(X) + D_{KL}(P \| Q)$
    • Because $H(X)$ does not depend on $Q$, minimizing the cross-entropy with respect to $Q$ is equivalent to minimizing the KL divergence.
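The identity $H(P, Q) = H(X) + D_{KL}(P \| Q)$ can be checked numerically for discrete distributions. The sketch below uses two made-up distributions `P` and `Q` over three outcomes; the function names are illustrative, and natural logarithms are assumed.

```python
import math

def entropy(P):
    """H(X) = -sum_x P(x) log P(x); terms with P(x) = 0 contribute 0."""
    return -sum(p * math.log(p) for p in P if p > 0)

def kl_divergence(P, Q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x))."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

def cross_entropy(P, Q):
    """H(P, Q) = -sum_x P(x) log Q(x)."""
    return -sum(p * math.log(q) for p, q in zip(P, Q) if p > 0)

# Hypothetical distributions over three outcomes.
P = [0.5, 0.25, 0.25]
Q = [0.4, 0.4, 0.2]

lhs = cross_entropy(P, Q)
rhs = entropy(P) + kl_divergence(P, Q)  # agrees with lhs up to rounding
```

Since `entropy(P)` is fixed once `P` is given, any change in `Q` moves `cross_entropy(P, Q)` and `kl_divergence(P, Q)` by the same amount.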
Last modified: 26. 6. 12. 3:28 PM
Contributors: kmbzn, Claude Sonnet 4.6

© 2026 kmbzn · MIT License