
7. Information Theory

  • Information theory is a branch of applied mathematics centered on quantifying how much information is present in a signal
  • It was originally invented to study sending messages over discrete alphabets through a noisy channel (e.g., communication via radio transmission)
  • It shows how to design optimal codes and how to compute the expected length of messages
  • The basic intuition is that learning an unlikely event has occurred is more informative than learning a likely event has occurred
  • Example: the message "the sun rose this morning" is so uninformative as to be unnecessary, but "there was a solar eclipse this morning" is highly informative
  • We would like to quantify information in a way that formalizes this intuition
  • Likely events should have low information content; in the extreme case, an event guaranteed to happen should have no information content at all
  • Less likely events should have higher information content
  • Independent events should have additive information
    • For example, learning that a tossed coin has come up heads twice should convey twice as much information as learning it has come up heads once
  • To satisfy all three of these properties, the self-information I(x) of an event X = x is defined as

I(x) = -\log P(x)

  • When the natural logarithm (base e) is used, the amount of information obtained by observing an event of probability 1/e is called one nat
  • When base 2 is used, the units are called bits or shannons
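As a quick check of these unit conventions, a minimal Python sketch (the helper name `self_information` is ours):

```python
import math

def self_information(p, base=2):
    """Self-information I(x) = -log P(x); base 2 gives bits, base e gives nats."""
    return -math.log(p, base)

# A fair coin flip carries exactly 1 bit of information.
print(self_information(0.5))        # 1.0
# Two independent fair flips (probability 0.25) carry 2 bits: information is additive.
print(self_information(0.5 * 0.5))  # 2.0
```

An event with probability 1/e evaluated with `base=math.e` yields one nat, matching the definition above.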

Entropy

  • Self-information deals with only a single outcome
  • The (Shannon) entropy can be used to quantify the amount of uncertainty in an entire probability distribution

H(X) = E_{x \sim P(X)}[I(x)] = -E_{x \sim P(X)}[\log P(x)]

  • ๋ถ„ํฌ์˜ Shannon Entropy๋Š” ๊ทธ ๋ถ„ํฌ์—์„œ ์ถ”์ถœ๋œ ์‚ฌ๊ฑด์˜ (Expected amount of information, ์˜ˆ์ƒ ์ •๋ณด๋Ÿ‰)์ž„
  • ๋ถ„ํฌ PPP์—์„œ ์ถ”์ถœ๋œ Symbol์„ ์ธ์ฝ”๋”ฉํ•˜๋Š” ๋ฐ ํ‰๊ท ์ ์œผ๋กœ ํ•„์š”ํ•œ ๋น„ํŠธ ์ˆ˜์˜ (Lower bound, ํ•˜ํ•œ)์„ ์ œ๊ณตํ•จ
  • (Nearly deterministic, ๊ฑฐ์˜ ๊ฒฐ์ •๋ก ์ )์ธ ๋ถ„ํฌ (๊ฒฐ๊ณผ๊ฐ€ ๊ฑฐ์˜ ํ™•์‹คํ•œ ๊ฒฝ์šฐ)๋Š” ๋‚ฎ์€ Entropy๋ฅผ ๊ฐ€์ง (Uniform, ๊ท ์ผ)์— ๊ฐ€๊นŒ์šด ๋ถ„ํฌ๋Š” ๋†’์€ Entropy๋ฅผ ๊ฐ€์ง
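The entropy definition above can be sketched directly, with a distribution given as a list of probabilities:

```python
import math

def entropy(dist, base=2):
    """Shannon entropy H(X) = -sum_x P(x) log P(x), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in dist if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 -- fair coin: maximal entropy for two outcomes
print(entropy([0.25] * 4))    # 2.0 -- uniform over four outcomes
print(entropy([0.99, 0.01]))  # nearly deterministic: entropy close to 0
```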

Kullback-Leibler (KL) Divergence

  • ๋™์ผํ•œ (Random variable, ํ™•๋ฅ  ๋ณ€์ˆ˜) XXX์— ๋Œ€ํ•œ ๋‘ ๊ฐœ์˜ ๊ฐœ๋ณ„ (Probability distributions) P(X)P(X)P(X)์™€ Q(X)Q(X)Q(X)๊ฐ€ ์žˆ์„ ๋•Œ, (Kullback-Leibler (KL) divergence)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ด ๋‘ ๋ถ„ํฌ๊ฐ€ ์–ผ๋งˆ๋‚˜ ๋‹ค๋ฅธ์ง€ ์ธก์ •ํ•  ์ˆ˜ ์žˆ์Œ

D_{KL}(P \parallel Q) = E_{x \sim P(X)}\left[\log \frac{P(x)}{Q(x)}\right] = E_{x \sim P(X)}[\log P(x) - \log Q(x)]

  • The KL divergence is non-negative
  • The KL divergence is 0 if and only if P and Q are the same distribution
  • It is often conceptualized as measuring some sort of distance between the distributions, but it is not a true distance measure because it is not symmetric: D_{KL}(P \parallel Q) \neq D_{KL}(Q \parallel P)
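A small sketch illustrating these properties, including the asymmetry; the two example distributions are arbitrary probability lists over the same outcomes:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) (log P(x) - log Q(x)), in bits."""
    return sum(pi * (math.log(pi, 2) - math.log(qi, 2))
               for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(kl_divergence(P, Q))  # positive
print(kl_divergence(Q, P))  # differs from the above: KL is not symmetric
print(kl_divergence(P, P))  # 0.0 -- identical distributions
```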

Cross-Entropy

  • A quantity closely related to the KL divergence is the cross-entropy H(P, Q), which is defined as H(P, Q) = H(P) + D_{KL}(P \parallel Q) but takes the following form

H(P, Q) = -E_{x \sim P(X)}[\log Q(x)]

  • Minimizing the cross-entropy with respect to Q is equivalent to minimizing the KL divergence, because Q does not participate in the omitted term
  • The cross-entropy between P and Q measures the average number of bits needed to identify an event when the coding scheme is optimized for Q rather than P
  • P and Q are two probability distributions over the same underlying set of events; P is the true distribution and Q is an estimated probability distribution
  • (Additional info) Cross-entropy is also related to maximum likelihood estimation (MLE) and is considered one of the main objectives for training neural models
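A sketch verifying the identity H(P, Q) = H(P) + D_KL(P ∥ Q) numerically; the example distributions are arbitrary:

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi, 2) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -E_{x~P}[log Q(x)], in bits."""
    return -sum(pi * math.log(qi, 2) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)

P = [0.25, 0.75]  # true distribution
Q = [0.5, 0.5]    # estimated distribution
# The identity from the text: H(P, Q) = H(P) + D_KL(P || Q)
print(abs(cross_entropy(P, Q) - (entropy(P) + kl(P, Q))) < 1e-9)  # True
```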

Probabilistic Agents

  • Agents in the real world must handle uncertainty arising from partial observability, nondeterminism, or adversaries
  • An agent may not be certain what state it is in now, or where it will end up after a sequence of actions
  • Example: diagnosing a dental patient's toothache
  • A diagnosis almost always involves uncertainty
  • Attempting to write dental diagnosis rules using propositional logic: Toothache ⇒ Cavity
  • This rule is wrong: not all patients with toothaches have cavities; some have gum disease, an abscess, and so on: Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess …
  • To make the rule true, an almost unlimited list of possible problems would have to be added
  • Attempting to turn the rule into a causal rule: Cavity ⇒ Toothache
  • But this rule is not right either: not all cavities cause pain
  • The only way to fix the rule is to make it logically exhaustive: to augment the left-hand side with all the qualifications required for a cavity to cause a toothache
  • In the medical domain (as in the earlier example), and in many other domains, the agent's knowledge can at best provide only a degree of belief in the relevant sentences
  • The main tool for dealing with degrees of belief is probability theory
  • Whereas a logical agent believes each sentence to be true or false, or has no opinion, a probabilistic agent can have a numerical degree of belief between 0 (sentences that are certainly false) and 1 (sentences that are certainly true)
  • The agent may not know for certain what afflicts a particular patient, but it can believe that a patient with a toothache has a cavity with probability 80%

Probabilistic Inference Using Full Joint Distributions

  • Given the full joint distribution over the random variables of interest, it can serve as a "knowledge base" from which the answer to any question may be derived
  • A simple example
  • A domain consisting of three Boolean variables: Toothache, Cavity, and Catch
  • The full joint distribution is a table with one entry per possible world (2 × 2 × 2 = 8 entries)
  • Recall that for any event A, P(A) = \sum_{\omega \in A} P(\omega), where each \omega is a possible world (outcome)
  • This gives a direct method for computing the probability of any event:
    • Identify the possible worlds in which the event is true and sum their probabilities

    P(\text{cavity} \lor \text{toothache}) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
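The sum above can be reproduced in Python. The eight joint-table entries below follow the standard dentist example; the two entries not used in the text's sums (0.144 and 0.576) are assumptions needed so the table sums to 1:

```python
# Full joint distribution keyed by (cavity, toothache, catch).
joint = {
    (True, True, True): 0.108,   (True, True, False): 0.012,
    (True, False, True): 0.072,  (True, False, False): 0.008,
    (False, True, True): 0.016,  (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def prob(event):
    """P(A) = sum of P(w) over the possible worlds w in which the event holds."""
    return sum(p for world, p in joint.items() if event(*world))

# P(cavity or toothache): sum every world where either proposition is true.
print(round(prob(lambda cavity, toothache, catch: cavity or toothache), 3))  # 0.28
```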

Marginalization and Conditioning

  • Marginalization and conditioning
  • Extracting the distribution over some subset of variables amounts to deriving a marginal probability
  • Example: P(\text{cavity}) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2

P(\text{Cavity}) = P(\text{Cavity}, \text{toothache}, \text{catch}) + P(\text{Cavity}, \text{toothache}, \neg\text{catch}) + P(\text{Cavity}, \neg\text{toothache}, \text{catch}) + P(\text{Cavity}, \neg\text{toothache}, \neg\text{catch})

  • In general, P(X) = \sum_{y} P(X, Y = y)
  • This can also be expressed via conditioning:

P(X) = \sum_{y} P(X, Y = y) = \sum_{y} P(X \mid Y = y)\, P(Y = y)
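Both routes to P(cavity) can be sketched against the same eight-entry dentist table (the two entries not appearing in the text's sums, 0.144 and 0.576, are assumed from the standard example):

```python
# Joint distribution keyed by (cavity, toothache, catch).
joint = {
    (True, True, True): 0.108,   (True, True, False): 0.012,
    (True, False, True): 0.072,  (True, False, False): 0.008,
    (False, True, True): 0.016,  (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

# Marginalization: P(cavity) = sum over toothache, catch of P(cavity, t, c).
p_cavity = sum(p for (cav, _, _), p in joint.items() if cav)
print(round(p_cavity, 3))  # 0.2

# Conditioning: P(cavity) = sum_t P(cavity | Toothache=t) P(Toothache=t).
p_tooth = sum(p for (_, t, _), p in joint.items() if t)
p_cav_given_t = sum(p for (c, t, _), p in joint.items() if c and t) / p_tooth
p_cav_given_not_t = sum(p for (c, t, _), p in joint.items() if c and not t) / (1 - p_tooth)
print(round(p_cav_given_t * p_tooth + p_cav_given_not_t * (1 - p_tooth), 3))  # 0.2
```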

Conditional Probabilities & Normalization

  • Conditional probabilities
  • We are also interested in computing conditional probabilities of some variables, given evidence about others

P(\text{cavity} \mid \text{toothache}) = \frac{P(\text{cavity} \land \text{toothache})}{P(\text{toothache})} = \frac{0.108 + 0.012}{0.108 + 0.012 + 0.016 + 0.064} = 0.6

P(\neg\text{cavity} \mid \text{toothache}) = \frac{P(\neg\text{cavity} \land \text{toothache})}{P(\text{toothache})} = \frac{0.016 + 0.064}{0.108 + 0.012 + 0.016 + 0.064} = 0.4

  • Normalization
  • The two conditional probabilities above must sum to 1.0
  • The term P(\text{toothache}) is in the denominator of both calculations ⇒ it can be viewed as a normalization constant for the distribution P(\text{Cavity} \mid \text{toothache}), guaranteeing that it sums to 1
  • We use \alpha to denote such constants. With this notation, the two preceding equations can be written as one:

P(\text{Cavity} \mid \text{toothache}) = \alpha\, P(\text{Cavity}, \text{toothache})

= \alpha\, [P(\text{Cavity}, \text{toothache}, \text{catch}) + P(\text{Cavity}, \text{toothache}, \neg\text{catch})]

= \alpha\, [\langle 0.108, 0.016 \rangle + \langle 0.012, 0.064 \rangle] = \alpha \langle 0.12, 0.08 \rangle = \langle 0.6, 0.4 \rangle
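A minimal sketch of the normalization step, using the unnormalized values from the equation above:

```python
# Unnormalized vector P(Cavity, toothache) for cavity in {True, False},
# obtained by summing out Catch: <0.108 + 0.012, 0.016 + 0.064>.
unnorm = [0.108 + 0.012, 0.016 + 0.064]

# alpha is simply the reciprocal of the vector's sum, forcing the result to sum to 1.
alpha = 1 / sum(unnorm)
print([round(alpha * v, 3) for v in unnorm])  # [0.6, 0.4]
```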

Normalization & General Inference Rule

  • Normalization
  • In other words, P(\text{Cavity} \mid \text{toothache}) can be computed even without knowing the value of P(\text{toothache})
  • Normalization is a useful shortcut in many probability calculations: it makes the computation easier and allows it to proceed even when some probability assessment (such as P(\text{toothache})) is unavailable
  • General inference rule
  • Given a single variable X, a list of evidence variables E, a list of observed values e for E, and the remaining unobserved variables Y,
  • the probability distribution P(X \mid e) can be computed as

P(X \mid e) = \alpha\, P(X, e) = \alpha \sum_{y} P(X, e, y)

  • The variables X, E, and Y together constitute the complete set of variables for the domain, so P(X, e, y) is simply a subset of the probabilities from the full joint distribution
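The general rule can be sketched as a small enumeration routine over the eight-entry dentist table (variable order: Cavity, Toothache, Catch; the name `enumerate_query` and the two table entries 0.144 and 0.576 are our assumptions):

```python
# Joint distribution keyed by (cavity, toothache, catch).
joint = {
    (True, True, True): 0.108,   (True, True, False): 0.012,
    (True, False, True): 0.072,  (True, False, False): 0.008,
    (False, True, True): 0.016,  (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def enumerate_query(x_index, evidence):
    """P(X | e) = alpha * sum_y P(X, e, y): sum the joint over the hidden
    variables, then normalize. `evidence` maps variable index -> observed value."""
    dist = {}
    for world, p in joint.items():
        if all(world[i] == v for i, v in evidence.items()):
            dist[world[x_index]] = dist.get(world[x_index], 0.0) + p
    alpha = 1.0 / sum(dist.values())
    return {value: alpha * p for value, p in dist.items()}

# P(Cavity | toothache): variable 0 queried, variable 1 observed True.
print({k: round(v, 3) for k, v in enumerate_query(0, {1: True}).items()})  # {True: 0.6, False: 0.4}
```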

Conclusions & Limitations - Summary

  • Given the full joint distribution, probabilistic queries about discrete variables can be answered
  • However, this approach does not scale well: for a domain described by n Boolean variables, it requires an input table of size O(2^n)
  • Therefore, the full joint distribution (in tabular form) is not a practical tool for building reasoning systems
  • The next step is to rely on the chain rule and the concepts of independence and conditional independence to factorize the joint distribution into a product of simpler probability distributions over subsets of variables
  • Chain rule: P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid A, B)
  • Independence: P(A, B) = P(A)\, P(B), or equivalently P(A \mid B) = P(A)
  • Conditional independence: P(A, B \mid C) = P(A \mid C)\, P(B \mid C), or equivalently P(A \mid B, C) = P(A \mid C)
  • Bayesian networks represent such factorizations systematically

Bayesian Network - Summary

  • We have seen that the full joint probability distribution can answer any question about the domain, but it becomes intractably large as the number of variables grows
  • We have also seen that (conditional) independence can greatly reduce the number of probabilities needed to define the full distribution
  • A Bayesian network is a data structure that represents the dependencies among variables
  • Bayesian networks can represent essentially any full joint probability distribution
  • A Bayesian network is a directed graph in which each node is annotated with quantitative probability information
  • Bayesian networks (Bayes nets) were called belief networks in the 1980s and 1990s
  • The term (probabilistic) graphical model (PGM) refers to a broader class that includes Bayesian networks
  • Full specification of Bayesian networks
  • Each node corresponds to a random variable (discrete or continuous)
  • Directed links, or arrows, connect pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y. The graph has no directed cycles, so it is a directed acyclic graph (DAG)
  • Each node X_i has associated probability information P(X_i \mid \text{Parents}(X_i)) that quantifies the effect of the parents on the node
  • Properties
  • The topology of the network specifies the conditional independence relationships that hold in the domain
  • The intuitive meaning of an arrow is typically that X has a direct influence on Y, which suggests that causes should be parents of effects
  • Once the topology of the Bayes net is determined, only the local probability information for each variable needs to be specified
  • The full joint distribution is defined by the topology and the local information

Conditional Probability Table (CPT)

  • Conditional probability tables (CPTs) represent the local probability information for discrete variables
  • Each row in a CPT contains the conditional probability of each node value for a conditioning case
  • A conditioning case is a possible combination of values for the parent nodes
  • Each row must sum to 1. For Boolean variables, however, the second number is often omitted
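A CPT can be sketched as a dictionary keyed by conditioning cases. The numbers below are the alarm node's CPT from the standard burglary-alarm example (an assumption, not given in this section), storing only P(alarm = true | parents) per the Boolean shortcut above:

```python
# CPT for Alarm with parents (Burglary, Earthquake): one row per conditioning
# case; each row stores only P(alarm=True | parents), so P(False | ...) is 1 minus it.
alarm_cpt = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}

def p_alarm(a, b, e):
    """P(Alarm=a | Burglary=b, Earthquake=e), recovering the omitted second number."""
    p_true = alarm_cpt[(b, e)]
    return p_true if a else 1 - p_true

print(p_alarm(False, False, False))  # 0.999
```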

The Semantics of Bayesian Networks

  • Suppose we have random variables X_1, \dots, X_n
  • Then the joint distribution is P(X_1 = x_1 \land \dots \land X_n = x_n), or P(x_1, \dots, x_n) for short
  • A Bayesian network defines each entry of the joint distribution as follows:

P(x_1, \dots, x_n) = \prod_{i=1}^n P(x_i \mid \text{Parents}(X_i))

  • ๋”ฐ๋ผ์„œ, (Joint distribution)์˜ ๊ฐ ํ•ญ๋ชฉ์€ Bayes net์˜ (Local conditional distributions, ์ง€์—ญ ์กฐ๊ฑด๋ถ€ ๋ถ„ํฌ)์˜ ์ ์ ˆํ•œ ์š”์†Œ๋“ค์˜ ๊ณฑ์œผ๋กœ ํ‘œํ˜„๋จ
  • ์˜ˆ: ๊ฒฝ๋ณด๊ฐ€ ์šธ๋ ธ์ง€๋งŒ, ๊ฐ•๋„๋‚˜ ์ง€์ง„์€ ๋ฐœ์ƒํ•˜์ง€ ์•Š์•˜๊ณ , John๊ณผ Mary ๋ชจ๋‘ ์ „ํ™”ํ•œ ํ™•๋ฅ 

P(j, m, a, \neg b, \neg e) = P(j \mid a)\, P(m \mid a)\, P(a \mid \neg b, \neg e)\, P(\neg b)\, P(\neg e)

= 0.90 \times 0.70 \times 0.001 \times 0.999 \times 0.998 = 0.000628
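The product above can be reproduced in Python. The factors 0.90, 0.70, 0.001, 0.999, and 0.998 appear in the equation; the remaining CPT entries (0.95, 0.94, 0.29, 0.05, 0.01) are assumed from the standard burglary-alarm example:

```python
# Local distributions of the burglary-alarm network (priors from the text,
# remaining CPT rows assumed from the standard example).
p_b, p_e = 0.001, 0.002
p_a_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
p_j_given_a = {True: 0.90, False: 0.05}
p_m_given_a = {True: 0.70, False: 0.01}

def joint(j, m, a, b, e):
    """One entry of the joint: the product of each node's local conditional."""
    pa = p_a_given[(b, e)] if a else 1 - p_a_given[(b, e)]
    pj = p_j_given_a[a] if j else 1 - p_j_given_a[a]
    pm = p_m_given_a[a] if m else 1 - p_m_given_a[a]
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    return pj * pm * pa * pb * pe

print(round(joint(True, True, True, False, False), 6))  # 0.000628
```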

Relationship between Chain Rule and Bayes Net

  • Chain rule

P(x_1, \dots, x_n) = P(x_n \mid x_{n-1}, \dots, x_1)\, P(x_{n-1} \mid x_{n-2}, \dots, x_1) \cdots P(x_2 \mid x_1)\, P(x_1) = \prod_{i=1}^n P(x_i \mid x_{i-1}, \dots, x_1)

  • Bayesian networks

P(x_1, \dots, x_n) = \prod_{i=1}^n P(x_i \mid \text{Parents}(X_i))

  • The two coincide when P(x_i \mid x_{i-1}, \dots, x_1) = P(x_i \mid \text{Parents}(X_i)), where \text{Parents}(X_i) \subseteq \{X_{i-1}, \dots, X_1\}
  • The condition \text{Parents}(X_i) \subseteq \{X_{i-1}, \dots, X_1\} is satisfied by numbering the nodes in topological order
  • That is, in any order consistent with the directed graph structure
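A quick parameter count illustrates why this factorization helps; the five-node structure below assumes the standard burglary-alarm topology:

```python
# Full joint over n Boolean variables needs 2^n - 1 independent numbers.
n = 5
full_joint_params = 2 ** n - 1

# A Bayes net needs one P(x_i=True | parents) entry per parent assignment.
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
bayes_net_params = sum(2 ** len(ps) for ps in parents.values())

print(full_joint_params, bayes_net_params)  # 31 10
```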
Last modified: Nov 6, 2025, 12:07 PM
Contributors: kmbzn
ยฉ 2025 kmbzn ยท MIT License