
14. Other Classic ML Models (1)

Naïve Bayes

Naïve Bayes Models

  • Perhaps the most common and simplest Bayesian network model in machine learning
  • A Naïve Bayes model assumes that, given the class variable, the value of any particular feature is (conditionally) independent of the values of the other features.
  • In practice, it often works well even when the conditional independence assumption is not strictly true.

Text Classification with Naïve Bayes

  • How to use a Naïve Bayes model for text classification
  • Task: decide which class or category a given text belongs to
  • The "cause" is the $\text{Category}$ variable; the "effect" variables are the presence or absence of particular keywords ($\text{HasWord}_i$)
  • Example sentences (newspaper articles)
    • "Stocks rallied on Monday, with major indexes gaining 1% as optimism persisted over the first quarter earnings season." (business)
    • "Heavy rain continued to pound much of the east coast on Monday, with flood warnings issued in New York City and other locations." (weather)
  • Task: classify each sentence into a major section (news, sports, business, weather, or entertainment)
  • A Naïve Bayes model consists of the prior probabilities $P(\text{Category})$ and the conditional probabilities $P(\text{HasWord}_i \mid \text{Category})$.
  • The posterior probability is proportional to the product of the prior and the likelihood (the conditional probabilities):

$$
\begin{aligned}
& P(\text{Category} \mid \text{HasWord}_{w_1},\ \text{HasWord}_{w_2},\ \dots,\ \text{HasWord}_{w_n}) \\
&= \frac{P(\text{HasWord}_{w_1},\ \dots,\ \text{HasWord}_{w_n} \mid \text{Category})\, P(\text{Category})}{P(\text{HasWord}_{w_1},\ \dots,\ \text{HasWord}_{w_n})} \\
&\propto P(\text{HasWord}_{w_1},\ \dots,\ \text{HasWord}_{w_n} \mid \text{Category})\, P(\text{Category}) \\
&= P(\text{HasWord}_{w_1} \mid \text{Category}) \cdots P(\text{HasWord}_{w_n} \mid \text{Category})\, P(\text{Category}) \\
&= \prod_{i} P(\text{HasWord}_{w_i} \mid \text{Category})\, P(\text{Category})
\end{aligned}
$$

  • Naïve Bayes assumption: given the category, the presence of each word is conditionally independent of the others

For each category $c$:

  • $P(\text{Category} = c)$ is estimated as the fraction of all previously seen documents that are of category $c$
  • e.g., if 9% of articles are weather articles, then $P(\text{Category} = \text{weather}) = 0.09$
  • $P(\text{HasWord}_i \mid \text{Category})$ is estimated as the fraction of documents of each category that contain word $i$
  • e.g., if 37% of business articles contain the word "stocks", then $P(\text{HasWord}_{\text{"stocks"}} = \text{true} \mid \text{Category} = \text{business}) = 0.37$

Classifying a new document

  • Check which keywords are present, then apply the formula above to obtain a posterior probability distribution over the categories (a minimal sketch follows below).
  • If a single category must be predicted, choose the one with the highest posterior probability

A Naïve Bayes model assumes that words occur independently in a document, with frequencies determined by the document's category

  • This independence assumption is violated in practice
  • e.g., the phrase "first quarter" occurs in business (or sports) articles more often than the product of the individual probabilities of "first" $\times$ "quarter" would suggest

Despite these errors, the resulting ranking of the possible categories is often quite accurate

  • Naïve Bayes models are widely used for language identification, document retrieval, spam filtering, and other classification tasks
  • For tasks where the actual posterior probability values matter (e.g., medical diagnosis), more sophisticated models such as neural networks are preferred

Decision Trees

  • A representation of a function that maps a vector of attribute values to a single output value (a "decision")
  • A decision tree reaches a decision by performing a sequence of tests, starting at the root and following branches until a leaf is reached (see the sketch below)
  • Internal nodes: each tests one of the input attributes
  • Branches: the possible values of that attribute
  • Leaf nodes: specify the value the function returns
  • Input and output values can be discrete or continuous.
  • For now: discrete inputs and Boolean (true/false) outputs $\rightarrow$ Boolean (= binary) classification
  • Notation: $j$ indexes the examples ($\mathbf{x}^j$ is the input vector of the $j$-th example, $y^j$ its output), and $x_{j,i}$ is the $i$-th attribute of the $j$-th example
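As a concrete picture of this representation, here is a minimal sketch, assuming an internal node is a pair (attribute, branches-by-value) and a leaf is just the output value; the `decide` helper and the toy tree fragment are illustrative, not from the source.

```python
# A decision tree as data: an internal node is a pair (attribute, branches),
# where branches maps each attribute value to a subtree; a leaf is just the
# output value. The tree below is a hypothetical toy fragment.
tree = ("Patrons", {
    "None": False,
    "Some": True,
    "Full": ("Hungry", {True: True, False: False}),
})

def decide(tree, x):
    """Start at the root and run one attribute test per internal node,
    following the branch for x's value, until a leaf gives the decision."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[x[attribute]]
    return tree

print(decide(tree, {"Patrons": "Full", "Hungry": True}))   # True (WillWait)
print(decide(tree, {"Patrons": "None"}))                   # False
```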

Example Problem: Restaurant Waiting

  • Problem: deciding whether to wait for a table at a restaurant
  • Output $y$: the Boolean variable $WillWait$ (true if we decide to wait)
  • Input $\mathbf{x}$: a vector of 10 attribute values (all discrete)
    1. $Alternate$: whether there is a suitable alternative restaurant nearby
    2. $Bar$: whether the restaurant has a comfortable bar area to wait in
    3. $Fri/Sat$: true on Fridays and Saturdays
    4. $Hungry$: whether we are hungry right now
    5. $Patrons$: how many people are in the restaurant ($None$, $Some$, $Full$)
    6. $Price$: the restaurant's price range ($, $$, $$$)
    7. $Raining$: whether it is raining outside
    8. $Reservation$: whether we made a reservation
    9. $Type$: the kind of restaurant (French, Italian, Thai, or burger)
    10. $WaitEstimate$: the wait estimated by the host (0–10, 10–30, 30–60, or >60 minutes)
  • A set of 12 examples is shown in the table below (and encoded in Python right after it)
    • The data are very sparse: only 12 of the $2^6 \times 3^2 \times 4^2 = 9216$ possible input combinations are given
    • From only 12 pieces of evidence, we must guess the 9,204 missing output values.
| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | WillWait |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | Yes | No | No | Yes | Some | $$$ | No | Yes | French | 0–10 | $y_1 = \text{Yes}$ |
| $x_2$ | Yes | No | No | Yes | Full | $ | No | No | Thai | 30–60 | $y_2 = \text{No}$ |
| $x_3$ | No | Yes | No | No | Some | $ | No | No | Burger | 0–10 | $y_3 = \text{Yes}$ |
| $x_4$ | Yes | No | Yes | Yes | Full | $ | Yes | No | Thai | 10–30 | $y_4 = \text{Yes}$ |
| $x_5$ | Yes | No | Yes | No | Full | $$$ | No | Yes | French | >60 | $y_5 = \text{No}$ |
| $x_6$ | No | Yes | No | Yes | Some | $$ | Yes | Yes | Italian | 0–10 | $y_6 = \text{Yes}$ |
| $x_7$ | No | Yes | No | No | None | $ | Yes | No | Burger | 0–10 | $y_7 = \text{No}$ |
| $x_8$ | No | No | No | Yes | Some | $$ | Yes | Yes | Thai | 0–10 | $y_8 = \text{Yes}$ |
| $x_9$ | No | Yes | Yes | No | Full | $ | Yes | No | Burger | >60 | $y_9 = \text{No}$ |
| $x_{10}$ | Yes | Yes | Yes | Yes | Full | $$$ | No | Yes | Italian | 10–30 | $y_{10} = \text{No}$ |
| $x_{11}$ | No | No | No | No | None | $ | No | No | Thai | 0–10 | $y_{11} = \text{No}$ |
| $x_{12}$ | Yes | Yes | Yes | Yes | Full | $ | No | No | Burger | 30–60 | $y_{12} = \text{Yes}$ |
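For later experimentation, the 12 examples can be written down directly; this encoding (tuples keyed by the table's abbreviated column names) is an illustrative choice, not from the source.

```python
# The 12 training examples from the table, one tuple per row, in the order
# (Alt, Bar, Fri, Hun, Pat, Price, Rain, Res, Type, Est, WillWait).
ATTRS = ("Alt", "Bar", "Fri", "Hun", "Pat", "Price",
         "Rain", "Res", "Type", "Est", "WillWait")
ROWS = [
    ("Yes", "No",  "No",  "Yes", "Some", "$$$", "No",  "Yes", "French",  "0-10",  "Yes"),
    ("Yes", "No",  "No",  "Yes", "Full", "$",   "No",  "No",  "Thai",    "30-60", "No"),
    ("No",  "Yes", "No",  "No",  "Some", "$",   "No",  "No",  "Burger",  "0-10",  "Yes"),
    ("Yes", "No",  "Yes", "Yes", "Full", "$",   "Yes", "No",  "Thai",    "10-30", "Yes"),
    ("Yes", "No",  "Yes", "No",  "Full", "$$$", "No",  "Yes", "French",  ">60",   "No"),
    ("No",  "Yes", "No",  "Yes", "Some", "$$",  "Yes", "Yes", "Italian", "0-10",  "Yes"),
    ("No",  "Yes", "No",  "No",  "None", "$",   "Yes", "No",  "Burger",  "0-10",  "No"),
    ("No",  "No",  "No",  "Yes", "Some", "$$",  "Yes", "Yes", "Thai",    "0-10",  "Yes"),
    ("No",  "Yes", "Yes", "No",  "Full", "$",   "Yes", "No",  "Burger",  ">60",   "No"),
    ("Yes", "Yes", "Yes", "Yes", "Full", "$$$", "No",  "Yes", "Italian", "10-30", "No"),
    ("No",  "No",  "No",  "No",  "None", "$",   "No",  "No",  "Thai",    "0-10",  "No"),
    ("Yes", "Yes", "Yes", "Yes", "Full", "$",   "No",  "No",  "Burger",  "30-60", "Yes"),
]
examples = [dict(zip(ATTRS, row)) for row in ROWS]
print(sum(e["WillWait"] == "Yes" for e in examples))   # 6 positive (and 6 negative)
```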

A Decision Tree for Restaurant Waiting

[Figure: a decision tree for deciding whether to wait for a table]

Learning Decision Trees from Examples

Goal: find the smallest decision tree that is consistent with the training data

  • Finding a guaranteed smallest tree is intractable.
  • With simple heuristics, we can efficiently find a tree that is close to the smallest.
  • The LEARN-DECISION-TREE algorithm adopts a greedy divide-and-conquer strategy
  • It always tests the most important attribute first, then recursively solves the smaller subproblems defined by the outcomes of the test
  • "Most important attribute": the one that makes the most difference to the classification of the examples

Goal: reach the correct classification with a small number of tests (so that all paths in the tree are short and the tree as a whole is shallow)

  • Figure (a) shows that $Type$ is a poor attribute (all four outcomes have the same number of positive and negative examples)
  • Figure (b) shows that $Patrons$ is a fairly important attribute (if its value is $None$ or $Some$, we get a definite answer, $No$ or $Yes$)

[Figure: splitting the 12 examples on (a) Type and (b) Patrons]

  • If the value is $Full$, a mixed set of examples remains. There are four cases for these recursive subproblems (handled explicitly in the learning sketch further below):
    1. If the remaining examples are all positive (or all negative): done; answer $Yes$ or $No$. Example: the $None$ and $Some$ branches in (b)
    2. If there are some positive and some negative examples: choose the best attribute to split them. In (b), $Hungry$ is used to split the remaining examples
    3. If no examples remain: no example with this combination of attribute values has been observed. Return the most common output value among the examples used in constructing the parent node
    4. If no attributes remain but both positive and negative examples do: this can be due to errors or noise in the data, a nondeterministic domain, or inability to observe a distinguishing attribute. Return the most common output value among the remaining examples

LEARN-DECISION-TREE Algorithm

[Figure: the LEARN-DECISION-TREE algorithm]

  • The output of the learning algorithm on the sample training set
  • The tree consists of tests at internal nodes, attribute values on branches, and output values at leaf nodes
  • The learned tree contains no tests on $Raining$ or $Reservation$, since they are not needed to classify all of the examples (a minimal sketch of the learner follows below)
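Below is a minimal Python sketch of this greedy divide-and-conquer learner, covering the four recursive cases listed earlier; it is an illustration under stated assumptions, not the textbook pseudocode. Examples are assumed to be dicts with a Boolean "WillWait" label, IMPORTANCE is implemented as information gain (defined in the next subsection), and branches are created only for attribute values that actually occur in the examples.

```python
import math
from collections import Counter

LABEL = "WillWait"   # assumed Boolean label key on each example dict

def b_entropy(q):
    """B(q): entropy of a Boolean variable that is true with probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def importance(attr, examples):
    """Information gain of testing attr (the Gain(A) of the next subsection)."""
    p = sum(1 for e in examples if e[LABEL])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        pk = sum(1 for e in subset if e[LABEL])
        remainder += len(subset) / len(examples) * b_entropy(pk / len(subset))
    return b_entropy(p / len(examples)) - remainder

def plurality(examples):
    """Most common output value among the examples."""
    return Counter(e[LABEL] for e in examples).most_common(1)[0][0]

def learn_decision_tree(examples, attributes, parent_examples):
    if not examples:                          # case 3: no examples left
        return plurality(parent_examples)
    labels = {e[LABEL] for e in examples}
    if len(labels) == 1:                      # case 1: all positive or all negative
        return labels.pop()
    if not attributes:                        # case 4: attributes exhausted (noise)
        return plurality(examples)
    best = max(attributes, key=lambda a: importance(a, examples))   # case 2
    branches = {}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = learn_decision_tree(subset, rest, examples)
    return (best, branches)   # same (attribute, branches) form as the earlier sketch

# Toy demo on a hypothetical 4-example mini-set (not the 12-example table):
toy = [
    {"Patrons": "Some", "Hungry": True,  "WillWait": True},
    {"Patrons": "None", "Hungry": False, "WillWait": False},
    {"Patrons": "Full", "Hungry": True,  "WillWait": True},
    {"Patrons": "Full", "Hungry": False, "WillWait": False},
]
print(learn_decision_tree(toy, ["Patrons", "Hungry"], toy))
# prints ('Hungry', {...}): Hungry alone classifies this toy set perfectly
```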

Choosing Attribute Tests

  • The decision tree learning algorithm chooses the attribute with the highest IMPORTANCE
  • IMPORTANCE is measured using information gain, which is defined in terms of entropy

Entropy review

  • A measure of the uncertainty of a random variable
  • The entropy of a random variable $X$ whose values $x$ occur with probability $P(x)$:
    • $H(X) = -\sum_{x} P(x) \log_2 P(x)$
  • The entropy of a fair coin flip is 1 bit
  • The entropy $B(q)$ of a Boolean random variable that is true with probability $q$: $B(q) = -(q \log_2 q + (1-q) \log_2(1-q))$
  • For a training set containing $p$ positive and $n$ negative examples, the entropy of the output variable on the whole set is $H(\text{Output}) = B\!\left(\frac{p}{p+n}\right)$
  • Restaurant training set: $p = n = 6$, so the corresponding entropy is $B(0.5)$, i.e., exactly 1 bit
  • The result of a test on an attribute $A$ provides information and thus reduces the overall entropy
  • An attribute $A$ with $d$ distinct values divides the training set $E$ into subsets $E_1,\ \dots,\ E_d$
  • Each subset $E_k$ has $p_k$ positive and $n_k$ negative examples; following that branch, we need $B\!\left(\frac{p_k}{p_k+n_k}\right)$ additional bits of information
  • The expected entropy remaining after testing attribute $A$:
  • $\textit{Remainder}(A) = \sum_{k=1}^{d} \frac{p_k + n_k}{p + n} B\!\left(\frac{p_k}{p_k+n_k}\right)$

The information gain from testing attribute $A$:

  • $\textit{Gain}(A) = B\!\left(\frac{p}{p+n}\right) - \textit{Remainder}(A)$
  • $\textit{Gain}(A)$ implements the IMPORTANCE function
  • $\textit{Gain}(\textit{Patrons}) = 1 - \left[\frac{2}{12} B\!\left(\frac{0}{2}\right) + \frac{4}{12} B\!\left(\frac{4}{4}\right) + \frac{6}{12} B\!\left(\frac{2}{6}\right)\right] \approx 0.541$ bits
  • $\textit{Gain}(\textit{Type}) = 1 - \left[\frac{2}{12} B\!\left(\frac{1}{2}\right) + \frac{2}{12} B\!\left(\frac{1}{2}\right) + \frac{4}{12} B\!\left(\frac{2}{4}\right) + \frac{4}{12} B\!\left(\frac{2}{4}\right)\right] = 0$ bits
  • This confirms the intuition that $Patrons$ is the better attribute to split on
  • $Patrons$ has the maximum information gain among all the attributes, so it is chosen as the root by the decision tree learning algorithm; a quick numeric check follows below.
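As a sanity check (not from the source), this short sketch recomputes the two gains from the positive/negative counts read off the 12-example table:

```python
import math

def B(q):
    """Entropy of a Boolean variable that is true with probability q."""
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def gain(splits, p=6, n=6):
    """Gain(A) for an attribute that splits the (p, n) training set into the
    given (p_k, n_k) subsets, per the Remainder(A) formula above."""
    remainder = sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in splits)
    return B(p / (p + n)) - remainder

print(gain([(0, 2), (4, 0), (2, 4)]))           # Patrons: ~0.541 bits
print(gain([(1, 1), (1, 1), (2, 2), (2, 2)]))   # Type: 0.0 bits
```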