
11. Machine Learning (2)

More Details on (Supervised) Machine Learning

  • Key concepts needed to get a handle on the field

Supervised Learning

  • The formal task of supervised learning:
  • A training set of $N$ input-output pairs $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$ is given
  • Each pair was generated by an unknown function $y = f(x)$
  • Goal: discover a function $h$ that approximates the true function $f$
  • The function $h$: hypothesis
  • Drawn from a hypothesis space $\mathcal{H}$, the set of candidate functions
  • e.g., the hypothesis space might be the set of degree-3 polynomials
  • $h$ is a model drawn from a model class $\mathcal{H}$, or a function drawn from a function class
  • The output $y_i$: ground truth (also called the gold standard)
  • The true answer the model is asked to predict
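The formulation above can be sketched in a few lines. This is a hypothetical setup (the target function, data range, and degree are all invented for illustration): the "unknown" $f$ generates the training pairs, and the hypothesis $h$ is picked from a polynomial hypothesis space by least squares.

```python
import numpy as np

# Hypothetical setup: a stand-in for the unknown target function f,
# and a small training set of (x_i, y_i) pairs generated by it.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * x**2 - x + 2
x_train = rng.uniform(-3, 3, size=20)
y_train = f(x_train)                      # ground-truth labels y_i = f(x_i)

# Hypothesis space H: polynomials of degree 2. np.polyfit returns the
# least-squares hypothesis h from that space.
h = np.poly1d(np.polyfit(x_train, y_train, deg=2))

# h should approximate f closely on inputs it has never seen.
x_new = np.array([0.0, 1.5])
print(np.max(np.abs(h(x_new) - f(x_new))))  # small approximation error
```

Because the data here are noise-free and $f$ itself lies inside $\mathcal{H}$, the recovered $h$ essentially coincides with $f$; with noisy data the fit would only approximate it.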

How Do We Choose a Hypothesis Space?

  • ๋ฐ์ดํ„ฐ ์ƒ์„ฑ process์— ๋Œ€ํ•œ ์‚ฌ์ „ ์ง€์‹ ํ™œ์šฉ
  • ์‚ฌ์ „ ์ง€์‹์ด ์—†๋Š” ๊ฒฝ์šฐ: Exploratory Data Analysis (EDA) ์ˆ˜ํ–‰
  • ํ†ต๊ณ„์  test ๋ฐ ์‹œ๊ฐํ™”(histograms, scatter plots, box plots ๋“ฑ)๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ ํŒŒ์•… ๋ฐ ์ ์ ˆํ•œ hypothesis space์— ๋Œ€ํ•œ insight ์–ป๊ธฐ
  • ๋˜๋Š” ์—ฌ๋Ÿฌ hypothesis spaces๋ฅผ ์‹œ๋„ํ•˜๊ณ  ๊ฐ€์žฅ ์ž˜ ์ž‘๋™ํ•˜๋Š” ๊ฒƒ์„ ํ‰๊ฐ€
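The "try several hypothesis spaces" route can be sketched as follows. Everything here is an assumed toy setup (a sine-shaped generator and three candidate polynomial degrees); the point is only that each candidate space is judged on held-out data, not on its training fit.

```python
import numpy as np

# Assumed data: an unknown generator (sin plus noise) we have no prior
# knowledge about, so we compare candidate hypothesis spaces empirically.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-2, 2, 60))
y = np.sin(x) + rng.normal(scale=0.05, size=x.size)

x_tr, y_tr = x[::2], y[::2]     # even indices: training half
x_ho, y_ho = x[1::2], y[1::2]   # odd indices: held-out half

# Candidate hypothesis spaces: polynomials of degree 1, 3, and 9.
mses = {}
for deg in (1, 3, 9):
    h = np.poly1d(np.polyfit(x_tr, y_tr, deg))
    mses[deg] = np.mean((h(x_ho) - y_ho) ** 2)
    print(f"degree {deg}: held-out MSE = {mses[deg]:.4f}")
```

On this data the linear space underfits the curvature, so the cubic space scores noticeably better on the held-out half.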


How Do We Choose a Good Hypothesis from the Hypothesis Space?

  • We hope for a consistent hypothesis: $h(x_i) = y_i$ for every $x_i$ in the training set
  • With continuous-valued outputs, matching the ground truth exactly is unrealistic; instead we look for a best-fit function whose $h(x_i)$ is close to $y_i$
  • The true measure of a hypothesis: how it handles inputs it has not yet seen (a test set), not the training set
  • Test set: a second sample of $(x_j, y_j)$ pairs used for evaluation
  • If $h$ accurately predicts the outputs of the test set, we say it generalizes well

Bias and Variance

  • Ways to analyze a hypothesis space: bias (independent of the particular dataset) and variance (variation across training sets)
  • Bias: the tendency of the predictive hypothesis to deviate from the expected value when averaged over different training sets
  • Bias often results from restrictions imposed by the hypothesis space
  • e.g., the hypothesis space of linear functions induces a strong bias: it consists only of straight lines
  • When a hypothesis fails to find a pattern in the data: underfitting
  • Variance: how much the hypothesis changes due to fluctuations in the training data
  • Low variance: small differences in the dataset lead to small differences in the hypothesis (the first three columns of the figure)
  • High variance: degree-12 polynomials have high variance (at both ends of the x-axis, the two fitted functions differ greatly)
  • When a function pays too much attention to the particular data set it was trained on and performs poorly on unseen data: overfitting
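The variance contrast above can be simulated directly. This is a hedged sketch with invented data (sin plus noise): many training sets are resampled, a low-degree and a degree-12 polynomial are fit to each, and we measure how much their predictions at a point near the edge of the x-range scatter across training sets.

```python
import numpy as np

# Toy illustration of variance: refit two hypothesis spaces on many
# resampled training sets and compare prediction spread at one point.
rng = np.random.default_rng(2)
f = lambda x: np.sin(x)
x_query = 2.5                     # near the edge of the x-range, where
                                  # high-degree fits disagree the most
preds = {1: [], 12: []}
for _ in range(200):
    x = rng.uniform(-3, 3, 20)
    y = f(x) + rng.normal(scale=0.1, size=x.size)
    for deg in preds:
        h = np.poly1d(np.polyfit(x, y, deg))
        preds[deg].append(h(x_query))

for deg, p in preds.items():
    print(f"degree {deg:2d}: prediction variance at x={x_query}: {np.var(p):.4f}")
```

The degree-1 predictions barely move between training sets (low variance, high bias), while the degree-12 predictions swing wildly (high variance).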

Bias-Variance Tradeoff

  • There is often a bias-variance tradeoff
  • A choice between a complex, low-bias hypothesis (which fits the training data well) and a simple, low-variance hypothesis (which may generalize better)
  • Ockham's razor
  • William of Ockham, a 14th-century English philosopher
  • The principle that "plurality (of entities) should not be posited without necessity," used to "shave off" dubious explanations


How to Find the Best Hypothesis

  • Supervised learning: choose the hypothesis $h^*$ that is most probable given the data
  • $h^* = \text{argmax}_{h \in \mathcal{H}}\ P(h \mid \text{data})$
  • By Bayes' rule:

    $h^* = \text{argmax}_{h \in \mathcal{H}}\ P(\text{data} \mid h)\, P(h)$

  • $P(h)$ (the prior probability): high for smooth degree-1 or degree-2 polynomials, low for large, spiky degree-12 polynomials
  • Unusual functions are permitted only when the data really demand them; assigning them a low prior probability discourages them
  • Q: Why not use the class of all computer programs, or of all Turing machines, as $\mathcal{H}$?
  • A: Because of the tradeoff between the expressiveness of a hypothesis space and the computational complexity of finding a good hypothesis within it
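The formula $h^* = \text{argmax}_h P(\text{data} \mid h)\,P(h)$ can be made concrete with a deliberately tiny example. This swaps the polynomial setting for two coin hypotheses (all numbers here are invented): the prior plays the same role as the low prior on spiky degree-12 polynomials, suppressing the "unusual" hypothesis unless the data demand it.

```python
# Toy instance of h* = argmax_h P(data | h) P(h): two candidate
# hypotheses about a coin, each with an assumed prior P(h).
def likelihood(p_heads, heads, tails):
    return p_heads ** heads * (1 - p_heads) ** tails

hypotheses = {"fair (p=0.5)": (0.5, 0.9),    # (p_heads, assumed prior)
              "biased (p=0.9)": (0.9, 0.1)}

def map_hypothesis(heads, tails):
    scores = {name: likelihood(p, heads, tails) * prior
              for name, (p, prior) in hypotheses.items()}
    return max(scores, key=scores.get)

print(map_hypothesis(8, 2))   # moderate evidence: the prior keeps "fair"
print(map_hypothesis(9, 1))   # stronger evidence overrides the prior
```

With 8 heads out of 10 the likelihood favors the biased coin, but the prior still wins; at 9 heads the data finally demand the unusual hypothesis.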

Model Selection and Optimization

I.I.D. Assumption

  • The goal of machine learning: select the hypothesis that will optimally fit future examples
  • What are "future examples," and what is an "optimal fit"?
  • Assumption 1: future examples will be like the past (the stationarity assumption)
  • Each example $E_j$ has the same prior probability distribution:

    $P(E_j) = P(E_{j+1}) = P(E_{j+2}) = \dots$

  • Each example is independent of the previous examples:

    $P(E_j) = P(E_j \mid E_{j-1}, E_{j-2}, \dots)$

  • Examples that satisfy these two equations are independent and identically distributed, or i.i.d.

Error Rate and Two Different Datasets

  • Defining the optimal fit: the hypothesis that minimizes the error rate
  • Error rate: the proportion of $(x, y)$ examples for which $h(x) \neq y$
  • (Later: allow partial credit for answers that are "almost" correct)
  • Estimate a hypothesis's error rate with a test set
  • It is cheating for the hypothesis to see the test answers before the test
  • The simplest way to prevent this: separate the examples into two sets
  • Training set: for creating the hypothesis
  • Test set: for evaluating the hypothesis
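A minimal sketch of this split, on invented data (a one-dimensional threshold classifier, chosen only because it is easy to "train" in one line): the hypothesis is fit on the training set alone, and its error rate, the fraction of examples with $h(x) \neq y$, is estimated on the held-out test set.

```python
import numpy as np

# Hypothetical data: the true label is the sign of x.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 200)
y = (x > 0).astype(int)

idx = rng.permutation(x.size)
train, test = idx[:150], idx[150:]        # simple 75/25 split

# "Training": pick the threshold that minimizes training-set error.
thresholds = np.linspace(-1, 1, 101)
train_err = [np.mean((x[train] > t).astype(int) != y[train]) for t in thresholds]
t_best = thresholds[int(np.argmin(train_err))]

# Error rate = fraction of test examples with h(x) != y.
h = lambda x: (x > t_best).astype(int)
error_rate = np.mean(h(x[test]) != y[test])
print(f"test-set error rate: {error_rate:.3f}")
```

The test indices play no role in choosing `t_best`, so this estimate is the honest one; measuring error on `train` instead would be the "cheating" the notes warn about.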

Hyperparameters and Three Datasets

  • ๋‹จ์ผ hypothesis ์ƒ์„ฑ ์‹œ: Training/test set ๋ถ„๋ฆฌ๋กœ ์ถฉ๋ถ„
  • ์—ฌ๋Ÿฌ hypothesis ์ƒ์„ฑ ์‹œ (์˜ˆ: ๋‘ ML model ๋น„๊ต, model ๋‚ด ์„ค์ • ์กฐ์ •): ๋ถˆ์ถฉ๋ถ„
  • Hyperparameters: ๊ฐœ๋ณ„ model์ด ์•„๋‹Œ model class์˜ parameters
  • ์—ฐ๊ตฌ์ž๊ฐ€ KNN classifier์˜ hyperparameter๋ฅผ ๋ฐ”๊ฟ”๊ฐ€๋ฉฐ test set error rate๋ฅผ ์ธก์ •ํ•˜๋Š” ๊ฒฝ์šฐ:
  • ๊ฐœ๋ณ„ hypothesis๋Š” test set์„ ๋ณด์ง€ ์•Š์•˜์ง€๋งŒ, ์—ฐ๊ตฌ์ž๋ฅผ ํ†ตํ•ด ์ „์ฒด process๊ฐ€ test set์„ "์—ฟ๋ด„"
  • ํ•ด๊ฒฐ์ฑ…: Training, experimenting, hyperparameter-tuning, re-training ๋“ฑ์ด ๋ชจ๋‘ ๋๋‚  ๋•Œ๊นŒ์ง€ test set์„ ์™„์ „ํžˆ ๋ถ„๋ฆฌ(hold out)
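The held-out-test-set discipline can be sketched like this. The data, the tiny brute-force k-NN, and the candidate values of $k$ are all assumptions for illustration; what matters is that the hyperparameter is tuned on a validation set, and the test set is touched exactly once at the end.

```python
import numpy as np

# Hypothetical data: label is the sign of x1 + x2.
rng = np.random.default_rng(5)
x = rng.normal(size=(300, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(int)

x_tr, y_tr = x[:200], y[:200]        # training set: fit candidate models
x_va, y_va = x[200:250], y[200:250]  # validation set: compare hyperparameters
x_te, y_te = x[250:], y[250:]        # test set: final, unbiased estimate

def knn_predict(k, queries):
    # Brute-force k-NN: majority vote among the k nearest training points.
    d = np.linalg.norm(queries[:, None, :] - x_tr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_tr[nearest].mean(axis=1) > 0.5).astype(int)

# Tune the hyperparameter k on the validation set only.
errs = {k: np.mean(knn_predict(k, x_va) != y_va) for k in (1, 5, 15)}
k_best = min(errs, key=errs.get)

# One final, untainted measurement on the held-out test set.
test_err = np.mean(knn_predict(k_best, x_te) != y_te)
print(f"chosen k = {k_best}, test error = {test_err:.3f}")
```

Had `errs` been computed on `x_te`, every candidate $k$ would have "peeked" at the test set through us, and the final error estimate would be optimistically biased.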

Three Datasets and Cross-Validation

  • Three data sets are needed
    • Training set: train candidate models
    • Validation set (also called the development set or dev set): evaluate candidate models and select the best one
    • Test set: a final, unbiased evaluation of the final model
  • K-fold cross-validation
    • Used when there is not enough data to make three sets
    • Split the data into $k$ equal subsets
    • Perform $k$ rounds of learning: in each round, $1/k$ of the data serves as the validation set and the rest as the training set
    • Popular values of $k$: 5 and 10
    • (Even with cross-validation) a separate test set is still needed
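The k-fold procedure above can be sketched in a few lines. The data and candidate degrees are assumptions for illustration; each of the $k$ rounds holds out one fold as the validation set and trains on the remaining $k-1$ folds, and the $k$ validation errors are averaged.

```python
import numpy as np

# Hypothetical data: a noisy linear relationship.
rng = np.random.default_rng(6)
x = rng.uniform(-2, 2, 100)
y = 3 * x + 1 + rng.normal(scale=0.2, size=x.size)

k = 5
folds = np.array_split(rng.permutation(x.size), k)  # k equal index subsets

def cv_mse(degree):
    errs = []
    for i in range(k):
        val = folds[i]                                    # 1/k held out
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        h = np.poly1d(np.polyfit(x[tr], y[tr], degree))
        errs.append(np.mean((h(x[val]) - y[val]) ** 2))
    return np.mean(errs)      # average validation error over the k rounds

for degree in (1, 2, 8):
    print(f"degree {degree}: CV MSE = {cv_mse(degree):.4f}")
```

Note that every point gets used for both training and validation across the $k$ rounds, which is exactly why a separate, never-touched test set is still required for the final estimate.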

Model Selection and Optimization

  • Example: a linear function underfits a data set, while a high-degree polynomial overfits it
  • The task of finding a good hypothesis can be viewed as two subtasks
  • Model (class) selection: choosing a good hypothesis space
  • Optimization (or training): finding the best hypothesis within that space

Model Selection

  • Part of model selection is qualitative and subjective
  • e.g., choosing polynomials over decision trees based on knowledge of the problem
  • Part of model selection is quantitative and empirical
  • e.g., within the class of polynomials, choosing degree = 2 based on performance on the validation data set

An Algorithm for Model Selection

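The algorithm itself did not survive extraction here, so the following is a generic sketch rather than the textbook's pseudocode: sweep a complexity parameter (here, polynomial degree, on invented data), keep the hypothesis with the lowest validation error, and report its held-out test error once at the end.

```python
import numpy as np

# Hypothetical data: a noisy cosine.
rng = np.random.default_rng(7)
x = rng.uniform(-2, 2, 120)
y = np.cos(x) + rng.normal(scale=0.1, size=x.size)

x_tr, y_tr = x[:80], y[:80]        # training set
x_va, y_va = x[80:100], y[80:100]  # validation set
x_te, y_te = x[100:], y[100:]      # test set (touched once, at the end)

def model_selection(max_degree):
    best = None
    for degree in range(1, max_degree + 1):       # increasing complexity
        h = np.poly1d(np.polyfit(x_tr, y_tr, degree))
        val_err = np.mean((h(x_va) - y_va) ** 2)
        if best is None or val_err < best[1]:
            best = (degree, val_err, h)           # keep the best so far
    return best

degree, val_err, h = model_selection(10)
test_err = np.mean((h(x_te) - y_te) ** 2)
print(f"selected degree {degree}, test MSE {test_err:.4f}")
```

Low degrees underfit and high degrees overfit the 80 training points, so the validation error traces out the selection criterion and the loop settles on an intermediate degree.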

Two Different Patterns That Occur in Model Selection


  • As model complexity increases, training-set error decreases monotonically
  • For many model classes, training-set error reaches zero as complexity increases
  • (a) A U-shaped validation-error curve
  • (b) Validation error begins to decrease continuously (with some fluctuation)
  • Which pattern occurs depends on how the model class uses its excess capacity, and on how well that matches the problem at hand
Last updated: Nov 6, 2025, 12:07 PM · Contributors: kmbzn
ยฉ 2025 kmbzn ยท MIT License