• Mindscape ๐Ÿ”ฅ
    • Playlist ๐ŸŽง
  • ๐Ÿค– Artifical Intelligence

    • 1. Basics; Linear Algebra
    • 2. Basics; Linear Algebra (2), Search (1)
    • 3. Search (2)
    • 4. Knowledge and Logic (1)
    • 5. Knowledge and Logic (2)
    • 6. Probability
    • 7. Information Theory
    • 8. Probabilitc Reasoning (2)
    • 9. Probabilitc Reasoning (3)
    • 10. Machine Learning (1)
    • 11. Machine Learning (2)
    • 12. Machine Learning (3)
    • 13. Linear Models
    • 14. Other Classic ML Models (1)
    • 15. Other Classic ML Models (2)
  • ๐Ÿ”’ Computer Security

    • 01. Overview
    • 02. ์ •๋ณด๋ณด์•ˆ์ •์ฑ… ๋ฐ ๋ฒ•๊ทœ
    • 03. Cryptographic Tools
    • 04. User Authentication
    • 05. Access Control
    • 06. Database Security
    • 07. Malicious Software
    • 08. Firmware Analysis
  • ๐Ÿ—„๏ธ Database System

    • 1. Introduction
    • 2. Relational Model
    • 3. SQL
    • 6. E-R Model
    • 7. Relational Database Design (1)
    • 7. Relational Database Design (2)
    • 13. Data Storage Structures
    • 14. Indexing
    • 15. Query Processing
  • ๐Ÿ“ Software Engineering

    • 2. Introduction to Software Engineering
    • 3. Process
    • 4. Process Models
    • 5. Agile
    • 6. Requirements
    • 7. Requirements Elicitation and Documentation
    • 8. Architecture
    • 9. Unified Modelling Language
    • 10. Object-Oriented Analysis
    • Object-Oriented Design
  • ๐Ÿง  Algorithm

    • Python ์‹œ๊ฐ„ ์ดˆ๊ณผ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•œ ํŒ
    • C++ std::vector ์‚ฌ์šฉ๋ฒ• ์ •๋ฆฌ
    • Vim ์‚ฌ์šฉ ๋งค๋‰ด์–ผ
    • 1018๋ฒˆ: ์ฒด์ŠคํŒ ๋‹ค์‹œ ์น ํ•˜๊ธฐ
    • 1966๋ฒˆ: ํ”„๋ฆฐํ„ฐ ํ

15. Query Processing

Basic Steps in Query Processing

  1. Parsing and translation
    • Parser๊ฐ€ query (SQL) ๊ตฌ๋ฌธ ํ™•์ธ, relation ๊ฒ€์ฆ
    • Query๋ฅผ ๊ด€๊ณ„ ๋Œ€์ˆ˜(relational algebra) ํ‘œํ˜„์‹์œผ๋กœ ๋ณ€ํ™˜
    • ๋น„์ ˆ์ฐจ์  query โ†’ ์ ˆ์ฐจ์  query
  2. Optimization: ์ตœ์ ์˜ ํ‰๊ฐ€ ๊ณ„ํš(evaluation plan) ์ƒ์„ฑ
  3. Evaluation
    • Query-execution engine์ด query-evaluation plan์„ ๋ฐ›์•„ ์‹คํ–‰ํ•˜๊ณ  query์— ๋Œ€ํ•œ ๋‹ต๋ณ€ ๋ฐ˜ํ™˜

Basic Steps in Query Processing: Optimization

  • ํ•˜๋‚˜์˜ relational algebra ํ‘œํ˜„์‹์€ ์—ฌ๋Ÿฌ ๋™๋“ฑํ•œ ํ‘œํ˜„์‹์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Œ
  • ์˜ˆ: ฯƒsalary<75000(ฮ salary(instructor))\sigma_{\text{salary} \lt 75000}(\Pi_{\text{salary}}(\text{instructor}))ฯƒsalary<75000โ€‹(ฮ salaryโ€‹(instructor))๋Š” ฮ salary(ฯƒsalary<75000(instructor))\Pi_{\text{salary}}(\sigma_{\text{salary} \lt 75000}(\text{instructor}))ฮ salaryโ€‹(ฯƒsalary<75000โ€‹(instructor))์™€ ๋™์ผ
  • ๊ฐ relational algebra ์—ฐ์‚ฐ์€ ์—ฌ๋Ÿฌ ๋‹ค๋ฅธ algorithm ์ค‘ ํ•˜๋‚˜๋ฅผ ์‚ฌ์šฉํ•ด ํ‰๊ฐ€ ๊ฐ€๋Šฅ
  • ๋”ฐ๋ผ์„œ relational algebra ํ‘œํ˜„์‹์€ ์—ฌ๋Ÿฌ ๋ฐฉ์‹์œผ๋กœ ํ‰๊ฐ€๋  ์ˆ˜ ์žˆ์Œ
  • ์˜ˆ) ฯƒsalary>75000(instructor)\sigma_{\text{salary} > 75000}(\text{instructor})ฯƒsalary>75000โ€‹(instructor)
    • table scan ๋˜๋Š”
    • salary์— ๋Œ€ํ•œ index scan (๋งŒ์•ฝ salary์— B+B^+B+-tree index๊ฐ€ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค๋ฉด)
  • ์ƒ์„ธํ•œ ํ‰๊ฐ€ ์ „๋žต์„ ๋ช…์‹œํ•˜๋Š” ์ฃผ์„์ด ๋‹ฌ๋ฆฐ ํ‘œํ˜„์‹์„ evaluation primitive๋ผ ํ•จ
  • ์˜ˆ: ฯƒsalary>75000(instructor)\sigma_{\text{salary} > 75000}(\text{instructor})ฯƒsalary>75000โ€‹(instructor): salary์— index ์‚ฌ์šฉ
  • Query-execution plan (๋˜๋Š” query-evaluation plan)
    • Query๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•œ primitive ์—ฐ์‚ฐ๋“ค์˜ ์‹œํ€€์Šค
  • Query Optimization
    • ๋ชจ๋“  ๋™๋“ฑํ•œ evaluation plan ์ค‘์—์„œ ๊ฐ€์žฅ ๋น„์šฉ์ด ๋‚ฎ์€ ๊ฒƒ์„ ์„ ํƒ
    • ๋น„์šฉ์€ database catalog์˜ ํ†ต๊ณ„ ์ •๋ณด๋ฅผ ์‚ฌ์šฉํ•ด ์ถ”์ •
    • ์˜ˆ: ๊ฐ relation์˜ tuple ์ˆ˜, tuple์˜ ํฌ๊ธฐ ๋“ฑ
  • ์ด ์žฅ์—์„œ ๋‹ค๋ฃจ๋Š” ๋‚ด์šฉ
    • Query ๋น„์šฉ ์ธก์ • ๋ฐฉ๋ฒ•
    • Relational algebra ์—ฐ์‚ฐ ํ‰๊ฐ€๋ฅผ ์œ„ํ•œ algorithm
    • ์™„์ „ํ•œ ํ‘œํ˜„์‹์„ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐœ๋ณ„ ์—ฐ์‚ฐ algorithm์„ ๊ฒฐํ•ฉํ•˜๋Š” ๋ฐฉ๋ฒ•
  • 16์žฅ ๊ด€๋ จ
    • Query ์ตœ์ ํ™” ๋ฐฉ๋ฒ•, ์ฆ‰ ๊ฐ€์žฅ ๋‚ฎ์€ ์ถ”์ • ๋น„์šฉ์„ ๊ฐ€์ง„ evaluation plan์„ ์ฐพ๋Š” ๋ฐฉ๋ฒ• ์—ฐ๊ตฌ

Measures of Query Cost

  • ์‹œ๊ฐ„ ๋น„์šฉ์— ๊ธฐ์—ฌํ•˜๋Š” ๋งŽ์€ ์š”์ธ
    • Disk access, CPU, network communication
  • ๋น„์šฉ ์ธก์ • ๊ธฐ์ค€
    • ์‘๋‹ต ์‹œ๊ฐ„(response time), ์ฆ‰ query ์‘๋‹ต์— ๊ฑธ๋ฆฐ ์ด ๊ฒฝ๊ณผ ์‹œ๊ฐ„
    • ์ด ์ž์› ์†Œ๋น„(total resource consumption)
  • ์ด ์ž์› ์†Œ๋น„๋ฅผ ๋น„์šฉ metric์œผ๋กœ ์‚ฌ์šฉ
  • ์‘๋‹ต ์‹œ๊ฐ„์€ ์ถ”์ •ํ•˜๊ธฐ ๋” ์–ด๋ ต๊ณ , ๊ณต์œ  database์—์„œ๋Š” ์ž์› ์†Œ๋น„ ์ตœ์†Œํ™”๊ฐ€ ์ข‹์Œ
  • ๋‹จ์ˆœํ™”๋ฅผ ์œ„ํ•ด CPU ๋น„์šฉ์€ ๋ฌด์‹œ
  • ์‹ค์ œ ์‹œ์Šคํ…œ์€ CPU ๋น„์šฉ์„ ๊ณ ๋ คํ•จ
  • ๋ณ‘๋ ฌ ์‹œ์Šคํ…œ์—์„œ๋Š” network ๋น„์šฉ ๊ณ ๋ ค ํ•„์š”
  • ๊ฐ ์—ฐ์‚ฐ์˜ ๋น„์šฉ ์ถ”์ • ๋ฐฉ๋ฒ• ์„ค๋ช…
  • output์„ disk์— ์“ฐ๋Š” ๋น„์šฉ์€ ํฌํ•จํ•˜์ง€ ์•Š์Œ (์—ฐ์‚ฐ์˜ output์ด disk์— ์“ฐ์ด์ง€ ์•Š๊ณ  ๋‹ค์Œ ์—ฐ์‚ฐ์œผ๋กœ ์ „์†ก๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ)
  • Disk ๋น„์šฉ ์ถ”์ •
    • Number of seeks ร— average-seek-cost
    • Number of blocks read ร— average-block-read-cost
    • Number of blocks written ร— average-block-write-cost
  • ๋‹จ์ˆœํ™”๋ฅผ ์œ„ํ•ด disk๋กœ๋ถ€ํ„ฐ์˜ block ์ „์†ก ํšŸ์ˆ˜์™€ seek ํšŸ์ˆ˜๋งŒ ๋น„์šฉ ์ฒ™๋„๋กœ ์‚ฌ์šฉ
  • tTt_TtTโ€‹ โ€“ ํ•˜๋‚˜์˜ block์„ ์ „์†กํ•˜๋Š” ์‹œ๊ฐ„
    • ๋‹จ์ˆœํ™”๋ฅผ ์œ„ํ•ด write ๋น„์šฉ์€ read ๋น„์šฉ๊ณผ ๊ฐ™๋‹ค๊ณ  ๊ฐ€์ •
  • tSt_StSโ€‹ โ€“ ํ•œ ๋ฒˆ์˜ seek์— ๊ฑธ๋ฆฌ๋Š” ์‹œ๊ฐ„
  • bbb block ์ „์†ก๊ณผ SSS seek์— ๋Œ€ํ•œ ๋น„์šฉ: bโˆ—tT+Sโˆ—tSb * t_T + S * t_Sbโˆ—tTโ€‹+Sโˆ—tSโ€‹
  • tSt_StSโ€‹์™€ tTt_TtTโ€‹๋Š” ๋ฐ์ดํ„ฐ ์ €์žฅ ์œ„์น˜์— ๋”ฐ๋ผ ๋‹ค๋ฆ„; 4 KB block ๊ธฐ์ค€,
    • High end magnetic disk: tS=4ย msect_S = 4~ \text{msec}tSโ€‹=4ย msec ์ด๊ณ  tT=0.1ย msect_T = 0.1~ \text{msec}tTโ€‹=0.1ย msec
    • SSD: tS=20โˆผ90ย microsect_S = 20 \sim 90~ \text{microsec}tSโ€‹=20โˆผ90ย microsec ์ด๊ณ  tT=2โˆผ10ย microsect_T = 2 \sim 10~ \text{microsec}tTโ€‹=2โˆผ10ย microsec
  • ํ•„์š”ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ์ด๋ฏธ buffer resident์ผ ์ˆ˜ ์žˆ์–ด disk I/O๋ฅผ ํ”ผํ•  ์ˆ˜ ์žˆ์Œ
  • ๊ทธ๋Ÿฌ๋‚˜ ๋น„์šฉ ์ถ”์ • ์‹œ ๊ณ ๋ คํ•˜๊ธฐ ์–ด๋ ค์›€
  • ์—ฌ๋Ÿฌ algorithm์€ ์ถ”๊ฐ€ buffer ๊ณต๊ฐ„์„ ์‚ฌ์šฉํ•ด disk IO๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ์Œ
  • buffer์— ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์‹ค์ œ memory ์–‘์€ ๋‹ค๋ฅธ ๋™์‹œ query ๋ฐ OS process์— ๋”ฐ๋ผ ๋‹ค๋ฅด๋ฉฐ, ์‹คํ–‰ ์ค‘์—๋งŒ ์•Œ ์ˆ˜ ์žˆ์Œ
  • ์ตœ์•…์˜ ๊ฒฝ์šฐ(Worst case) ์ถ”์ •์€ ์ดˆ๊ธฐ์— buffer์— ๋ฐ์ดํ„ฐ๊ฐ€ ์—†๊ณ  ์—ฐ์‚ฐ์— ํ•„์š”ํ•œ ์ตœ์†Œ memory๋งŒ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค๊ณ  ๊ฐ€์ •
  • ๊ทธ๋Ÿฌ๋‚˜ ์‹ค์ œ๋กœ๋Š” ๋” ๋‚™๊ด€์ ์ธ ์ถ”์ •์ด ์‚ฌ์šฉ๋จ

Selection Operation

  • File scan (๋˜๋Š” table scan)
    • ์„ ํƒ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” record๋ฅผ ์ฐพ์•„ ๊ฒ€์ƒ‰
    • Full file scan: file (relation)์˜ ๋ชจ๋“  record ๊ฒ€์ƒ‰
  • Algorithm A1 (linear search)
    • ๊ฐ file block์„ scanํ•˜๊ณ  ๋ชจ๋“  record๊ฐ€ ์„ ํƒ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š”์ง€ ํ…Œ์ŠคํŠธ
    • ๋น„์šฉ ์ถ”์ • = brb_rbrโ€‹ block ์ „์†ก + 1 seek = brโˆ—tT+tSb_r * t_T + t_Sbrโ€‹โˆ—tTโ€‹+tSโ€‹
    • brb_rbrโ€‹์€ relation rrr์˜ record๋ฅผ ํฌํ•จํ•˜๋Š” block ์ˆ˜
    • ๋งŒ์•ฝ key attribute์— ๋Œ€ํ•œ ์„ ํƒ์ด๊ณ  ์„ ํƒ ์กฐ๊ฑด์ด ๋™๋“ฑ ๋น„๊ต(equality comparison)๋ผ๋ฉด
      • record๋ฅผ ์ฐพ๋Š” ์ฆ‰์‹œ ์ค‘๋‹จ ๊ฐ€๋Šฅ
      • ๋น„์šฉ = (br/2)(b_r / 2)(brโ€‹/2) block ์ „์†ก + 1 seek = (br/2)โˆ—tT+tS(b_r / 2) * t_T + t_S(brโ€‹/2)โˆ—tTโ€‹+tSโ€‹
    • Linear search๋Š” ๋‹ค์Œ์— ๊ด€๊ณ„์—†์ด ์ ์šฉ ๊ฐ€๋Šฅ
      • ์„ ํƒ ์กฐ๊ฑด,
      • file ๋‚ด record์˜ ์ˆœ์„œ, ๋˜๋Š”
      • index ๊ฐ€์šฉ์„ฑ
    • ์ฐธ๊ณ : Binary search๋Š”?
      • ๋ฐ์ดํ„ฐ๊ฐ€ ์—ฐ์†์ ์œผ๋กœ ์ €์žฅ๋˜์ง€ ์•Š์œผ๋ฏ€๋กœ ์ผ๋ฐ˜์ ์œผ๋กœ ์˜๋ฏธ ์—†์Œ
      • ๋˜ํ•œ, binary search๋Š” index search๋ณด๋‹ค ๋” ๋งŽ์€ seek๋ฅผ ์š”๊ตฌ
  • Index scan
    • Index๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒ€์ƒ‰ algorithm
    • ์„ ํƒ ์กฐ๊ฑด์€ index์˜ search-key์— ์žˆ์–ด์•ผ ํ•จ
  • Algorithm A2 (clustering B+B^+B+-tree index, equality on key)
    • ํ•ด๋‹น ๋™๋“ฑ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ๋‹จ์ผ record ๊ฒ€์ƒ‰
    • ๋น„์šฉ = (hi+1)โˆ—(tT+tS)(h_i + 1) * (t_T + t_S)(hiโ€‹+1)โˆ—(tTโ€‹+tSโ€‹), ์—ฌ๊ธฐ์„œ hih_ihiโ€‹๋Š” index์˜ ๋†’์ด
  • Algorithm A3 (clustering B+B^+B+-tree index, equality on non-key)
    • ํ•ด๋‹น ๋™๋“ฑ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ์—ฌ๋Ÿฌ record ๊ฒ€์ƒ‰
    • Record๋“ค์€ ์—ฐ์†์ ์ธ block์— ์žˆ์„ ๊ฒƒ (์ฐธ๊ณ : clustering index)
    • ๋น„์šฉ = hiโˆ—(tT+tS)+tS+tTโˆ—bh_i * (t_T + t_S) + t_S + t_T * bhiโ€‹โˆ—(tTโ€‹+tSโ€‹)+tSโ€‹+tTโ€‹โˆ—b, (bbb = ์ผ์น˜ํ•˜๋Š” record๋ฅผ ํฌํ•จํ•˜๋Š” block ์ˆ˜)
  • Algorithm A4 (secondary B+B^+B+-tree index, equality on key/non-key)
    • Search-key๊ฐ€ candidate key์ธ ๊ฒฝ์šฐ ๋‹จ์ผ record ๊ฒ€์ƒ‰
    • ๋น„์šฉ = (hi+1)โˆ—(tT+tS)(h_i + 1) * (t_T + t_S)(hiโ€‹+1)โˆ—(tTโ€‹+tSโ€‹)
    • Search-key๊ฐ€ candidate key๊ฐ€ ์•„๋‹Œ ๊ฒฝ์šฐ ์—ฌ๋Ÿฌ record ๊ฒ€์ƒ‰
    • ์ผ์น˜ํ•˜๋Š” nnn๊ฐœ์˜ record ๊ฐ๊ฐ์ด ๋‹ค๋ฅธ block์— ์žˆ์„ ์ˆ˜ ์žˆ์Œ (์ฐธ๊ณ : secondary index)
    • ๋น„์šฉ = (hi+n)โˆ—(tT+tS)(h_i + n) * (t_T + t_S)(hiโ€‹+n)โˆ—(tTโ€‹+tSโ€‹) : ๋งค์šฐ ๋น„์Œ€ ์ˆ˜ ์žˆ์Œ!

Selection Involving Comparisons

  • ฯƒAโ‰คV(r)\sigma_{A \le V} (r)ฯƒAโ‰คVโ€‹(r) ๋˜๋Š” ฯƒAโ‰ฅV(r)\sigma_{A \ge V} (r)ฯƒAโ‰ฅVโ€‹(r) ํ˜•ํƒœ์˜ selection ๊ตฌํ˜„ ๋ฐฉ๋ฒ•
    • linear file scan ์‚ฌ์šฉ,
    • ๋˜๋Š” index๋ฅผ ๋‹ค์Œ ๋ฐฉ์‹์œผ๋กœ ์‚ฌ์šฉ
  • Algorithm A5 (clustering B+B^+B+-tree index, comparison)
    • Relation์ด AAA์— ๋Œ€ํ•ด ์ •๋ ฌ๋จ
    • ฯƒAโ‰ฅV(r)\sigma_{A \ge V} (r)ฯƒAโ‰ฅVโ€‹(r)์˜ ๊ฒฝ์šฐ, index๋ฅผ ์‚ฌ์šฉํ•ด โ‰ฅv\ge vโ‰ฅv์ธ ์ฒซ tuple์„ ์ฐพ๊ณ  ๊ฑฐ๊ธฐ์„œ๋ถ€ํ„ฐ relation์„ ์ˆœ์ฐจ์  scan
    • ๋น„์šฉ = hiโˆ—(tT+tS)+tS+tTโˆ—bh_i * (t_T + t_S) + t_S + t_T * bhiโ€‹โˆ—(tTโ€‹+tSโ€‹)+tSโ€‹+tTโ€‹โˆ—b (A3 ๊ฒฝ์šฐ์™€ ๋™์ผ)
    • ฯƒAโ‰คV(r)\sigma_{A \le V} (r)ฯƒAโ‰คVโ€‹(r)์˜ ๊ฒฝ์šฐ, ์ฒซ tuple >v> v>v๋ฅผ ๋งŒ๋‚  ๋•Œ๊นŒ์ง€ relation์„ ์ˆœ์ฐจ์  scan; index ๋ฏธ์‚ฌ์šฉ
    • ๋น„์šฉ = tS+tTโˆ—bt_S + t_T * btSโ€‹+tTโ€‹โˆ—b
  • Algorithm A6 (secondary B+B^+B+-tree index, comparison)
    • ฯƒAโ‰ฅV(r)\sigma_{A \ge V} (r)ฯƒAโ‰ฅVโ€‹(r)์˜ ๊ฒฝ์šฐ, index๋ฅผ ์‚ฌ์šฉํ•ด โ‰ฅv\ge vโ‰ฅv์ธ ์ฒซ index entry๋ฅผ ์ฐพ๊ณ  ๊ฑฐ๊ธฐ์„œ๋ถ€ํ„ฐ index๋ฅผ ์ˆœ์ฐจ์  scanํ•˜์—ฌ record pointer ์ฐพ๊ธฐ
    • ฯƒAโ‰คV(r)\sigma_{A \le V} (r)ฯƒAโ‰คVโ€‹(r)์˜ ๊ฒฝ์šฐ, ์ฒซ entry >v> v>v๋ฅผ ๋งŒ๋‚  ๋•Œ๊นŒ์ง€ index์˜ leaf page๋ฅผ scanํ•˜๋ฉฐ record pointer ์ฐพ๊ธฐ
    • ๋‘ ๊ฒฝ์šฐ ๋ชจ๋‘, pointer๊ฐ€ ๊ฐ€๋ฆฌํ‚ค๋Š” record๋ฅผ ๊ฒ€์ƒ‰
    • record ๋‹น random I/O ํ•„์š”
    • ๊ฐ€์ ธ์˜ฌ record๊ฐ€ ๋งŽ๋‹ค๋ฉด Linear file scan์ด ๋” ์ €๋ ดํ•  ์ˆ˜ ์žˆ์Œ!
  • Algorithm A6์—์„œ,
    • secondary index์™€ linear file scan ์ค‘ ์–ด๋А ๊ฒƒ์„ ์„ ํƒํ•˜๋“ , ์ผ์น˜ํ•˜๋Š” record ์ˆ˜๋ฅผ ๋ฏธ๋ฆฌ ์•Œ์ง€ ๋ชปํ•˜๋ฉด ์„ฑ๋Šฅ์ด ์ €ํ•˜๋  ์ˆ˜ ์žˆ์Œ
  • PostgreSQL์˜ bitmap index scan algorithm
    • ์ผ์น˜ํ•˜๋Š” record ์ˆ˜๋ฅผ ์‹คํ–‰ ์ „์— ์•Œ ์ˆ˜ ์—†์„ ๋•Œ secondary index scan๊ณผ linear file scan ๊ฐ„์˜ ๊ฒฉ์ฐจ๋ฅผ ํ•ด์†Œ
  • relation์˜ block ๋‹น 1 bit๋ฅผ ๊ฐ€์ง„ bitmap ์ƒ์„ฑ (๋ชจ๋“  bit๋Š” 0์œผ๋กœ ์ดˆ๊ธฐํ™”)
  • ๋‹จ๊ณ„
    • secondary index๋ฅผ ์‚ฌ์šฉํ•ด ์ผ์น˜ํ•˜๋Š” record์˜ pointer ์ฐพ๊ธฐ
    • record๋ฅผ ๊ฐ€์ ธ์˜ค๋Š” ๋Œ€์‹ , bitmap์—์„œ ํ•ด๋‹น bit๋ฅผ 1๋กœ ์„ค์ •
    • bit๊ฐ€ 1๋กœ ์„ค์ •๋˜์ง€ ์•Š์€ block์€ ๊ฑด๋„ˆ๋›ฐ๋ฉด์„œ linear file scan ์ˆ˜ํ–‰, ๊ฐ€์ ธ์˜จ block ๋‚ด ๋ชจ๋“  ์ผ์น˜ record ๊ฒ€์ƒ‰
  • ์„ฑ๋Šฅ
    • bit๊ฐ€ ๋ช‡ ๊ฐœ๋งŒ ์„ค์ •๋œ ๊ฒฝ์šฐ index scan๊ณผ ์œ ์‚ฌ
    • ๋Œ€๋ถ€๋ถ„์˜ bit๊ฐ€ ์„ค์ •๋œ ๊ฒฝ์šฐ linear file scan๊ณผ ์œ ์‚ฌ
    • ์ตœ์ƒ์˜ plan์— ๋น„ํ•ด ๊ฒฐ์ฝ” ๋งค์šฐ ๋‚˜์˜๊ฒŒ ๋™์ž‘ํ•˜์ง€ ์•Š์Œ

Implementation of Complex Selections

  • Conjunction: ฯƒฮธ1โˆงฮธ2โˆง...โˆงฮธn(r)\sigma_{\theta_1 \land \theta_2 \land ... \land \theta_n} (r)ฯƒฮธ1โ€‹โˆงฮธ2โ€‹โˆง...โˆงฮธnโ€‹โ€‹(r)
  • Algorithm A7 (conjunctive selection using one index)
    • ฯƒฮธi(r)\sigma_{\theta_i} (r)ฯƒฮธiโ€‹โ€‹(r)์— ๋Œ€ํ•ด ์ตœ์†Œ ๋น„์šฉ์„ ์ดˆ๋ž˜ํ•˜๋Š” 'ํ•˜๋‚˜์˜ ฮธi\theta_iฮธiโ€‹ (1โ‰คiโ‰คn1 \le i \le n1โ‰คiโ‰คn)์™€ algorithm (A1 ~ A6)์˜ ์กฐํ•ฉ'์„ ์„ ํƒ
    • ๊ฒ€์ƒ‰๋œ ๊ฐ record๊ฐ€ ๋‚˜๋จธ์ง€ ์กฐ๊ฑด ฮธj\theta_jฮธjโ€‹ (jโ‰ ij \ne ij๎€ =i)๋ฅผ ๋งŒ์กฑํ•˜๋Š”์ง€ memory buffer์—์„œ ํ…Œ์ŠคํŠธํ•˜์—ฌ ์—ฐ์‚ฐ ์™„๋ฃŒ
    • ๋น„์šฉ์€ ์„ ํƒ๋œ algorithm์˜ ๋น„์šฉ
  • Algorithm A8 (conjunctive selection using composite index)
    • ์ผ๋ถ€ conjunctive selection์— ๋Œ€ํ•ด ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์ ์ ˆํ•œ composite (multiple-key) index ์‚ฌ์šฉ
  • Algorithm A9 (conjunctive selection by intersection of identifiers)
    • record pointer๊ฐ€ ์žˆ๋Š” index ํ•„์š”
    • ๊ฐ ์กฐ๊ฑด์— ๋Œ€ํ•ด ํ•ด๋‹นํ•˜๋Š” index๋ฅผ ์‚ฌ์šฉํ•˜๊ณ , ์–ป์–ด์ง„ ๋ชจ๋“  record pointer ์ง‘ํ•ฉ์˜ ๊ต์ง‘ํ•ฉ(intersection)์„ ๊ตฌํ•จ
    • ๊ทธ ๋‹ค์Œ file์—์„œ record๋ฅผ ๊ฐ€์ ธ์˜ด
    • ์ผ๋ถ€ ์กฐ๊ฑด์— ์ ์ ˆํ•œ index๊ฐ€ ์—†๋‹ค๋ฉด, memory์—์„œ ํ…Œ์ŠคํŠธ ์ ์šฉ
  • Disjunction: ฯƒฮธ1โˆจฮธ2โˆจ...โˆจฮธn(r)\sigma_{\theta_1 \lor \theta_2 \lor ... \lor \theta_n} (r)ฯƒฮธ1โ€‹โˆจฮธ2โ€‹โˆจ...โˆจฮธnโ€‹โ€‹(r)
  • Algorithm A10 (disjunctive selection by union of identifiers)
    • ๋ชจ๋“  ์กฐ๊ฑด์— ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ index๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ ์ ์šฉ ๊ฐ€๋Šฅ
    • ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด linear scan ์‚ฌ์šฉ
    • ๊ฐ ์กฐ๊ฑด์— ํ•ด๋‹นํ•˜๋Š” index๋ฅผ ใ„ด์‚ฌ์šฉํ•˜๊ณ , ์–ป์–ด์ง„ ๋ชจ๋“  record pointer ์ง‘ํ•ฉ์˜ ํ•ฉ์ง‘ํ•ฉ(union)์„ ๊ตฌํ•จ
    • ๊ทธ ๋‹ค์Œ file์—์„œ record๋ฅผ ๊ฐ€์ ธ์˜ด
  • Negation: ฯƒยฌฮธ(r)\sigma_{\neg \theta} (r)ฯƒยฌฮธโ€‹(r)
    • Comparison์— ์“ฐ์ง€ ์•Š์„ ๊ฒƒ. equality์— ์‚ฌ์šฉ
    • file์— linear scan ์‚ฌ์šฉ
    • ์ผ๋ฐ˜์ ์œผ๋กœ record์˜ ์ƒ๋‹น ๋ถ€๋ถ„์ด ยฌฮธ\neg \thetaยฌฮธ๋ฅผ ๋งŒ์กฑ (์ฆ‰, ฮธ\thetaฮธ๋ฅผ ๋งŒ์กฑํ•˜์ง€ ์•Š์Œ)
    • ๋งŒ์•ฝ ยฌฮธ\neg \thetaยฌฮธ๋ฅผ ๋งŒ์กฑํ•˜๋Š” record๊ฐ€ ๋งค์šฐ ์ ๊ณ , ฮธ\thetaฮธ์— index ์ ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค๋ฉด
      • index๋ฅผ ์‚ฌ์šฉํ•ด ฮธ\thetaฮธ๋ฅผ ๋งŒ์กฑํ•˜๋Š” record๋ฅผ ์ฐพ๊ณ  file์—์„œ ๊ฐ€์ ธ์˜ด

Sorting

  • Sorting์€ database ์‹œ์Šคํ…œ์—์„œ ์ค‘์š”ํ•œ ์—ญํ•  ์ˆ˜ํ–‰
  • SQL query๋Š” output์ด ์ •๋ ฌ๋˜๋„๋ก ์ง€์ •ํ•  ์ˆ˜ ์žˆ์Œ
  • Join๊ณผ ๊ฐ™์€ ์—ฌ๋Ÿฌ relational ์—ฐ์‚ฐ์€ ์ž…๋ ฅ relation์ด ๋จผ์ € ์ •๋ ฌ๋˜๋ฉด ํšจ์œจ์ ์œผ๋กœ ๊ตฌํ˜„ ๊ฐ€๋Šฅ
  • ๋‹จ์ˆœ index-๊ธฐ๋ฐ˜ ์ •๋ ฌ
    • relation์— index๋ฅผ ๊ตฌ์ถ•ํ•˜๊ณ , index๋ฅผ ์‚ฌ์šฉํ•ด ์ •๋ ฌ๋œ ์ˆœ์„œ๋กœ relation ์ฝ๊ธฐ
    • ๊ฐ tuple์— ๋Œ€ํ•ด disk block access๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Œ (๋ฌผ๋ฆฌ์ ์ด ์•„๋‹Œ index๋ฅผ ํ†ตํ•ด ๋…ผ๋ฆฌ์ ์œผ๋กœ๋งŒ relation์„ ์ •๋ ฌํ•˜๊ธฐ ๋•Œ๋ฌธ)
      • record๋ฅผ ๋ฌผ๋ฆฌ์ ์œผ๋กœ ์ •๋ ฌํ•˜๋Š” ํšจ์œจ์ ์ธ algorithm ํ•„์š”
  • memory์— ๋งž๋Š” relation์˜ ๊ฒฝ์šฐ, quicksort ๊ฐ™์€ technique ์‚ฌ์šฉ ๊ฐ€๋Šฅ
  • memory์— ๋งž์ง€ ์•Š๋Š” relation ์ •๋ ฌ์„ external sorting์ด๋ผ ํ•จ
  • external sort-merge: ๊ฐ€์žฅ ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” external sorting algorithm

External Sort-Merge

  • MMM์„ memory ํฌ๊ธฐ (block ๋‹จ์œ„)๋ผ ๊ฐ€์ •
  1. ์ •๋ ฌ๋œ run ์ƒ์„ฑ i = 0; Repeat relation์˜ MMM block์„ memory๋กœ ์ฝ๊ธฐ; in-memory block ์ •๋ ฌ; ์ •๋ ฌ๋œ ๋ฐ์ดํ„ฐ๋ฅผ run file RiR_iRiโ€‹์— ์“ฐ๊ธฐ; i = i + 1; Until relation์˜ ๋๊นŒ์ง€ddddddddddddfsdvssssssssssssssssss ์ด run์˜ ์ˆ˜๋ฅผ NNN์ด๋ผ ํ•˜๊ณ , (์ง€๊ธˆ์€) N<MN < MN<M์ด๋ผ ๊ฐ€์ •
  2. Run ๋ณ‘ํ•ฉ (N-way merge) /* NNN๊ฐœ์˜ memory block์„ ์ž…๋ ฅ run buffer๋กœ, 1 block์„ output buffer๋กœ ์‚ฌ์šฉ / ๊ฐ run์˜ ์ฒซ block์„ ํ•ด๋‹น buffer block์œผ๋กœ ์ฝ๊ธฐ; repeat ๋ชจ๋“  buffer block ์ค‘์—์„œ ์ฒซ ๋ฒˆ์งธ record (์ •๋ ฌ ์ˆœ์„œ ๊ธฐ์ค€) ์„ ํƒ; record๋ฅผ output buffer์— ์“ฐ๊ธฐ; / output buffer๊ฐ€ ์ฐผ๋‹ค๋ฉด, disk์— ์“ฐ๊ธฐ */ ํ•ด๋‹น input buffer block์—์„œ record ์‚ญ์ œ; If buffer block์ด ๋น„๊ฒŒ ๋˜๋ฉด then run์˜ ๋‹ค์Œ block (์žˆ๋‹ค๋ฉด)์„ buffer๋กœ ์ฝ๊ธฐ; until ๋ชจ๋“  input buffer block์ด ๋นŒ ๋•Œ๊นŒ์ง€
  • ๋งŒ์•ฝ Nโ‰ฅMN \ge MNโ‰ฅM์ธ ๊ฒฝ์šฐ
    • N+1>MN+1 > MN+1>M, ์ฆ‰ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ memory ํฌ๊ธฐ(MMM)๊ฐ€ ํ•„์š”ํ•œ buffer ์ˆ˜(NNN๊ฐœ์˜ input buffer + 1๊ฐœ์˜ output buffer)๋ณด๋‹ค ์ž‘์Œ
    • ์—ฌ๋Ÿฌ merge pass๊ฐ€ ํ•„์š”
    • ๊ฐ pass์—์„œ, Mโˆ’1M - 1Mโˆ’1๊ฐœ run์˜ ์—ฐ์† ๊ทธ๋ฃน์ด ๋ณ‘ํ•ฉ๋จ
    • ํ•œ pass๋Š” run์˜ ์ˆ˜๋ฅผ Mโˆ’1M-1Mโˆ’1 factor๋กœ ์ค„์ด๊ณ , run ๊ธธ์ด๋ฅผ ๊ฐ™์€ factor๋งŒํผ ๋Š˜๋ฆผ
    • ์˜ˆ: M=11M = 11M=11์ด๊ณ  90๊ฐœ์˜ run์ด ์žˆ๋‹ค๋ฉด, ํ•œ pass๋Š” run ์ˆ˜๋ฅผ 9๊ฐœ๋กœ ์ค„์ž„ (90/(Mโˆ’1)=90/10=990 / (M-1) = 90/10 = 990/(Mโˆ’1)=90/10=9). ๊ฐ run์˜ ํฌ๊ธฐ๋Š” ์ดˆ๊ธฐ run์˜ 10๋ฐฐ(=Mโˆ’1= M-1=Mโˆ’1)๊ฐ€ ๋จ
    • ๋ชจ๋“  run์ด ํ•˜๋‚˜๋กœ ๋ณ‘ํ•ฉ๋  ๋•Œ๊นŒ์ง€ pass ๋ฐ˜๋ณต ์ˆ˜ํ–‰
  • ๋น„์šฉ ๋ถ„์„
    • brb_rbrโ€‹์„ relation rrr์˜ record๋ฅผ ํฌํ•จํ•˜๋Š” block ์ˆ˜๋ผ ํ•จ
    • ํ•„์š”ํ•œ ์ด merge pass ์ˆ˜: โŒˆlogโกMโˆ’1(br/M)โŒ‰\lceil \log_{M-1} (b_r / M) \rceilโŒˆlogMโˆ’1โ€‹(brโ€‹/M)โŒ‰ (br/Mb_r / Mbrโ€‹/M : ์ดˆ๊ธฐ run์˜ ์ˆ˜)
    • ์ดˆ๊ธฐ run ์ƒ์„ฑ ๋˜๋Š” ๊ฐ pass์—์„œ์˜ block ์ „์†ก ์ˆ˜
      • 2br2b_r2brโ€‹ (brb_rbrโ€‹ read ๋ฐ brb_rbrโ€‹ write)
    • ๋งˆ์ง€๋ง‰ pass์—์„œ๋Š” write ๋น„์šฉ ๋ฏธํฌํ•จ
    • ๋”ฐ๋ผ์„œ external sorting์˜ ์ด block ์ „์†ก ์ˆ˜ 2br+2brโŒˆlogโกMโˆ’1(br/M)โŒ‰โˆ’br=br(2โŒˆlogโกMโˆ’1(br/M)โŒ‰+1)2b_r + 2b_r \lceil \log_{M-1} (b_r / M) \rceil - b_r = b_r (2 \lceil \log_{M-1} (b_r / M) \rceil + 1)2brโ€‹+2brโ€‹โŒˆlogMโˆ’1โ€‹(brโ€‹/M)โŒ‰โˆ’brโ€‹=brโ€‹(2โŒˆlogMโˆ’1โ€‹(brโ€‹/M)โŒ‰+1)
    • merge pass ๋™์•ˆ, ๊ฐ run์„ ํ•œ ๋ฒˆ์— 1 block์”ฉ ์ฝ์œผ๋ฉด ๋งŽ์€ seek ๋ฐœ์ƒ
    • ๋Œ€์‹ , run ๋‹น bbb_bbbโ€‹ buffer block ์‚ฌ์šฉ โž” ํ•œ ๋ฒˆ์— bbb_bbbโ€‹ block read/write
    • ํ•œ pass์—์„œ โŒŠM/bbโŒ‹โˆ’1\lfloor M / b_b \rfloor - 1โŒŠM/bbโ€‹โŒ‹โˆ’1๊ฐœ์˜ run ๋ณ‘ํ•ฉ ๊ฐ€๋Šฅ
    • ํ•„์š”ํ•œ ์ด merge pass ์ˆ˜: โŒˆlogโกโŒŠM/bbโŒ‹โˆ’1(br/M)โŒ‰\lceil \log_{\lfloor M/b_b \rfloor - 1} (b_r / M) \rceilโŒˆlogโŒŠM/bbโ€‹โŒ‹โˆ’1โ€‹(brโ€‹/M)โŒ‰
    • ๋”ฐ๋ผ์„œ external sorting์˜ ์ด block ์ „์†ก ์ˆ˜ br(2โŒˆlogโกโŒŠM/bbโŒ‹โˆ’1(br/M)โŒ‰+1)b_r (2 \lceil \log_{\lfloor M/b_b \rfloor - 1} (b_r / M) \rceil + 1)brโ€‹(2โŒˆlogโŒŠM/bbโ€‹โŒ‹โˆ’1โ€‹(brโ€‹/M)โŒ‰+1)
  • Seek ๋น„์šฉ
    • run ์ƒ์„ฑ ๋™์•ˆ
      • ์ „์ฒด relation read์— โŒˆbr/MโŒ‰\lceil b_r / M \rceilโŒˆbrโ€‹/MโŒ‰ seek
      • ๊ฐ run์— writeํ•˜๊ธฐ ์œ„ํ•ด โŒˆbr/MโŒ‰\lceil b_r / M \rceilโŒˆbrโ€‹/MโŒ‰ seek
      • ์ด seek = 2โŒˆbr/MโŒ‰2 \lceil b_r / M \rceil2โŒˆbrโ€‹/MโŒ‰
    • merge ๋‹จ๊ณ„ ๋™์•ˆ
      • ๊ฐ run file์—์„œ bbb_bbbโ€‹ block read: read์— ๋Œ€ํ•œ ์ด seek โ‰ˆโŒˆbr/bbโŒ‰\approx \lceil b_r / b_b \rceilโ‰ˆโŒˆbrโ€‹/bbโ€‹โŒ‰
      • output file์— writeํ•˜๋Š” ๊ฒƒ์€ ์ˆœ์ฐจ์ ์ด์ง€๋งŒ, write๊ฐ€ read๋ฅผ ๋ฐฉํ•ดํ•  ์ˆ˜ ์žˆ์Œ
      • ๋ชจ๋“  read ์—ฐ์‚ฐ์€ head๋ฅผ ์›€์ง์ด๋ฏ€๋กœ write ์—ฐ์‚ฐ์— ๋Œ€ํ•œ seek ๋ฐœ์ƒ
      • output write์— โŒˆbr/bbโŒ‰\lceil b_r / b_b \rceilโŒˆbrโ€‹/bbโ€‹โŒ‰ seek ์ถ”๊ฐ€ ํ•„์š”
      • ๋”ฐ๋ผ์„œ, write ๋น„์šฉ์„ ๋ฌด์‹œํ•˜๋Š” ๋งˆ์ง€๋ง‰ pass๋ฅผ ์ œ์™ธํ•˜๊ณ  ๊ฐ merge pass๋งˆ๋‹ค 2โŒˆbr/bbโŒ‰2 \lceil b_r / b_b \rceil2โŒˆbrโ€‹/bbโ€‹โŒ‰ seek ํ•„์š”
    • ์ด seek ์ˆ˜ 2โŒˆbr/MโŒ‰+2โŒˆbr/bbโŒ‰(โŒˆlogโกโŒŠM/bbโŒ‹โˆ’1(br/M)โŒ‰)โˆ’โŒˆbr/bbโŒ‰2 \lceil b_r / M \rceil + 2 \lceil b_r / b_b \rceil (\lceil \log_{\lfloor M/b_b \rfloor - 1} (b_r / M) \rceil) - \lceil b_r / b_b \rceil2โŒˆbrโ€‹/MโŒ‰+2โŒˆbrโ€‹/bbโ€‹โŒ‰(โŒˆlogโŒŠM/bbโ€‹โŒ‹โˆ’1โ€‹(brโ€‹/M)โŒ‰)โˆ’โŒˆbrโ€‹/bbโ€‹โŒ‰=2โŒˆbr/MโŒ‰+โŒˆbr/bbโŒ‰(2โŒˆlogโกโŒŠM/bbโŒ‹โˆ’1(br/M)โŒ‰โˆ’1)= 2 \lceil b_r / M \rceil + \lceil b_r / b_b \rceil (2 \lceil \log_{\lfloor M/b_b \rfloor - 1} (b_r / M) \rceil - 1)=2โŒˆbrโ€‹/MโŒ‰+โŒˆbrโ€‹/bbโ€‹โŒ‰(2โŒˆlogโŒŠM/bbโ€‹โŒ‹โˆ’1โ€‹(brโ€‹/M)โŒ‰โˆ’1)

Join Operation

  • Join์„ ๊ตฌํ˜„ํ•˜๋Š” ์—ฌ๋Ÿฌ ๋‹ค๋ฅธ algorithm
    • Nested-loop join
    • Block nested-loop join
    • Indexed nested-loop join
    • Merge-join
    • Hash-join
  • ๋น„์šฉ ์ถ”์ •์— ๊ธฐ๋ฐ˜ํ•œ ์„ ํƒ
  • ์˜ˆ์ œ๋Š” ๋‹ค์Œ ์ •๋ณด ์‚ฌ์šฉ
    • student record ์ˆ˜: 5,000 / takes record ์ˆ˜: 10,000
    • student block ์ˆ˜: 100 / takes block ์ˆ˜: 400

Nested-Loop Join

  • theta join rโ‹ˆฮธsr \Join_{\theta} srโ‹ˆฮธโ€‹s ๊ณ„์‚ฐ for rrr์˜ ๊ฐ tuple trt_rtrโ€‹ do begin for sss์˜ ๊ฐ tuple tst_stsโ€‹ do begin pair (tr,ย ts)(t_r, ~ t_s)(trโ€‹,ย tsโ€‹)๊ฐ€ join ์กฐ๊ฑด ฮธ\thetaฮธ๋ฅผ ๋งŒ์กฑํ•˜๋Š”์ง€ ํ…Œ์ŠคํŠธ if ๋งŒ์กฑํ•˜๋ฉด, trโ‹…tst_r \cdot t_strโ€‹โ‹…tsโ€‹๋ฅผ ๊ฒฐ๊ณผ์— ์ถ”๊ฐ€; end end
  • rrr์„ outer relation, sss๋ฅผ inner relation์ด๋ผ ํ•จ
  • trโ‹…tst_r \cdot t_strโ€‹โ‹…tsโ€‹๋Š” ๋‘ tuple์˜ attribute ๊ฐ’์„ ์—ฐ๊ฒฐ(concatenating)ํ•˜์—ฌ ๊ตฌ์„ฑ๋œ tuple
  • index๊ฐ€ ํ•„์š” ์—†์œผ๋ฉฐ ์–ด๋–ค ์ข…๋ฅ˜์˜ join ์กฐ๊ฑด๊ณผ๋„ ์‚ฌ์šฉ ๊ฐ€๋Šฅ
  • ๋‘ relation์˜ ๋ชจ๋“  tuple pair๋ฅผ ๊ฒ€์‚ฌํ•˜๋ฏ€๋กœ ๋น„์Œˆ
  • nrn_rnrโ€‹๊ณผ nsn_snsโ€‹๋ฅผ ๊ฐ๊ฐ relation rrr๊ณผ sss์˜ record ์ˆ˜๋ผ ํ•จ
  • ๋น„์šฉ ์ถ”์ •
    • ์ตœ์•…์˜ ๊ฒฝ์šฐ(Worst case): ๊ฐ relation์˜ block ํ•˜๋‚˜๋งŒ memory์— ์œ ์ง€ ๊ฐ€๋Šฅ
      • ์ถ”์ • ๋น„์šฉ: (nrโˆ—bs+br)(n_r * b_s + b_r)(nrโ€‹โˆ—bsโ€‹+brโ€‹) block ์ „์†ก ๋ฐ (nr+br)(n_r + b_r)(nrโ€‹+brโ€‹) seek
    • ์ตœ์ƒ์˜ ๊ฒฝ์šฐ(Best case): ๋‘ relation์ด ๋™์‹œ์— memory์— ๋งž์„ ๋งŒํผ ์ถฉ๋ถ„ํ•œ memory ๊ณต๊ฐ„
      • ์ถ”์ • ๋น„์šฉ: br+bsb_r + b_sbrโ€‹+bsโ€‹ block ์ „์†ก ๋ฐ 2 seek
    • ๋” ์ž‘์€ relation๋งŒ memory์— ๋งž๋Š”๋‹ค๋ฉด, ๊ทธ๊ฒƒ์„ inner relation์œผ๋กœ ์‚ฌ์šฉ
      • ์ถ”์ • ๋น„์šฉ: br+bsb_r + b_sbrโ€‹+bsโ€‹ block ์ „์†ก ๋ฐ 2 seek (์ตœ์ƒ์˜ ๊ฒฝ์šฐ์™€ ๋™์ผ)
  • ์ตœ์•…์˜ memory ๊ฐ€์šฉ์„ฑ์„ ๊ฐ€์ •ํ•  ๋•Œ, ๋น„์šฉ ์ถ”์ •์€
    • student๋ฅผ outer relation์œผ๋กœ
      • 5000 * 400 + 100 = 2,000,100 block ์ „์†ก,
      • 5000 + 100 = 5100 seek
    • takes๋ฅผ outer relation์œผ๋กœ
      • 10000 * 100 + 400 = 1,000,400 block ์ „์†ก ๋ฐ 10,400 seek
  • ๋” ์ž‘์€ relation (student)์ด memory์— ์™„์ „ํžˆ ๋งž๋Š”๋‹ค๋ฉด, ๋น„์šฉ ์ถ”์ •์€ 500 block ์ „์†ก
  • Block nested-loops algorithm (๋‹ค์Œ slide)์ด ์„ ํ˜ธ๋จ

Block Nested-Loop Join

  • inner relation์˜ ๋ชจ๋“  block์ด outer relation์˜ ๋ชจ๋“  block๊ณผ pair๋ฅผ ์ด๋ฃจ๋Š” nested-loop join์˜ ๋ณ€ํ˜•
for $r$์˜ ๊ฐ block $B_r$ do begin
    for $s$์˜ ๊ฐ block $B_s$ do begin
        for $B_r$์˜ ๊ฐ tuple $t_r$ do begin
            for $B_s$์˜ ๊ฐ tuple $t_s$ do begin
                $(t_r, ~ t_s)$๊ฐ€ join ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š”์ง€ ํ™•์ธ
                if ๋งŒ์กฑํ•˜๋ฉด, $t_r \cdot t_s$๋ฅผ ๊ฒฐ๊ณผ์— ์ถ”๊ฐ€;
            end
        end
    end
end
  • ๋น„์šฉ ์ถ”์ •
    • ์ตœ์•…์˜ ๊ฒฝ์šฐ ์ถ”์ •: brโˆ—bs+brb_r * b_s + b_rbrโ€‹โˆ—bsโ€‹+brโ€‹ block ์ „์†ก, 2br2b_r2brโ€‹ seek
    • inner relation sss์˜ ๊ฐ block์€ outer relation์˜ ๊ฐ block์— ๋Œ€ํ•ด ํ•œ ๋ฒˆ์”ฉ ์ฝํž˜
    • student โ‹ˆ\Joinโ‹ˆ takes์˜ ๊ฒฝ์šฐ, 40100 block ์ „์†ก ๋ฐ 200 seek
    • ์ตœ์ƒ์˜ ๊ฒฝ์šฐ: br+bsb_r + b_sbrโ€‹+bsโ€‹ block ์ „์†ก + 2 seek
  • nested loop ๋ฐ block nested loop algorithm ๊ฐœ์„ 
    • Block nested-loop์—์„œ,
    • outer relation์„ ์œ„ํ•œ blocking unit์œผ๋กœ Mโˆ’2M - 2Mโˆ’2 (MMM = memory ํฌ๊ธฐ block ์ˆ˜) ์‚ฌ์šฉ
    • ๋‚˜๋จธ์ง€ ๋‘ block์€ inner relation๊ณผ output์„ bufferํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ
    • ํ•œ ๋ฒˆ์— outer relation์˜ Mโˆ’2M - 2Mโˆ’2 block ์ฝ๊ธฐ
    • ๊ฐ inner relation block์— ๋Œ€ํ•ด, outer relation์˜ Mโˆ’2M - 2Mโˆ’2 block ๋ชจ๋‘์™€ join
    • ๋น„์šฉ = (โŒˆbr/(Mโˆ’2)โŒ‰โˆ—bs+br)(\lceil b_r / (M-2) \rceil * b_s + b_r)(โŒˆbrโ€‹/(Mโˆ’2)โŒ‰โˆ—bsโ€‹+brโ€‹) block ์ „์†ก + 2โŒˆbr/(Mโˆ’2)โŒ‰2 \lceil b_r / (M-2) \rceil2โŒˆbrโ€‹/(Mโˆ’2)โŒ‰ seek
    • equi-join attribute๊ฐ€ inner relation์˜ key๋ฅผ ํ˜•์„ฑํ•˜๋ฉด, ์ฒซ ๋ฒˆ์งธ ์ผ์น˜์—์„œ inner loop ์ค‘๋‹จ
    • buffer์— ๋‚จ์•„์žˆ๋Š” block์„ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด inner loop๋ฅผ ์•ž๋’ค๋กœ ๋ฒˆ๊ฐˆ์•„ scan (LRU ๊ต์ฒด ๋ฐฉ์‹)
    • ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค๋ฉด inner relation์˜ index ์‚ฌ์šฉ

Indexed Nested-Loop Join

  • nested-loop join์—์„œ, ๋‹ค์Œ ์กฐ๊ฑด ๋งŒ์กฑ ์‹œ index lookup์ด file scan์„ ๋Œ€์ฒด ๊ฐ€๋Šฅ
    • join์ด equi-join ๋˜๋Š” natural join์ด๊ณ ,
    • inner relation์˜ join attribute์— index๊ฐ€ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๊ฒฝ์šฐ
  • join ๊ณ„์‚ฐ๋งŒ์„ ์œ„ํ•ด index ๊ตฌ์ถ• ๊ฐ€๋Šฅ
  • outer relation rrr์˜ ๊ฐ tuple trt_rtrโ€‹์— ๋Œ€ํ•ด, index๋ฅผ ์‚ฌ์šฉํ•ด trt_rtrโ€‹ tuple๊ณผ join ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” sss์˜ tuple lookup
  • ์ตœ์•…์˜ ๊ฒฝ์šฐ
    • Buffer๋Š” rrr์˜ block ํ•˜๋‚˜ ๊ณต๊ฐ„๋งŒ ์žˆ๊ณ , rrr์˜ ๊ฐ tuple์— ๋Œ€ํ•ด sss์—์„œ index lookup ์ˆ˜ํ–‰
    • Join ๋น„์šฉ: br(tT+tS)+nrโˆ—cb_r (t_T + t_S) + n_r * cbrโ€‹(tTโ€‹+tSโ€‹)+nrโ€‹โˆ—c
    • ccc๋Š” rrr์˜ ํ•œ tuple์— ๋Œ€ํ•ด index๋ฅผ ์ˆœํšŒํ•˜๊ณ  ๋ชจ๋“  ์ผ์น˜ํ•˜๋Š” sss tuple์„ ๊ฐ€์ ธ์˜ค๋Š” ๋น„์šฉ
    • ccc๋Š” join ์กฐ๊ฑด์„ ์‚ฌ์šฉํ•œ sss์— ๋Œ€ํ•œ ๋‹จ์ผ selection ๋น„์šฉ์œผ๋กœ ์ถ”์ • ๊ฐ€๋Šฅ
  • rrr๊ณผ sss ๋ชจ๋‘์˜ join attribute์— index๊ฐ€ ์žˆ๋‹ค๋ฉด, tuple์ด ๋” ์ ์€ relation์„ outer relation์œผ๋กœ ์‚ฌ์šฉ

Example of Nested-Loop Join Costs

  • student โ‹ˆ\Joinโ‹ˆ takes ๊ณ„์‚ฐ, student๋ฅผ outer relation์œผ๋กœ
  • ๊ณตํ†ต attribute: ID
  • takes๊ฐ€ ID attribute์— primary B+B^+B+-tree index๋ฅผ ๊ฐ€์ง„๋‹ค๊ณ  ๊ฐ€์ •, ๊ฐ index node๋Š” 20๊ฐœ entry ํฌํ•จ
  • takes๋Š” 10,000๊ฐœ tuple์„ ๊ฐ€์ง€๋ฏ€๋กœ, tree ๋†’์ด๋Š” 4, ์‹ค์ œ ๋ฐ์ดํ„ฐ ์ฐพ๋Š” ๋ฐ 1๋ฒˆ์˜ access ์ถ”๊ฐ€ ํ•„์š”
  • Block nested-loop join ๋น„์šฉ
    • 400 * 100 + 100 = 40,100 block ์ „์†ก + 2 * 100 = 200 seek
    • ์ตœ์•…์˜ memory ๊ฐ€์ •
    • memory๊ฐ€ ๋” ๋งŽ์œผ๋ฉด ํ›จ์”ฌ ์ ์„ ์ˆ˜ ์žˆ์Œ
  • Indexed nested loops join ๋น„์šฉ
    • 100 + 5000 * 5 = 25,100 block ์ „์†ก + 25,100 seek
    • (์ผ์น˜ํ•˜๋Š” ๋ชจ๋“  takes tuple์ด ๋‹จ์ผ block์— ์ €์žฅ๋œ๋‹ค๊ณ  ๊ฐ€์ • (takes relation์€ ID๋กœ ์ •๋ ฌ๋˜์–ด ์žˆ๊ณ , ๊ฐ student tuple์— ๋Œ€ํ•ด ํ‰๊ท  2๊ฐœ์˜ ์ผ์น˜ํ•˜๋Š” takes tuple ์กด์žฌ))
  • CPU ๋น„์šฉ์€ block nested loops join๋ณด๋‹ค ์ ์„ ๊ฐ€๋Šฅ์„ฑ ๋†’์Œ

Merge Join

  • merge-join (sort-merge-join์ด๋ผ๊ณ ๋„ ํ•จ) algorithm์€ natural join๊ณผ equi-join ๊ณ„์‚ฐ์— ์‚ฌ์šฉ ๊ฐ€๋Šฅ
  1. (์ด๋ฏธ ์ •๋ ฌ๋˜์ง€ ์•Š์•˜๋‹ค๋ฉด) ๋‘ relation์„ join attribute ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌ
  2. ์ •๋ ฌ๋œ relation์„ ๋ณ‘ํ•ฉํ•˜์—ฌ join 2-1 Join ๋‹จ๊ณ„๋Š” sort-merge algorithm์˜ merge ๋‹จ๊ณ„์™€ ์œ ์‚ฌ 2-2 ์ฃผ์š” ์ฐจ์ด์ ์€ join attribute์˜ ์ค‘๋ณต ๊ฐ’ ์ฒ˜๋ฆฌ: join attribute์—์„œ ๋™์ผํ•œ ๊ฐ’์„ ๊ฐ€์ง„ ๋ชจ๋“  pair๊ฐ€ ์ผ์น˜ํ•ด์•ผ ํ•จ
  • Join ๋‹จ๊ณ„: ์‹ค์ œ merge-join algorithm
    • join attribute์—์„œ ๋™์ผํ•œ ๊ฐ’์„ ๊ฐ€์ง„ ํ•œ relation์˜ tuple ๊ทธ๋ฃน์„ memory๋กœ ์ฝ์Œ
    • ๊ทธ ๋‹ค์Œ, ๋‹ค๋ฅธ relation์˜ ํ•ด๋‹น tuple (์žˆ๋‹ค๋ฉด)์„ ์ฝ์–ด ๋“ค์ด๊ณ , ์ฝ์œผ๋ฉด์„œ process
    • ์ƒ์„ธ algorithm์€ ์ฑ… (Figure 15.7) ์ฐธ์กฐ
  • ๊ฐ block์€ ํ•œ ๋ฒˆ๋งŒ ์ฝ์œผ๋ฉด ๋จ (ํŠน์ • join attribute ๊ฐ’์— ๋Œ€ํ•œ ๋ชจ๋“  tuple์ด memory์— ๋งž๋Š”๋‹ค๊ณ  ๊ฐ€์ •)
  • ๊ฐ relation์— bbb_bbbโ€‹ buffer block์ด ํ• ๋‹น๋œ๋‹ค๊ณ  ๊ฐ€์ •ํ•  ๋•Œ, merge join ๋น„์šฉ (br+bs)(b_r + b_s)(brโ€‹+bsโ€‹) block ์ „์†ก + (โŒˆbr/bbโŒ‰+โŒˆbs/bbโŒ‰)(\lceil b_r / b_b \rceil + \lceil b_s / b_b \rceil)(โŒˆbrโ€‹/bbโ€‹โŒ‰+โŒˆbsโ€‹/bbโ€‹โŒ‰) seek
    • relation์ด ์ •๋ ฌ๋˜์ง€ ์•Š์€ ๊ฒฝ์šฐ ์ •๋ ฌ ๋น„์šฉ
  • Hybrid merge-join
    • ํ•œ relation์€ ์ •๋ ฌ๋˜์–ด ์žˆ๊ณ , ๋‹ค๋ฅธ relation์€ join attribute์— secondary B+B^+B+-tree index๊ฐ€ ์žˆ์„ ๋•Œ ์ ์šฉ ๊ฐ€๋Šฅ
    • ์ •๋ ฌ๋œ relation์„ B+B^+B+-tree์˜ leaf entry (join attribute, pointer)์™€ ๋ณ‘ํ•ฉ
    • ๊ฒฐ๊ณผ file์€ ์ •๋ ฌ๋œ relation์˜ tuple๊ณผ ์ •๋ ฌ๋˜์ง€ ์•Š์€ relation์˜ tuple ์ฃผ์†Œ๋ฅผ ํฌํ•จ
    • ์ •๋ ฌ๋˜์ง€ ์•Š์€ relation์˜ tuple ์ฃผ์†Œ ๊ธฐ์ค€์œผ๋กœ ๊ฒฐ๊ณผ ์ •๋ ฌ
    • ์ •๋ ฌ๋˜์ง€ ์•Š์€ relation์„ ๋ฌผ๋ฆฌ์  ์ฃผ์†Œ ์ˆœ์„œ๋กœ scanํ•˜๊ณ  ์ด์ „ ๊ฒฐ๊ณผ์™€ ๋ณ‘ํ•ฉํ•˜์—ฌ ์ฃผ์†Œ๋ฅผ ์‹ค์ œ tuple๋กœ ๊ต์ฒด
    • ์ˆœ์ฐจ scan์ด random lookup๋ณด๋‹ค ํšจ์œจ์ 

Hash Join

  • relation rrr๊ณผ sss ๊ฐ„์˜ equi-join ๋ฐ natural join์— ์ ์šฉ ๊ฐ€๋Šฅ
  • hash function hhh๊ฐ€ ๋‘ relation์˜ tuple์„ partitionํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋จ
  • hhh๋Š” JoinAttrs ( rrr๊ณผ sss์˜ ๊ณตํ†ต attribute) ๊ฐ’์„ {0,ย 1,ย ...,ย n}\{0, ~ 1, ~ ..., ~ n\}{0,ย 1,ย ...,ย n}์— mapping
  • r0,ย r1,ย ...,ย rnr_0, ~ r_1, ~ ..., ~ r_nr0โ€‹,ย r1โ€‹,ย ...,ย rnโ€‹์€ rrr tuple์˜ partition์„ ๋‚˜ํƒ€๋ƒ„
    • ๊ฐ tuple trโˆˆrt_r \in rtrโ€‹โˆˆr์€ i=h(tr[JoinAttrs])i = h(t_r[\text{JoinAttrs}])i=h(trโ€‹[JoinAttrs])์ธ partition rir_iriโ€‹์— ๋ฐฐ์น˜๋จ
  • s0,ย s1,ย ...,ย sns_0, ~ s_1, ~ ..., ~ s_ns0โ€‹,ย s1โ€‹,ย ...,ย snโ€‹์€ sss tuple์˜ partition์„ ๋‚˜ํƒ€๋ƒ„
    • ๊ฐ tuple tsโˆˆst_s \in stsโ€‹โˆˆs๋Š” i=h(ts[JoinAttrs])i = h(t_s[\text{JoinAttrs}])i=h(tsโ€‹[JoinAttrs])์ธ partition sis_isiโ€‹์— ๋ฐฐ์น˜๋จ
  • ์ฑ… (Figure 15.10)์—์„œ rir_iriโ€‹์™€ sis_isiโ€‹๋Š” HriH_{ri}Hriโ€‹์™€ HsiH_{si}Hsiโ€‹๋กœ, nnn์€ nhn_hnhโ€‹๋กœ ํ‘œ๊ธฐ๋จ
  • ๊ธฐ๋ณธ ์•„์ด๋””์–ด
    • ๊ฐ relation์˜ tuple์„ ๋™์ผํ•œ hash ๊ฐ’์„ ๊ฐ–๋Š” ์ง‘ํ•ฉ์œผ๋กœ partition
    • ๊ทธ๋Ÿฌ๋ฉด rir_iriโ€‹์˜ tuple์€ sis_isiโ€‹์˜ tuple๊ณผ๋งŒ ๋น„๊ตํ•˜๋ฉด ๋จ
    • rir_iriโ€‹์˜ tuple์€ sjs_jsjโ€‹ (iโ‰ ji \ne ji๎€ =j)์˜ tuple๊ณผ ๋น„๊ตํ•  ํ•„์š”๊ฐ€ ์—†์œผ๋ฉฐ, ๊ทธ ๋ฐ˜๋Œ€๋„ ๋งˆ์ฐฌ๊ฐ€์ง€

Hash Join Algorithm

  • rrr๊ณผ sss์˜ hash-join์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋จ
  1. hashing function hhh๋ฅผ ์‚ฌ์šฉํ•ด relation sss๋ฅผ partition: s={s0,ย s1,ย โ€ฆ,ย sn}s = \{s_0, ~ s_1, ~ \dots, ~ s_n\}s={s0โ€‹,ย s1โ€‹,ย โ€ฆ,ย snโ€‹}
  2. rrr๋„ ์œ ์‚ฌํ•˜๊ฒŒ partition: r={r0,ย r1,ย โ€ฆ,ย rn}r = \{r_0, ~ r_1, ~ \dots, ~ r_n\}r={r0โ€‹,ย r1โ€‹,ย โ€ฆ,ย rnโ€‹}
  3. ๊ฐ iii์— ๋Œ€ํ•ด rir_iriโ€‹์™€ sis_isiโ€‹์— ๋Œ€ํ•ด ๋ณ„๋„์˜ indexed nested-loop join ์ˆ˜ํ–‰, ๋ฐฉ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Œ (a) sis_isiโ€‹๋ฅผ memory์— loadํ•˜๊ณ  join attribute๋ฅผ ์‚ฌ์šฉํ•ด in-memory hash index ๊ตฌ์ถ•. ์ด hash index๋Š” ์ด์ „์˜ hhh์™€ ๋‹ค๋ฅธ hash function ์‚ฌ์šฉ (b) rir_iriโ€‹์˜ tuple์„ disk์—์„œ ํ•˜๋‚˜์”ฉ ์ฝ์Œ (c) ๊ฐ tuple trt_rtrโ€‹์— ๋Œ€ํ•ด, hash index๋ฅผ ์‚ฌ์šฉํ•ด sis_isiโ€‹์—์„œ ์ผ์น˜ํ•˜๋Š” ๊ฐ tuple tst_stsโ€‹๋ฅผ ์ฐพ์Œ. attribute๋“ค์˜ ์—ฐ๊ฒฐ(concatenation)์„ output
  • Relation sss๋Š” build input, rrr์€ probe input์ด๋ผ ๋ถˆ๋ฆผ
  • nnn ๊ฐ’์€ ๊ฐ sis_isiโ€‹ (์™€ ๊ทธ hash index)๊ฐ€ memory์— ๋งž๋„๋ก ์„ ํƒ๋จ
  • nnn์€ โŒˆbs/MโŒ‰โˆ—f\lceil b_s / M \rceil * fโŒˆbsโ€‹/MโŒ‰โˆ—f๋กœ ์„ ํƒ๋˜๋ฉฐ, fff๋Š” "fudge factor", (์ผ๋ฐ˜์ ์œผ๋กœ ์•ฝ 1.2)

Handling of Overflows

  • Partitioning์ด skewed ๋˜์—ˆ๋‹ค๋Š” ๊ฒƒ์€ ์ผ๋ถ€ partition์ด ๋‹ค๋ฅธ ๊ฒƒ๋ณด๋‹ค ํ›จ์”ฌ ๋งŽ์€ tuple์„ ๊ฐ–๋Š” ๊ฒฝ์šฐ
  • Hash-table overflow๋Š” sis_isiโ€‹์˜ hash index๊ฐ€ memory์— ๋งž์ง€ ์•Š์„ ๋•Œ sis_isiโ€‹ partition์—์„œ ๋ฐœ์ƒ. ์›์ธ
    • join attribute์— ๋™์ผํ•œ ๊ฐ’์„ ๊ฐ€์ง„ sss tuple์ด ๋งŽ์Œ
    • ๋‚˜์œ hash function (๋ฌด์ž‘์œ„์„ฑ, ๊ท ์ผ์„ฑ ์†์„ฑ ๋ถ€์กฑ)
  • Overflow resolution์€ build ๋‹จ๊ณ„์—์„œ ์ˆ˜ํ–‰ ๊ฐ€๋Šฅ
    • Partition sis_isiโ€‹๋ฅผ ๋‹ค๋ฅธ hash function์„ ์‚ฌ์šฉํ•ด ์ถ”๊ฐ€ partition
    • Partition rir_iriโ€‹๋„ ์œ ์‚ฌํ•˜๊ฒŒ partition๋˜์–ด์•ผ ํ•จ
  • Overflow avoidance๋Š” build ๋‹จ๊ณ„ ์ค‘ overflow๋ฅผ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด partitioning์„ ์‹ ์ค‘ํ•˜๊ฒŒ ์ˆ˜ํ–‰
    • ์˜ˆ: build relation์„ ๋งŽ์€ ์ž‘์€ partition์œผ๋กœ ๋‚˜๋ˆˆ ๋‹ค์Œ, ์ผ๋ถ€ partition์„ ๊ฒฐํ•ฉํ•˜์—ฌ ๊ฐ ๊ฒฐํ•ฉ๋œ partition์ด memory์— ๋งž๋„๋ก ํ•จ
  • ๋‘ ์ ‘๊ทผ ๋ฐฉ์‹ ๋ชจ๋‘ join attribute์— ์ค‘๋ณต์ด ๋งŽ์œผ๋ฉด ์‹คํŒจ
  • Fallback option: overflow๊ฐ€ ๋ฐœ์ƒํ•œ partition์— (indexed nested-loop join ๋Œ€์‹ ) block nested-loop join ์‚ฌ์šฉ

Complex Joins

  • Conjunctive condition์„ ์‚ฌ์šฉํ•œ Join: rโ‹ˆฮธ1โˆงฮธ2โˆง...โˆงฮธnsr \Join_{\theta_1 \land \theta_2 \land ... \land \theta_n} srโ‹ˆฮธ1โ€‹โˆงฮธ2โ€‹โˆง...โˆงฮธnโ€‹โ€‹s
    • nested-loop / block nested-loop join ์‚ฌ์šฉ, ๋˜๋Š”
    • ๋” ํšจ์œจ์ ์ธ algorithm์„ ์‚ฌ์šฉํ•ด ๋” ๊ฐ„๋‹จํ•œ join (rโ‹ˆฮธisr \Join_{\theta_i} srโ‹ˆฮธiโ€‹โ€‹s ๋“ฑ) ์ค‘ ํ•˜๋‚˜์˜ ๊ฒฐ๊ณผ ๊ณ„์‚ฐ
    • ์ตœ์ข… ๊ฒฐ๊ณผ๋Š” ๋‚จ์€ ์กฐ๊ฑด ฮธ1โˆง...โˆงฮธiโˆ’1โˆงฮธi+1โˆง...โˆงฮธn\theta_1 \land ... \land \theta_{i-1} \land \theta_{i+1} \land ... \land \theta_nฮธ1โ€‹โˆง...โˆงฮธiโˆ’1โ€‹โˆงฮธi+1โ€‹โˆง...โˆงฮธnโ€‹์„ ๋งŒ์กฑํ•˜๋Š” ์ค‘๊ฐ„ ๊ฒฐ๊ณผ์˜ tuple๋กœ ๊ตฌ์„ฑ
    • ์œ„ ์กฐ๊ฑด๋“ค์€ rโ‹ˆฮธisr \Join_{\theta_i} srโ‹ˆฮธiโ€‹โ€‹s์˜ tuple์ด ์ƒ์„ฑ๋  ๋•Œ ํ…Œ์ŠคํŠธ ๊ฐ€๋Šฅ
  • Disjunctive condition์„ ์‚ฌ์šฉํ•œ Join: rโ‹ˆฮธ1โˆจฮธ2โˆจ...โˆจฮธnsr \Join_{\theta_1 \lor \theta_2 \lor ... \lor \theta_n} srโ‹ˆฮธ1โ€‹โˆจฮธ2โ€‹โˆจ...โˆจฮธnโ€‹โ€‹s
    • nested-loop / block nested-loop join ์‚ฌ์šฉ, ๋˜๋Š”
    • ๊ฐœ๋ณ„ join rโ‹ˆฮธisr \Join_{\theta_i} srโ‹ˆฮธiโ€‹โ€‹s์˜ record๋“ค์˜ ํ•ฉ์ง‘ํ•ฉ(union)์œผ๋กœ ๊ณ„์‚ฐ (rโ‹ˆฮธ1s)โˆช(rโ‹ˆฮธ2s)โˆช...โˆช(rโ‹ˆฮธns)(r \Join_{\theta_1} s) \cup (r \Join_{\theta_2} s) \cup ... \cup (r \Join_{\theta_n} s)(rโ‹ˆฮธ1โ€‹โ€‹s)โˆช(rโ‹ˆฮธ2โ€‹โ€‹s)โˆช...โˆช(rโ‹ˆฮธnโ€‹โ€‹s)

Evaluation of Expressions

  • ์ง€๊ธˆ๊นŒ์ง€: ๊ฐœ๋ณ„ ์—ฐ์‚ฐ์„ ์œ„ํ•œ algorithm ํ•™์Šต
  • ์ „์ฒด expression tree ํ‰๊ฐ€๋ฅผ ์œ„ํ•œ ๋Œ€์•ˆ
    • Materialization: ์ž…๋ ฅ์ด relation์ด๊ฑฐ๋‚˜ ์ด๋ฏธ ๊ณ„์‚ฐ๋œ expression์˜ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•˜๊ณ , disk์— materialize (์ €์žฅ). ๋ฐ˜๋ณต
    • Pipelining: ์—ฐ์‚ฐ์ด ์‹คํ–‰๋˜๋Š” ์ค‘์—๋„ tuple์„ ์ƒ์œ„(parent) ์—ฐ์‚ฐ์œผ๋กœ ์ „๋‹ฌ

Materialization

  • Materialized evaluation
    • ๊ฐ€์žฅ ๋‚ฎ์€ ์ˆ˜์ค€(lowest-level)์—์„œ ์‹œ์ž‘ํ•˜์—ฌ ํ•œ ๋ฒˆ์— ํ•˜๋‚˜์˜ ์—ฐ์‚ฐ ํ‰๊ฐ€
    • ์ค‘๊ฐ„ ๊ฒฐ๊ณผ๋ฅผ ์ž„์‹œ relation์— materializeํ•˜์—ฌ ๋‹ค์Œ ์ˆ˜์ค€(next-level) ์—ฐ์‚ฐ ํ‰๊ฐ€์— ์‚ฌ์šฉ
  • ์˜ˆ: ์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ, ฯƒbuilding="Watson"(department)\sigma_{\text{building} = \text{"Watson"}} (\text{department})ฯƒbuilding="Watson"โ€‹(department)๋ฅผ ๊ณ„์‚ฐ ๋ฐ ์ €์žฅ
    • ๊ทธ ๋‹ค์Œ instructor์™€์˜ join์„ ๊ณ„์‚ฐ ๋ฐ ์ €์žฅ, ๋งˆ์ง€๋ง‰์œผ๋กœ name์— ๋Œ€ํ•œ projection ๊ณ„์‚ฐ
  • Materialized evaluation์€ ํ•ญ์ƒ ์ ์šฉ ๊ฐ€๋Šฅ
  • ๊ฒฐ๊ณผ๋ฅผ disk์— ์“ฐ๊ณ  ๋‹ค์‹œ ์ฝ๋Š” ๋น„์šฉ์€ ์ƒ๋‹นํžˆ ๋†’์„ ์ˆ˜ ์žˆ์Œ
  • ์—ฐ์‚ฐ์— ๋Œ€ํ•œ ๋น„์šฉ ๊ณต์‹์€ ๊ฒฐ๊ณผ๋ฅผ disk์— ์“ฐ๋Š” ๋น„์šฉ์„ ๋ฌด์‹œํ•˜๋ฏ€๋กœ,
    • ์ „์ฒด ๋น„์šฉ = ๊ฐœ๋ณ„ ์—ฐ์‚ฐ ๋น„์šฉ์˜ ํ•ฉ + ์ค‘๊ฐ„ ๊ฒฐ๊ณผ๋ฅผ disk์— ์“ฐ๋Š” ๋น„์šฉ
  • Double buffering: ๊ฐ ์—ฐ์‚ฐ์— ๋‘ ๊ฐœ์˜ output buffer ์‚ฌ์šฉ, ํ•˜๋‚˜๊ฐ€ ์ฐจ๋ฉด disk์— ์“ฐ๋Š” ๋™์•ˆ ๋‹ค๋ฅธ ํ•˜๋‚˜ ์ฑ„์šฐ๊ธฐ
    • Disk write์™€ ๊ณ„์‚ฐ์˜ overlap ํ—ˆ์šฉ, ์‹คํ–‰ ์‹œ๊ฐ„ ๋‹จ์ถ•

Pipelining

  • Pipelined evaluation
    • ์—ฌ๋Ÿฌ ์—ฐ์‚ฐ์„ ๋™์‹œ์— ํ‰๊ฐ€,
    • ํ•œ ์—ฐ์‚ฐ์˜ ๊ฒฐ๊ณผ๋ฅผ ๋‹ค์Œ ์—ฐ์‚ฐ์œผ๋กœ ์ „๋‹ฌ
  • ์˜ˆ: ์ด์ „ expression tree์—์„œ, ฯƒbuilding="Watson"(department)\sigma_{\text{building} = \text{"Watson"}} (\text{department})ฯƒbuilding="Watson"โ€‹(department)์˜ ๊ฒฐ๊ณผ๋ฅผ ์ €์žฅํ•˜์ง€ ์•Š์Œ
    • ๋Œ€์‹ , tuple์„ join์œผ๋กœ ์ง์ ‘ ์ „๋‹ฌ
    • ์œ ์‚ฌํ•˜๊ฒŒ, join์˜ ๊ฒฐ๊ณผ๋ฅผ ์ €์žฅํ•˜์ง€ ์•Š๊ณ , tuple์„ projection์œผ๋กœ ์ง์ ‘ ์ „๋‹ฌ
  • Materialization๋ณด๋‹ค ํ›จ์”ฌ ์ €๋ ด
    • ์ž„์‹œ relation์„ disk์— ์ €์žฅํ•  ํ•„์š” ์—†์Œ
  • Pipelining์ด ํšจ๊ณผ์ ์ด๋ ค๋ฉด, tuple์ด ์ƒ์œ„(upper) ์—ฐ์‚ฐ์— input๋  ๋•Œ output tuple์„ ์ƒ์„ฑํ•˜๋Š” ํ•˜์œ„(lower) ์—ฐ์‚ฐ evaluation algorithm ์‚ฌ์šฉ
  • Pipeline์€ ๋‘ ๊ฐ€์ง€ ๋ฐฉ์‹์œผ๋กœ ์‹คํ–‰ ๊ฐ€๋Šฅ
    • Demand driven pipeline
    • Producer driven pipeline
  • Demand driven pipeline (๋˜๋Š” lazy evaluation)
    • System์ด ์ตœ์ƒ์œ„(top level) ์—ฐ์‚ฐ์— ๋‹ค์Œ tuple์„ ๋ฐ˜๋ณต์ ์œผ๋กœ ์š”์ฒญ
    • ๊ฐ ์—ฐ์‚ฐ์€ ๋‹ค์Œ output tuple์„ ์ถœ๋ ฅํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”์— ๋”ฐ๋ผ ํ•˜์œ„(children) ์—ฐ์‚ฐ์— ๋‹ค์Œ tuple ์š”์ฒญ
    • ํ˜ธ์ถœ ์‚ฌ์ด์—, ์—ฐ์‚ฐ์€ ๋‹ค์Œ์— ๋ฐ˜ํ™˜ํ•  ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋„๋ก "์ƒํƒœ(state)" ์œ ์ง€ ํ•„์š”
  • Producer-driven pipeline (๋˜๋Š” eager pipelining)
    • ์—ฐ์‚ฐ์ž(Operator)๊ฐ€ tuple์„ ์ ๊ทน์ ์œผ๋กœ(eagerly) ์ƒ์„ฑํ•˜๊ณ  ์ƒ์œ„(parent)๋กœ ์ „๋‹ฌ
    • ์—ฐ์‚ฐ์ž ์‚ฌ์ด์— buffer ์œ ์ง€: ํ•˜์œ„ ์—ฐ์‚ฐ์ž๋Š” buffer์— tuple์„ ๋„ฃ๊ณ , ์ƒ์œ„ ์—ฐ์‚ฐ์ž๋Š” buffer์—์„œ tuple ์ œ๊ฑฐ
    • buffer๊ฐ€ ๊ฐ€๋“ ์ฐจ๋ฉด, ํ•˜์œ„ ์—ฐ์‚ฐ์ž๋Š” buffer ๊ณต๊ฐ„์ด ์ƒ๊ธธ ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ ธ๋‹ค๊ฐ€ ๋” ๋งŽ์€ tuple ์ƒ์„ฑ
    • System์€ output buffer์— ๊ณต๊ฐ„์ด ์žˆ์–ด ๋” ๋งŽ์€ input tuple์„ processํ•  ์ˆ˜ ์žˆ๋Š” ์—ฐ์‚ฐ๋“ค์„ schedule (์‹คํ–‰)
  • ๋Œ€์•ˆ์  ๋ช…์นญ: pull ๋ฐ push model์˜ pipelining
  • Demand-driven pipelining ๊ตฌํ˜„
    • ๊ฐ ์—ฐ์‚ฐ์€ ๋‹ค์Œ ์—ฐ์‚ฐ์„ ๊ตฌํ˜„ํ•˜๋Š” iterator๋กœ ๊ตฌํ˜„๋จ
    • open()
      • ์˜ˆ: file scan์˜ ๊ฒฝ์šฐ: file scan ์ดˆ๊ธฐํ™”
      • state: file ์‹œ์ž‘ pointer
      • ์˜ˆ: merge join์˜ ๊ฒฝ์šฐ: relation ์ •๋ ฌ;
      • state: ์ •๋ ฌ๋œ relation ์‹œ์ž‘ pointer
    • next()
      • ์˜ˆ: file scan์˜ ๊ฒฝ์šฐ: ๋‹ค์Œ tuple output, file pointer ์ด๋™ ๋ฐ ์ €์žฅ
      • ์˜ˆ: merge join์˜ ๊ฒฝ์šฐ: ๋‹ค์Œ output tuple์ด ๋ฐœ๊ฒฌ๋  ๋•Œ๊นŒ์ง€ ์ด์ „ state์—์„œ merge ๊ณ„์†. pointer๋ฅผ iterator state๋กœ ์ €์žฅ
    • close()

Pipelining: Blocking Operations

  • Blocking operation: ๋ชจ๋“  input์ด ์†Œ๋น„๋  ๋•Œ๊นŒ์ง€ output์„ ์ƒ์„ฑํ•  ์ˆ˜ ์—†๋Š” ์—ฐ์‚ฐ
  • ์˜ˆ: sorting, aggregation, โ€ฆ
  • ๊ทธ๋Ÿฌ๋‚˜ ์ข…์ข… tuple์ด ์ƒ์„ฑ๋  ๋•Œ ์†Œ๋น„ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, tuple์ด ์ƒ์„ฑ๋  ๋•Œ consumer์—๊ฒŒ outputํ•  ์ˆ˜ ์žˆ์Œ
  • ํ•ต์‹ฌ ์•„์ด๋””์–ด
    • Blocking operation์€ ์ข…์ข… ๋‘ ๊ฐœ์˜ ํ•˜์œ„ ์—ฐ์‚ฐ (๋‹จ๊ณ„)์„ ๊ฐ€์ง, ์˜ˆ
    • Sort-merge: run generation ๋ฐ merge
    • Hash join: partitioning ๋ฐ build-probe
    • Blocking์€ ์‹ค์ œ๋กœ ๋‘ ๋‹จ๊ณ„ ์‚ฌ์ด์—์„œ ๋ฐœ์ƒ โž” ์ด๋“ค์„ ๋ณ„๋„ ์—ฐ์‚ฐ์œผ๋กœ ์ทจ๊ธ‰
  • ํ•œ ๋‹จ๊ณ„(stage)์˜ ๋ชจ๋“  ์—ฐ์‚ฐ์€ ๋™์‹œ์— ์‹คํ–‰
  • ํ•œ ๋‹จ๊ณ„๋Š” ์ด์ „ ๋‹จ๊ณ„(stage)๊ฐ€ ์‹คํ–‰์„ ์™„๋ฃŒํ•œ ํ›„์—๋งŒ ์‹œ์ž‘ ๊ฐ€๋Šฅ
์ตœ๊ทผ ์ˆ˜์ •: 25. 11. 6. ์˜คํ›„ 12:07
Contributors: kmbzn
Prev
14. Indexing

BUILT WITH

CloudflareNode.jsGitHubGitVue.jsJavaScriptVSCodenpm

All trademarks and logos are property of their respective owners.
ยฉ 2025 kmbzn ยท MIT License