• Mindscape ๐Ÿ”ฅ
    • Playlist ๐ŸŽง
  • ๐Ÿค– Artifical Intelligence

    • 1. Basics; Linear Algebra
    • 2. Basics; Linear Algebra (2), Search (1)
    • 3. Search (2)
    • 4. Knowledge and Logic (1)
    • 5. Knowledge and Logic (2)
    • 6. Probability
    • 7. Information Theory
    • 8. Probabilitc Reasoning (2)
    • 9. Probabilitc Reasoning (3)
    • 10. Machine Learning (1)
    • 11. Machine Learning (2)
    • 12. Machine Learning (3)
    • 13. Linear Models
    • 14. Other Classic ML Models (1)
    • 15. Other Classic ML Models (2)
  • ๐Ÿ”’ Computer Security

    • 01. Overview
    • 02. ์ •๋ณด๋ณด์•ˆ์ •์ฑ… ๋ฐ ๋ฒ•๊ทœ
    • 03. Cryptographic Tools
    • 04. User Authentication
    • 05. Access Control
    • 06. Database Security
    • 07. Malicious Software
    • 08. Firmware Analysis
  • ๐Ÿ—„๏ธ Database System

    • 1. Introduction
    • 2. Relational Model
    • 3. SQL
    • 6. E-R Model
    • 7. Relational Database Design (1)
    • 7. Relational Database Design (2)
    • 13. Data Storage Structures
    • 14. Indexing
    • 15. Query Processing
  • ๐Ÿ“ Software Engineering

    • 2. Introduction to Software Engineering
    • 3. Process
    • 4. Process Models
    • 5. Agile
    • 6. Requirements
    • 7. Requirements Elicitation and Documentation
    • 8. Architecture
    • 9. Unified Modelling Language
    • 10. Object-Oriented Analysis
    • Object-Oriented Design
  • ๐Ÿง  Algorithm

    • Python ์‹œ๊ฐ„ ์ดˆ๊ณผ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•œ ํŒ
    • C++ std::vector ์‚ฌ์šฉ๋ฒ• ์ •๋ฆฌ
    • Vim ์‚ฌ์šฉ ๋งค๋‰ด์–ผ
    • 1018๋ฒˆ: ์ฒด์ŠคํŒ ๋‹ค์‹œ ์น ํ•˜๊ธฐ
    • 1966๋ฒˆ: ํ”„๋ฆฐํ„ฐ ํ

14. Indexing

  • Indexing(์ธ๋ฑ์‹ฑ): ์›ํ•˜๋Š” ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ ‘๊ทผ ์†๋„๋ฅผ ๋†’์ด๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋ฉ”์ปค๋‹ˆ์ฆ˜
  • Search Key(๊ฒ€์ƒ‰ ํ‚ค): ํŒŒ์ผ์—์„œ ๋ ˆ์ฝ”๋“œ๋ฅผ ์ฐพ๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ์†์„ฑ ๋˜๋Š” ์†์„ฑ ์ง‘ํ•ฉ
  • Index file(์ธ๋ฑ์Šค ํŒŒ์ผ)์€ search-key์™€ pointer(s)(Blockย pointer,ย offset\text{Block pointer, offset}Blockย pointer,ย offset) ํ˜•ํƒœ์˜ ๋ ˆ์ฝ”๋“œ(index entries(์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ)๋ผ ๋ถˆ๋ฆผ)๋กœ ๊ตฌ์„ฑ
  • ์ธ๋ฑ์Šค ํŒŒ์ผ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์›๋ณธ ๋ฐ์ดํ„ฐ ํŒŒ์ผ๋ณด๋‹ค ํ›จ์”ฌ ์ž‘์Œ
  • ๋‘ ๊ฐ€์ง€ ๊ธฐ๋ณธ ์ธ๋ฑ์Šค ์ข…๋ฅ˜
    • Ordered indices(์ˆœ์„œ ์ธ๋ฑ์Šค): search key๊ฐ€ ์ •๋ ฌ๋œ ์ˆœ์„œ๋กœ ์ €์žฅ
    • Hash indices(ํ•ด์‹œ ์ธ๋ฑ์Šค): search key๊ฐ€ hash function(ํ•ด์‹œ ํ•จ์ˆ˜)์„ ์‚ฌ์šฉํ•˜์—ฌ "buckets(๋ฒ„ํ‚ท)"์— ๊ท ์ผํ•˜๊ฒŒ ๋ถ„์‚ฐ
  • Index evaluation metrics(์ธ๋ฑ์Šค ํ‰๊ฐ€ ์ง€ํ‘œ)
    • ํšจ์œจ์ ์œผ๋กœ ์ง€์›๋˜๋Š” Access types(์ ‘๊ทผ ์œ ํ˜•)
      • Point queries(ํฌ์ธํŠธ ์ฟผ๋ฆฌ): search key์— ๋Œ€ํ•ด ์ง€์ •๋œ ๊ฐ’์„ ๊ฐ–๋Š” ๋ ˆ์ฝ”๋“œ
      • Range queries(๋ฒ”์œ„ ์ฟผ๋ฆฌ): search key ๊ฐ’์ด ์ง€์ •๋œ ๋ฒ”์œ„ ๋‚ด์— ์žˆ๋Š” ๋ ˆ์ฝ”๋“œ
    • ๋ฐ์ดํ„ฐ ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ Access/Insertion/Deletion times(์ ‘๊ทผ/์‚ฝ์ž…/์‚ญ์ œ ์‹œ๊ฐ„)
    • Space overhead(๊ณต๊ฐ„ ์˜ค๋ฒ„ํ—ค๋“œ)

Ordered Indices

  • Ordered index(์ˆœ์„œ ์ธ๋ฑ์Šค): index entries๊ฐ€ search key ๊ฐ’์— ๋”ฐ๋ผ ์ •๋ ฌ๋˜์–ด ์ €์žฅ
  • Clustering index(ํด๋Ÿฌ์Šคํ„ฐ๋ง ์ธ๋ฑ์Šค)
    • Sequentially ordered data file(์ˆœ์ฐจ์ ์œผ๋กœ ์ •๋ ฌ๋œ ๋ฐ์ดํ„ฐ ํŒŒ์ผ)์—์„œ, search key๊ฐ€ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์˜ ์ˆœ์ฐจ์  ์ˆœ์„œ๋„ ์ •์˜ํ•˜๋Š” ์ธ๋ฑ์Šค
    • Primary index(๊ธฐ๋ณธ ์ธ๋ฑ์Šค)๋ผ๊ณ ๋„ ํ•จ
    • ๊ธฐ๋ณธ ์ธ๋ฑ์Šค์˜ search key๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ primary key(๊ธฐ๋ณธ ํ‚ค)์ด์ง€๋งŒ, ํ•„์ˆ˜๋Š” ์•„๋‹˜
  • Secondary index(๋ณด์กฐ ์ธ๋ฑ์Šค)
    • search key๊ฐ€ ๋ฐ์ดํ„ฐ ํŒŒ์ผ์˜ ์ˆœ์ฐจ์  ์ˆœ์„œ์™€ ๋‹ค๋ฅธ ์ˆœ์„œ๋ฅผ ์ง€์ •ํ•˜๋Š” ์ธ๋ฑ์Šค
    • Nonclustering index(๋น„ํด๋Ÿฌ์Šคํ„ฐ๋ง ์ธ๋ฑ์Šค)๋ผ๊ณ ๋„ ํ•จ
  • Index-sequential file(์ธ๋ฑ์Šค-์ˆœ์ฐจ ํŒŒ์ผ)
    • search key๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌ๋œ Sequential data file์—, search key์— ๋Œ€ํ•œ clustering index๊ฐ€ ์žˆ๋Š” ํŒŒ์ผ

Dense Index

  • Dense index(๋ฐ€์ง‘ ์ธ๋ฑ์Šค): ๋ฐ์ดํ„ฐ ํŒŒ์ผ์˜ ๋ชจ๋“  search-key ๊ฐ’์— ๋Œ€ํ•ด index entry๊ฐ€ ๋‚˜ํƒ€๋‚จ

Sparse Index

  • Sparse Index(ํฌ์†Œ ์ธ๋ฑ์Šค): ์ผ๋ถ€ search-key ๊ฐ’์— ๋Œ€ํ•ด์„œ๋งŒ index entries๋ฅผ ํฌํ•จ
  • ๋ ˆ์ฝ”๋“œ๊ฐ€ search key๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ˆœ์ฐจ์ ์œผ๋กœ ์ •๋ ฌ๋˜์–ด ์žˆ์„ ๋•Œ ์ ์šฉ ๊ฐ€๋Šฅ
  • search-key ๊ฐ’ KKK๋ฅผ ๊ฐ€์ง„ ๋ ˆ์ฝ”๋“œ๋ฅผ ์ฐพ์œผ๋ ค๋ฉด
    • KKK๋ณด๋‹ค ์ž‘๊ฑฐ๋‚˜ ๊ฐ™์€ search-key ๊ฐ’ ์ค‘ ๊ฐ€์žฅ ํฐ ๊ฐ’์„ ๊ฐ€์ง„ index entry๋ฅผ ์ฐพ์Œ
    • ํ•ด๋‹น index record๊ฐ€ ๊ฐ€๋ฆฌํ‚ค๋Š” ๋ ˆ์ฝ”๋“œ์—์„œ๋ถ€ํ„ฐ ํŒŒ์ผ ์ˆœ์ฐจ ๊ฒ€์ƒ‰
  • Sparse Index(Cont.)
    • Dense indices์™€ ๋น„๊ต
      • ๋ฐ์ดํ„ฐ ๋ ˆ์ฝ”๋“œ์˜ insertion(์‚ฝ์ž…) ๋ฐ deletion(์‚ญ์ œ)์— ๋Œ€ํ•ด Less space and less maintenance overhead(๋” ์ ์€ ๊ณต๊ฐ„๊ณผ ๋” ์ ์€ ์œ ์ง€ ๊ด€๋ฆฌ ์˜ค๋ฒ„ํ—ค๋“œ)
      • ์ผ๋ฐ˜์ ์œผ๋กœ ๋ ˆ์ฝ”๋“œ๋ฅผ ์ฐพ๋Š” ๋ฐ Dense index๋ณด๋‹ค slower(๋А๋ฆผ)
    • Access time(์ ‘๊ทผ ์‹œ๊ฐ„)๊ณผ space overhead(๊ณต๊ฐ„ ์˜ค๋ฒ„ํ—ค๋“œ) ์‚ฌ์ด์˜ trade-off(๊ท ํ˜•)
    • Good compromise(์ข‹์€ ์ ˆ์ถฉ์•ˆ)
      • Clustering index์˜ ๊ฒฝ์šฐ: ํŒŒ์ผ์˜ ๋ชจ๋“  block(๋ธ”๋ก)์— ๋Œ€ํ•ด index entry๋ฅผ ๊ฐ€์ง„ sparse index. ํ•ด๋‹น block์—์„œ least search-key value(๊ฐ€์žฅ ์ž‘์€ search-key ๊ฐ’)์— ํ•ด๋‹น
      • Note) Query processing(์ฟผ๋ฆฌ ์ฒ˜๋ฆฌ)์˜ ์ฃผ์š” ๋น„์šฉ์€ block I/O time(๋ธ”๋ก I/O ์‹œ๊ฐ„); ๋ฉ”๋ชจ๋ฆฌ ๋‚ด ๋ธ”๋ก ์Šค์บ๋‹ ์‹œ๊ฐ„์€ negligible(๋ฌด์‹œํ•  ์ˆ˜ ์žˆ์Œ)
      • ์ด ๋ฐฉ์‹์€ dense index์™€ ๋™์ผํ•œ ์ˆ˜์˜ block I/O๋ฅผ ๊ฐ€์ง€๋ฉด์„œ๋„ space overhead๊ฐ€ ํ›จ์”ฌ ์ ์Œ

Multilevel Index

  • Index๊ฐ€ ๋ฉ”๋ชจ๋ฆฌ์— ๋งž์ง€ ์•Š์œผ๋ฉด, access๊ฐ€ expensive(๋น„์šฉ์ด ๋งŽ์ด ๋“ฆ)
  • Solution(ํ•ด๊ฒฐ์ฑ…): Multilevel Index(๋‹ค๋‹จ๊ณ„ ์ธ๋ฑ์Šค)
    • ๋””์Šคํฌ์— ๋ณด๊ด€๋œ ์ธ๋ฑ์Šค๋ฅผ sequential file(์ˆœ์ฐจ ํŒŒ์ผ)๋กœ ์ทจ๊ธ‰ํ•˜๊ณ  ๊ทธ ์œ„์— sparse index๋ฅผ ๊ตฌ์ถ•
    • Outer index(์™ธ๋ถ€ ์ธ๋ฑ์Šค): basic index(๊ธฐ๋ณธ ์ธ๋ฑ์Šค)์˜ sparse index
    • Inner index(๋‚ด๋ถ€ ์ธ๋ฑ์Šค): basic index file
    • Outer index์กฐ์ฐจ ๋„ˆ๋ฌด ์ปค์„œ ๋ฉ”์ธ ๋ฉ”๋ชจ๋ฆฌ์— ๋งž์ง€ ์•Š์œผ๋ฉด, ๋˜ ๋‹ค๋ฅธ level์˜ index ์ƒ์„ฑ ๊ฐ€๋Šฅ
  • ๋ชจ๋“  level์˜ indices๋Š” ํŒŒ์ผ์—์„œ์˜ insertion ๋˜๋Š” deletion ์‹œ์— updated(๊ฐฑ์‹ )๋˜์–ด์•ผ ํ•จ

Index Update: Insertion

  • Index๋Š” database modification(๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์ˆ˜์ •)์— ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ๋ถ€๊ณผ
    • ๋ ˆ์ฝ”๋“œ๊ฐ€ ์‚ฝ์ž… ๋˜๋Š” ์‚ญ์ œ๋  ๋•Œ, ๊ด€๊ณ„ํ˜•์˜ ๋ชจ๋“  ์ธ๋ฑ์Šค๊ฐ€ ๊ฐฑ์‹ ๋˜์–ด์•ผ ํ•จ
    • ๋ ˆ์ฝ”๋“œ๊ฐ€ ๊ฐฑ์‹ ๋  ๋•Œ, ๊ฐฑ์‹ ๋œ ์†์„ฑ์— ๋Œ€ํ•œ ๋ชจ๋“  ์ธ๋ฑ์Šค๊ฐ€ ๊ฐฑ์‹ ๋˜์–ด์•ผ ํ•จ
  • Index update upon insertion:~1) Dense indices
    • ์‚ฝ์ž…๋˜๋Š” ๋ ˆ์ฝ”๋“œ์˜ search-key ๊ฐ’์„ ์‚ฌ์šฉํ•˜์—ฌ lookup(์กฐํšŒ) ์ˆ˜ํ–‰
    • search-key ๊ฐ’์ด ์ธ๋ฑ์Šค์— ๋‚˜ํƒ€๋‚˜์ง€ ์•Š์œผ๋ฉด, insert(์‚ฝ์ž…)
    • search-key ๊ฐ’์ด ์ธ๋ฑ์Šค์— ๋‚˜ํƒ€๋‚˜๋ฉด
      • ์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ๊ฐ€ ๋™์ผํ•œ search-key ๊ฐ’์„ ๊ฐ€์ง„ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ pointers๋ฅผ ์ €์žฅํ•˜๋Š” ๊ฒฝ์šฐ, ์ƒˆ๋กœ์šด ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ pointer๋ฅผ add(์ถ”๊ฐ€)
      • ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ, ์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ๊ฐ€ ํ•ด๋‹น search-key ๊ฐ’์„ ๊ฐ€์ง„ ์ฒซ ๋ฒˆ์งธ ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ pointer๋งŒ ์ €์žฅ. ์ƒˆ๋กœ์šด ๋ ˆ์ฝ”๋“œ๋ฅผ ๋™์ผํ•œ search-key ๊ฐ’์„ ๊ฐ€์ง„ ๋‹ค๋ฅธ ๋ ˆ์ฝ”๋“œ๋“ค ๋’ค์— ๋ฐฐ์น˜
    • Indices๋Š” sequential files๋กœ ์œ ์ง€ ๊ด€๋ฆฌ ๋จ โ†’\toโ†’ ์ƒˆ๋กœ์šด ์—”ํŠธ๋ฆฌ๋ฅผ ์œ„ํ•œ space๋ฅผ ์ƒ์„ฑํ•ด์•ผ ํ•˜๋ฉฐ, overflow blocks(์˜ค๋ฒ„ํ”Œ๋กœ ๋ธ”๋ก)์ด ํ•„์š”ํ•  ์ˆ˜ ์žˆ์Œ
  • Index update upon insertion: 2) Sparse indices
    • ์‚ฝ์ž…๋˜๋Š” ๋ ˆ์ฝ”๋“œ์˜ search-key ๊ฐ’์„ ์‚ฌ์šฉํ•˜์—ฌ lookup ์ˆ˜ํ–‰
    • ์ธ๋ฑ์Šค๊ฐ€ ํŒŒ์ผ์˜ ๊ฐ ๋ธ”๋ก์— ๋Œ€ํ•œ ์—”ํŠธ๋ฆฌ๋ฅผ ์ €์žฅํ•˜๋Š” ๊ฒฝ์šฐ, unless(๋‹ค์Œ์˜ ๊ฒฝ์šฐ๊ฐ€ ์•„๋‹ˆ๋ผ๋ฉด) ์ธ๋ฑ์Šค๋ฅผ ์ˆ˜์ •ํ•  ํ•„์š”๊ฐ€ ์—†์Œ
      • ์ƒˆ๋กœ์šด block์ด ์ƒ์„ฑ๋˜๋Š” ๊ฒฝ์šฐ(๊ธฐ์กด ๋ธ”๋ก์ด ์ด๋ฏธ ๊ฐ€๋“ ์ฐผ๊ธฐ ๋•Œ๋ฌธ): ์ƒˆ๋กœ์šด ๋ธ”๋ก์˜ first search-key value(์ฒซ ๋ฒˆ์งธ search-key ๊ฐ’)์ด ์ธ๋ฑ์Šค์— ์‚ฝ์ž…๋จ
      • ์ƒˆ๋กœ์šด ๋ ˆ์ฝ”๋“œ๊ฐ€ ๋ธ”๋ก์—์„œ least search-key value(๊ฐ€์žฅ ์ž‘์€ search-key ๊ฐ’)๋ฅผ ๊ฐ€์ง€๋Š” ๊ฒฝ์šฐ: ๋ธ”๋ก์„ ๊ฐ€๋ฆฌํ‚ค๋Š” index entry๋ฅผ update(๊ฐฑ์‹ )

Index Update: Deletion

  • Index update upon deletion:~1) Dense indices -(Case~1) ์‚ญ์ œ๋œ ๋ ˆ์ฝ”๋“œ๊ฐ€ ํ•ด๋‹น search-key ๊ฐ’์„ ๊ฐ€์ง„ ํŒŒ์ผ ๋‚ด ์œ ์ผํ•œ ๋ ˆ์ฝ”๋“œ์ธ ๊ฒฝ์šฐ, ์ธ๋ฑ์Šค์—์„œ๋„ search-key๊ฐ€ deleted(์‚ญ์ œ)๋จ
    • ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ
      • ์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ๊ฐ€ ๋™์ผํ•œ search-key ๊ฐ’์„ ๊ฐ€์ง„ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ pointers๋ฅผ ์ €์žฅํ•˜๋Š” ๊ฒฝ์šฐ, ์‚ญ์ œ๋œ ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ pointer๋ฅผ delete(์‚ญ์ œ)
      • ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ, ์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ๊ฐ€ ํ•ด๋‹น search-key ๊ฐ’์„ ๊ฐ€์ง„ ์ฒซ ๋ฒˆ์งธ ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ pointer๋งŒ ์ €์žฅ -(Case 2) ์‚ญ์ œ๋œ ๋ ˆ์ฝ”๋“œ๊ฐ€ ํ•ด๋‹น search-key ๊ฐ’์„ ๊ฐ€์ง„ ์ฒซ ๋ฒˆ์งธ ๋ ˆ์ฝ”๋“œ์ธ ๊ฒฝ์šฐ, ์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ๋ฅผ updateํ•˜์—ฌ ๋‹ค์Œ ๋ ˆ์ฝ”๋“œ๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋„๋ก ํ•จ -(Case 3) ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ, ์ธ๋ฑ์Šค ๊ฐฑ์‹ ์€ required(์š”๊ตฌ๋˜์ง€ ์•Š์Œ)
  • Index update upon deletion: 2) Sparse indices -(Case~1) ์ธ๋ฑ์Šค๊ฐ€ ์‚ญ์ œ๋œ ๋ ˆ์ฝ”๋“œ์˜ search-key ๊ฐ’๊ณผ ์ผ์น˜ํ•˜๋Š” index entry๋ฅผ ํฌํ•จํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ, do nothing(์•„๋ฌด๊ฒƒ๋„ ํ•˜์ง€ ์•Š์Œ)
    • ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ -(Case 2) ์‚ญ์ œ๋œ ๋ ˆ์ฝ”๋“œ๊ฐ€ ํ•ด๋‹น search-key๋ฅผ ๊ฐ€์ง„ ์œ ์ผํ•œ ๋ ˆ์ฝ”๋“œ์ธ ๊ฒฝ์šฐ, index entry๋ฅผ ํŒŒ์ผ์—์„œ next search-key value(๋‹ค์Œ search-key ๊ฐ’)๋กœ replace(๋Œ€์ฒด) - ๋‹ค์Œ search-key ๊ฐ’์ด ์ด๋ฏธ index entry๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋ฉด, ํ•ด๋‹น ์—”ํŠธ๋ฆฌ๋Š” deleted๋จ -(Primary index for non-key attribute(๋น„-ํ‚ค ์†์„ฑ์— ๋Œ€ํ•œ ๊ธฐ๋ณธ ์ธ๋ฑ์Šค)) ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ, search-key ๊ฐ’์— ๋Œ€ํ•œ index entry๊ฐ€ ์‚ญ์ œ๋œ ๋ ˆ์ฝ”๋“œ๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋Š” ๊ฒฝ์šฐ, ์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ๋ฅผ updateํ•˜์—ฌ ๋™์ผํ•œ search-key ๊ฐ’์„ ๊ฐ€์ง„ next record(๋‹ค์Œ ๋ ˆ์ฝ”๋“œ)๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋„๋ก ํ•จ

Secondary Indices: An example

  • instructor ํŒŒ์ผ์˜ salary ํ•„๋“œ์— ๋Œ€ํ•œ Secondary index(nonunique search key(๊ณ ์œ ํ•˜์ง€ ์•Š์€ search key))
  • Index record๋Š” ํ•ด๋‹น ํŠน์ • search-key ๊ฐ’์„ ๊ฐ€์ง„ ๋ชจ๋“  ์‹ค์ œ ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•œ pointers๋ฅผ ํฌํ•จํ•˜๋Š” bucket์„ ๊ฐ€๋ฆฌํ‚ด
  • Secondary indices๋Š” dense(๋ฐ€์ง‘)ํ•ด์•ผ ํ•จ
  • Index update ๊ณผ์ •์€ clustering index์˜ dense index ๊ฒฝ์šฐ์™€ ๋™์ผ
  • Secondary(nonclustering) index๋ฅผ ์‚ฌ์šฉํ•œ Sequential scan(์ˆœ์ฐจ ์Šค์บ”)์€ HDD์—์„œ expensive(๋น„์šฉ์ด ๋งŽ์ด ๋“ฆ)
    • ๊ฐ ๋ ˆ์ฝ”๋“œ ์ ‘๊ทผ ์‹œ ๋””์Šคํฌ์—์„œ ์ƒˆ๋กœ์šด block์„ ๊ฐ€์ ธ์™€์•ผ ํ•  ์ˆ˜ ์žˆ์Œ

Indices on Multiple Keys

  • Composite search key(๋ณตํ•ฉ search key): ๋‘ ๊ฐœ ์ด์ƒ์˜ ์†์„ฑ์„ ํฌํ•จํ•˜๋Š” search key
    • E.g., instructor relation์˜ ์†์„ฑ (name, ID)์— ๋Œ€ํ•œ ์ธ๋ฑ์Šค
  • ๊ฐ’์€ lexicographically(์‚ฌ์ „ ์ˆœ์œผ๋กœ) ์ •๋ ฌ๋จ
    • E.g. $ (\text{John},~12121) <(\text{John},~13514)$ ๋ฐ $ (\text{John},~13514) <(\text{Peter},~11223)$
  • name๋งŒ์œผ๋กœ ์ฟผ๋ฆฌํ•˜๊ฑฐ๋‚˜, (name, ID)๋กœ ์ฟผ๋ฆฌ ๊ฐ€๋Šฅ

B +^++ -Tree(and B-Tree) Index Files

B +^++ -Tree Index Files

  • Index-sequential file organization์˜ Disadvantage(๋‹จ์ )
    • ํŒŒ์ผ์ด ์ปค์ง์— ๋”ฐ๋ผ overflow blocks์ด ๋งŽ์ด ์ƒ์„ฑ๋˜์–ด ์„ฑ๋Šฅ ์ €ํ•˜(์ธ๋ฑ์Šค ์กฐํšŒ ๋ฐ ์ˆœ์ฐจ ์Šค์บ” ๋ชจ๋‘)
    • ๋น„์šฉ์ด ๋งŽ์ด ๋“ค๊ณ  ์ฃผ๊ธฐ์ ์ธ ์ „์ฒด ํŒŒ์ผ reorganization(์žฌ๊ตฌ์„ฑ)์ด ํ•„์š”
  • B +^++ -tree index structure์˜ Advantage(์žฅ์ )
    • Insertion(์‚ฝ์ž…) ๋ฐ deletion(์‚ญ์ œ) ์‹œ small, local changes(์ž‘๊ณ  ๊ตญ์†Œ์ ์ธ ๋ณ€๊ฒฝ)๋กœ ์ž๋™์œผ๋กœ self-reorganizes(์ž์ฒด ์žฌ๊ตฌ์„ฑ)
    • ์„ฑ๋Šฅ ์œ ์ง€๋ฅผ ์œ„ํ•ด ์ „์ฒด ํŒŒ์ผ์˜ reorganization์ด ํ•„์š”ํ•˜์ง€ ์•Š์Œ
  • B +^++ -trees์˜(Minor) disadvantage((์‚ฌ์†Œํ•œ) ๋‹จ์ )
    • Extra insertion and deletion overhead, space overhead(์ถ”๊ฐ€์ ์ธ ์‚ฝ์ž… ๋ฐ ์‚ญ์ œ ์˜ค๋ฒ„ํ—ค๋“œ, ๊ณต๊ฐ„ ์˜ค๋ฒ„ํ—ค๋“œ)
  • B +^++ -trees์˜ ์žฅ์ ์ด ๋‹จ์ ๋ณด๋‹ค ํฌ๊ธฐ ๋•Œ๋ฌธ์— extensively(๊ด‘๋ฒ”์œ„ํ•˜๊ฒŒ) ์‚ฌ์šฉ๋จ
  • B +^++ -tree๋Š” ๋‹ค์Œ ์†์„ฑ์„ ๋งŒ์กฑํ•˜๋Š” rooted tree(๋ฃจํŠธ๊ฐ€ ์žˆ๋Š” ํŠธ๋ฆฌ)
    • Root์—์„œ leaf(๋ฆฌํ”„)๊นŒ์ง€์˜ ๋ชจ๋“  ๊ฒฝ๋กœ๋Š” ๊ธธ์ด๊ฐ€ ๊ฐ™์Œ: Balanced tree(๊ท ํ˜• ํŠธ๋ฆฌ)
    • Root๋‚˜ leaf๊ฐ€ ์•„๋‹Œ ๊ฐ node(i.e., internal node(๋‚ด๋ถ€ ๋…ธ๋“œ))๋Š” โŒˆn/2โŒ‰\lceil n/2 \rceilโŒˆn/2โŒ‰์™€ nnn ์‚ฌ์ด์˜ children(์ž์‹)์„ ๊ฐ€์ง(nnn์€ ํŠน์ • ํŠธ๋ฆฌ์— ๋Œ€ํ•ด ๊ณ ์ •)
      • At least โŒˆn/2โŒ‰\lceil n/2 \rceilโŒˆn/2โŒ‰ and at most nnn children(pointers)
    • Leaf node๋Š” โŒˆ(nโˆ’1)/2โŒ‰\lceil(n-1)/2 \rceilโŒˆ(nโˆ’1)/2โŒ‰์™€ nโˆ’1n-1nโˆ’1 ์‚ฌ์ด์˜ values(๊ฐ’)์„ ๊ฐ€์ง
      • At least โŒˆ(nโˆ’1)/2โŒ‰\lceil(n-1)/2 \rceilโŒˆ(nโˆ’1)/2โŒ‰ and at most nโˆ’1n-1nโˆ’1 values(not pointers)
    • Special cases(ํŠน์ˆ˜ ๊ฒฝ์šฐ)
      • Root๊ฐ€ leaf๊ฐ€ ์•„๋‹Œ ๊ฒฝ์šฐ, at least 2 children์„ ๊ฐ€์ง
      • Root๊ฐ€ leaf์ธ ๊ฒฝ์šฐ(์ฆ‰, ํŠธ๋ฆฌ์— ๋‹ค๋ฅธ ๋…ธ๋“œ๊ฐ€ ์—†๋Š” ๊ฒฝ์šฐ), 0๊ณผ nโˆ’1n-1nโˆ’1 ์‚ฌ์ด์˜ values๋ฅผ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Œ
  • B +^++ -tree๋Š” multilevel index(๋‹ค๋‹จ๊ณ„ ์ธ๋ฑ์Šค)์ด์ง€๋งŒ, multilevel index-sequential file๊ณผ๋Š” ๋‹ค๋ฅธ ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง

B +^++ -Tree Node Structure

  • Typical node(์ผ๋ฐ˜์ ์ธ ๋…ธ๋“œ)
    P1P_1P1โ€‹K1K_1K1โ€‹P2P_2P2โ€‹...Pn+1P_{n+1}Pn+1โ€‹Kn+1K_{n + 1}Kn+1โ€‹PnP_nPnโ€‹
    • KiK_iKiโ€‹๋Š” search-key values
    • PiP_iPiโ€‹๋Š” children(non-leaf nodes์˜ ๊ฒฝ์šฐ) ๋˜๋Š” records/buckets of records(leaf nodes์˜ ๊ฒฝ์šฐ)์— ๋Œ€ํ•œ pointers
    • Note: ์ตœ๋Œ€ nnn๊ฐœ์˜ pointers์™€ nโˆ’1n-1nโˆ’1๊ฐœ์˜ key values๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Œ
    • ๋…ธ๋“œ ๋‚ด์˜ search-keys๋Š” ordered(์ˆœ์„œ๊ฐ€ ์ง€์ •๋จ): K1<K2<K3<โ‹ฏ<Knโˆ’1K_1 < K_2 < K_3 < \dots < K_{n-1}K1โ€‹<K2โ€‹<K3โ€‹<โ‹ฏ<Knโˆ’1โ€‹ (์ดˆ๊ธฐ์—๋Š” ์ค‘๋ณต ํ‚ค๊ฐ€ ์—†๋‹ค๊ณ  ๊ฐ€์ •)

Leaf Nodes in B +^++ -Trees

  • Leaf node์˜ Properties(์†์„ฑ)
    • i=ย 1,2,โ€ฆ,nโˆ’1i =~1, 2, \dots, n-1i=ย 1,2,โ€ฆ,nโˆ’1์— ๋Œ€ํ•ด, pointer PiP_iPiโ€‹๋Š” search-key value KiK_iKiโ€‹๋ฅผ ๊ฐ€์ง„ file record๋ฅผ ๊ฐ€๋ฆฌํ‚ด
    • LiL_iLiโ€‹์™€ LjL_jLjโ€‹๊ฐ€ leaf nodes์ด๊ณ  i<ji < ji<j์ธ ๊ฒฝ์šฐ(LiL_iLiโ€‹๊ฐ€ ํŠธ๋ฆฌ์—์„œ LjL_jLjโ€‹์˜ ์™ผ์ชฝ์— ์žˆ์Œ), LiL_iLiโ€‹์˜ search-key values๋Š” LjL_jLjโ€‹์˜ search-key values๋ณด๋‹ค ์ž‘์Œ
    • PnP_nPnโ€‹์€ search-key order๋กœ next leaf node(๋‹ค์Œ leaf node)๋ฅผ ๊ฐ€๋ฆฌํ‚ด
    • ์ˆœ์ฐจ ์ฒ˜๋ฆฌ๋ฅผ ์‹ ์†ํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋“  leaf nodes๋ฅผ search-key order๋กœ Chain together(์—ฐ๊ฒฐ)
    • B +^++ -tree index๊ฐ€ dense index(์ผ๋ฐ˜์ ์ธ ๊ฒฝ์šฐ)๋กœ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ, ๋ชจ๋“  search-key value๊ฐ€ ์ผ๋ถ€ leaf node์— ๋‚˜ํƒ€๋‚˜์•ผ ํ•จ. ๊ทธ๋Ÿฌ๋‚˜ non-leaf node์— ๋‚˜ํƒ€๋‚˜๋Š” search-key๋Š” ๋ ˆ์ฝ”๋“œ ์‚ญ์ œ๋กœ ์ธํ•ด leaf node์— ๋‚˜ํƒ€๋‚˜์ง€ ์•Š์„ ์ˆ˜ ์žˆ์Œ(๋‚˜์ค‘์— ํ™•์ธ)

Non-Leaf Nodes in B +^++ -Trees

  • Non-leaf nodes๋Š” leaf nodes์— ๋Œ€ํ•œ multi-level sparse index๋ฅผ ํ˜•์„ฑ
  • nnn๊ฐœ์˜ pointers๋ฅผ ๊ฐ€์ง„ non-leaf node์˜ ๊ฒฝ์šฐ
    • P1P_1P1โ€‹์ด ๊ฐ€๋ฆฌํ‚ค๋Š” subtree์˜ All search-keys๋Š” K1K_1K1โ€‹๋ณด๋‹ค ์ž‘์Œ
    • 2โ‰คiโ‰คnโˆ’12 \leq i \leq n-12โ‰คiโ‰คnโˆ’1์— ๋Œ€ํ•ด, PiP_iPiโ€‹๊ฐ€ ๊ฐ€๋ฆฌํ‚ค๋Š” subtree์˜ All search-keys๋Š” Kiโˆ’1K_{i-1}Kiโˆ’1โ€‹๋ณด๋‹ค ํฌ๊ฑฐ๋‚˜ ๊ฐ™๊ณ  KiK_iKiโ€‹๋ณด๋‹ค ์ž‘์Œ
    • PnP_nPnโ€‹์ด ๊ฐ€๋ฆฌํ‚ค๋Š” subtree์˜ All search-keys๋Š” Knโˆ’1K_{n-1}Knโˆ’1โ€‹๋ณด๋‹ค ํฌ๊ฑฐ๋‚˜ ๊ฐ™์Œ
  • Note: Non-leaf nodes๋Š” ๊ทธ๋“ค ์‚ฌ์ด์— duplicate search-key values๋ฅผ ๊ฐ€์ง€์ง€ ์•Š์Œ
  • General structure

Observations about B +^++ -trees

  • Inter-node connections(๋…ธ๋“œ ๊ฐ„ ์—ฐ๊ฒฐ)์ด pointers๋กœ ์ด๋ฃจ์–ด์ง€๊ธฐ ๋•Œ๋ฌธ์—, "logically(๋…ผ๋ฆฌ์ ์œผ๋กœ)" ๊ฐ€๊นŒ์šด blocks์ด "physically(๋ฌผ๋ฆฌ์ ์œผ๋กœ)" ๊ฐ€๊นŒ์šธ ํ•„์š”๋Š” ์—†์Œ
  • B +^++ -tree์˜ non-leaf levels์€ hierarchy of sparse indices(ํฌ์†Œ ์ธ๋ฑ์Šค์˜ ๊ณ„์ธต ๊ตฌ์กฐ)๋ฅผ ํ˜•์„ฑ
  • B +^++ -tree๋Š” ์ƒ๋Œ€์ ์œผ๋กœ small number of levels(์ ์€ ์ˆ˜์˜ ๋ ˆ๋ฒจ)์„ ํฌํ•จ
    • Root ์•„๋ž˜ ๋ ˆ๋ฒจ์€ at least 2โ‹…โŒˆn/2โŒ‰2 \cdot \lceil n/2 \rceil2โ‹…โŒˆn/2โŒ‰ values
    • ๋‹ค์Œ ๋ ˆ๋ฒจ์€ at least 2โ‹…โŒˆn/2โŒ‰โ‹…โŒˆn/2โŒ‰2 \cdot \lceil n/2 \rceil \cdot \lceil n/2 \rceil2โ‹…โŒˆn/2โŒ‰โ‹…โŒˆn/2โŒ‰ values
  • ํŒŒ์ผ์— KKK๊ฐœ์˜ search-key values๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ, tree height(ํŠธ๋ฆฌ ๋†’์ด)๋Š” โŒˆlogโกโŒˆn/2โŒ‰(K)โŒ‰\lceil \log_{\lceil n/2 \rceil}(K) \rceilโŒˆlogโŒˆn/2โŒ‰โ€‹(K)โŒ‰๋ฅผ ์ดˆ๊ณผํ•˜์ง€ ์•Š์Œ
    • ๋”ฐ๋ผ์„œ searches(๊ฒ€์ƒ‰)๊ฐ€ ํšจ์œจ์ ์œผ๋กœ ์ˆ˜ํ–‰๋  ์ˆ˜ ์žˆ์Œ
  • Index๊ฐ€ logarithmic time(๋กœ๊ทธ ์‹œ๊ฐ„)์œผ๋กœ ์žฌ๊ตฌ์„ฑ๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, main file(๋ฉ”์ธ ํŒŒ์ผ)์— ๋Œ€ํ•œ insertions ๋ฐ deletions๋„ ํšจ์œจ์ ์œผ๋กœ ์ฒ˜๋ฆฌ๋  ์ˆ˜ ์žˆ์Œ

Queries on B +^++ -Trees: Point Query

function find(v)
1. Set C = root node
2. while(C is not a leaf node) begin
   Let i = smallest number s.t. v โ‰ค C.Ki
   if there is no such number i then
      /* v is larger than every key in C */
      Set C = the node pointed by the last non-null pointer in C
   else if(v = C.Ki ) Set C = C.Pi +1
   else set C = C.Pi /* v < C.Ki */
   end
   /* Now, C is a leaf node */
3. if for some i, Ki = v then return C.Pi
4. else return null /* no record with search-key value v exists */

Queries on B +^++ -Trees: Range Query

  • Range queries: ์ฃผ์–ด์ง„ ๋ฒ”์œ„ ๋‚ด์˜ search key ๊ฐ’์„ ๊ฐ€์ง„ all records(๋ชจ๋“  ๋ ˆ์ฝ”๋“œ)๋ฅผ ์ฐพ์Œ
  • function findRange(lb, ub)๋Š” lbโ‰คVโ‰คub\text{lb} \le V \le \text{ub}lbโ‰คVโ‰คub์ธ search key value VVV๋ฅผ ๊ฐ€์ง„ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ ์ง‘ํ•ฉ์„ ๋ฐ˜ํ™˜ -~1. CCC=$ \text{lb} $๊ฐ€ ๋‚˜ํƒ€๋‚  leaf node๋ฅผ ์ฐพ์Œ( find(v)์—์„œ CCC๋ฅผ ์ฐพ๋Š” ๊ฒƒ๊ณผ ๋™์ผ)
      1. iย =ย Ci~=~Ciย =ย C ์—์„œ Kiโ‰ฅlbK_i \ge \text{lb}Kiโ€‹โ‰ฅlb์ธ smallest value(๊ฐ€์žฅ ์ž‘์€ ๊ฐ’)
      1. while(Kiโ‰คubK_i \le \text{ub}Kiโ€‹โ‰คub)
      • C.PiC.P_iC.Piโ€‹๋ฅผ results์— Add
      • iย =ย iย +ย 1i~=~i~+~1iย =ย iย +ย 1 (if more records inCCC) or move to the next leaf node setting i=ย 1i =~1i=ย 1
  • Real implementations(์‹ค์ œ ๊ตฌํ˜„)์€ ์ผ๋ฐ˜์ ์œผ๋กœ next() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ผ์น˜ํ•˜๋Š” ๋ ˆ์ฝ”๋“œ๋ฅผ one at a time(ํ•œ ๋ฒˆ์— ํ•˜๋‚˜์”ฉ) ๊ฐ€์ ธ์˜ค๋Š” iterator interface(๋ฐ˜๋ณต์ž ์ธํ„ฐํŽ˜์ด์Šค)๋ฅผ ์ œ๊ณต

Queries on B +^++ -Trees: Cost Analysis

  • ํŒŒ์ผ์— KKK๊ฐœ์˜ search-key values๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ, ํŠธ๋ฆฌ์˜ height๋Š” โŒˆlogโกโŒˆn/2โŒ‰(K)โŒ‰\lceil \log_{\lceil n/2 \rceil}(K) \rceilโŒˆlogโŒˆn/2โŒ‰โ€‹(K)โŒ‰๋ฅผ ์ดˆ๊ณผํ•˜์ง€ ์•Š์Œ
  • Node๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ disk block(๋””์Šคํฌ ๋ธ”๋ก)๊ณผ ๊ฐ™์€ ํฌ๊ธฐ, ์ผ๋ฐ˜์ ์œผ๋กœ 4KB
  • nnn์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์•ฝ~100(40 Bytes/์ธ๋ฑ์Šค ์—”ํŠธ๋ฆฌ = 32 Bytes/search key + 8 Bytes/disk block pointer)
  • Search-key size๊ฐ€~12 Bytes์ธ ๊ฒฝ์šฐ(20 Bytes/์—”ํŠธ๋ฆฌ ํฌ๊ธฐ), nnn์€ ์•ฝ 200 -~1 million search key values ๋ฐ n=100n=100n=100์ธ ๊ฒฝ์šฐ
    • Root์—์„œ leaf๊นŒ์ง€์˜ index lookup(์ธ๋ฑ์Šค ์กฐํšŒ)์— ๋Œ€ํ•ด At most โŒˆlogโก50(1,000,000)โŒ‰=4\lceil \log_{50}(1,000,000) \rceil = 4โŒˆlog50โ€‹(1,000,000)โŒ‰=4 nodes accessed(์ ‘๊ทผ) -~1 million search key values๋ฅผ ๊ฐ€์ง„ balanced binary tree(๊ท ํ˜• ์ด์ง„ ํŠธ๋ฆฌ)์™€ ๋น„๊ต: ์กฐํšŒ ์‹œ ์•ฝ 20 nodes accessed
    • ๋ชจ๋“  node access๋Š” disk I/O๋ฅผ ํ•„์š”๋กœ ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์•ฝ 20 milliseconds์˜ ๋น„์šฉ์ด ๋“ค๊ธฐ ๋•Œ๋ฌธ์— ์œ„์˜ ์ฐจ์ด๋Š” significant(์ค‘์š”)
  • Index๋ฅผ traverse(์ˆœํšŒ)ํ•œ ํ›„, ์ผ์น˜ํ•˜๋Š” ๋ ˆ์ฝ”๋“œ๋ฅผ fetch(๊ฐ€์ ธ์˜ค๊ธฐ) ์œ„ํ•ด one more(random) I/O๊ฐ€ ํ•„์š”

Non-Unique Keys

  • Search key aia_iaiโ€‹๊ฐ€ not unique(๊ณ ์œ ํ•˜์ง€ ์•Š์€ ๊ฒฝ์šฐ), ๋Œ€์‹  unique(๊ณ ์œ )ํ•œ composite key $ (a_i, A_{\text{pp}})$์— ๋Œ€ํ•œ ์ธ๋ฑ์Šค๋ฅผ ์ƒ์„ฑ
    • AppA_{\text{pp}}Appโ€‹๋Š” primary key, record ID ๋˜๋Š” uniqueness๋ฅผ ๋ณด์žฅํ•˜๋Š” ๊ธฐํƒ€ attribute(์†์„ฑ)์ผ ์ˆ˜ ์žˆ์Œ
  • Search for ai=va_i = vaiโ€‹=v๋Š” composite key์— ๋Œ€ํ•œ range search(๋ฒ”์œ„ ๊ฒ€์ƒ‰)์œผ๋กœ ๊ตฌํ˜„ ๊ฐ€๋Šฅ
    • Range $ (v, -\infty)tototo (v, +\infty)$
  • But more I/O operations(๋” ๋งŽ์€ I/O ์ž‘์—…)์ด ์‹ค์ œ ๋ ˆ์ฝ”๋“œ๋ฅผ fetchํ•˜๋Š” ๋ฐ ํ•„์š”
    • Index๊ฐ€ clustering์ธ ๊ฒฝ์šฐ, ๋ชจ๋“  access๋Š” sequential(์ˆœ์ฐจ์ )
    • Index๊ฐ€ non-clustering์ธ ๊ฒฝ์šฐ, ๊ฐ record access๋Š” I/O operation์„ ํ•„์š”๋กœ ํ•  ์ˆ˜ ์žˆ์Œ

Updates on B +^++ -Trees: Insertion

  • Record๊ฐ€ data file์— already added(์ด๋ฏธ ์ถ”๊ฐ€)๋˜์—ˆ๋‹ค๊ณ  ๊ฐ€์ •
    • PrP_rPrโ€‹๋ฐvvv๋Š” ๊ฐ๊ฐ record์— ๋Œ€ํ•œ pointer ๋ฐ search key value
  • search-key value๊ฐ€ ๋‚˜ํƒ€๋‚  leaf node LLL์„ Find(์ฐพ์Œ)
    1. LLL์— room(๊ณต๊ฐ„)์ด ์žˆ๋Š” ๊ฒฝ์šฐ, (v,Pr)(v, P_r)(v,Prโ€‹) ์Œ์„ LLL์— insert(Note: leaf node์—๋Š” ์ตœ๋Œ€ nโˆ’1n-1nโˆ’1 ์Œ)
    2. ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ, node๋ฅผ split(์ƒˆ๋กœ์šด $ (v, P_r)$ ์—”ํŠธ๋ฆฌ๋ฅผ ํฌํ•จํ•˜์—ฌ)
      • Splitting a leaf node LLL:
        • ์ •๋ ฌ๋œ ์ˆœ์„œ๋กœ nnn๊ฐœ์˜(search-key, pointer) ์Œ์„ ์ทจํ•จ(์‚ฝ์ž…๋˜๋Š” ์Œ ํฌํ•จ). ์ฒซ ๋ฒˆ์งธ โŒˆn/2โŒ‰\lceil n/2 \rceilโŒˆn/2โŒ‰๋ฅผ original node $ (L)$์— ๋ฐฐ์น˜ํ•˜๊ณ , ๋‚˜๋จธ์ง€๋ฅผ new node $ (L')$์— ๋ฐฐ์น˜
        • kkk๋ฅผ Lโ€ฒL'Lโ€ฒ์˜ least key value๋ผ๊ณ  ํ•จ. $ (k, L')$๋ฅผ split๋˜๋Š” node์˜ parent(๋ถ€๋ชจ)์— insert
      • Parent๊ฐ€ full(๊ฐ€๋“ ์ฐฌ ๊ฒฝ์šฐ), parent๋ฅผ splitํ•˜๊ณ  split์„ ๋” ์œ„๋กœ propagate(์ „ํŒŒ)
      • Splitting of nodes๋Š” full์ด ์•„๋‹Œ node๋ฅผ ์ฐพ์„ ๋•Œ๊นŒ์ง€ ์œ„๋กœ ์ง„ํ–‰
      • ์ตœ์•…์˜ ๊ฒฝ์šฐ root node๊ฐ€ split๋˜์–ด ํŠธ๋ฆฌ์˜ height๊ฐ€~1 increase(์ฆ๊ฐ€)ํ•  ์ˆ˜ ์žˆ์Œ
  • Splitting a non-leaf node: ์ด๋ฏธ full์ธ internal node NNN์— $ (k, L')$๋ฅผ ์‚ฝ์ž…ํ•  ๋•Œ
    • NNN์„ n+1n+1n+1pointers์™€nnnkeys๋ฅผ ์œ„ํ•œ ๊ณต๊ฐ„์ด ์žˆ๋Š” in-memory areaMMM์œผ๋กœ Copy(๋ณต์‚ฌ)
    • $ (k, L')$๋ฅผ MMM์— Insert
    • MMM์—์„œ P1,K1,โ€ฆ,KโŒˆ(n+1)/2โŒ‰โˆ’1,PโŒˆ(n+1)/2โŒ‰P_1, K_1, \dots, K_{\lceil(n+1)/2 \rceil-1}, P_{\lceil(n+1)/2 \rceil}P1โ€‹,K1โ€‹,โ€ฆ,KโŒˆ(n+1)/2โŒ‰โˆ’1โ€‹,PโŒˆ(n+1)/2โŒ‰โ€‹๋ฅผ ๋‹ค์‹œ node NNN์œผ๋กœ Copy
    • MMM์—์„œ PโŒˆ(n+1)/2โŒ‰+1,KโŒˆ(n+1)/2โŒ‰+1,โ€ฆ,Kn,Pn+1P_{\lceil(n+1)/2 \rceil+1}, K_{\lceil(n+1)/2 \rceil+1}, \dots, K_n, P_{n+1}PโŒˆ(n+1)/2โŒ‰+1โ€‹,KโŒˆ(n+1)/2โŒ‰+1โ€‹,โ€ฆ,Knโ€‹,Pn+1โ€‹๋ฅผ ์ƒˆ๋กœ ํ• ๋‹น๋œ node Nโ€ฒN'Nโ€ฒ์œผ๋กœ Copy
    • $ (K_{\lceil(n+1)/2 \rceil}, N')$๋ฅผ NNN์˜ parent์— Insert
    • Note: leaf node๋ฅผ splitํ•˜๋Š” ๊ฒƒ๊ณผ๋Š” ๋‹ฌ๋ฆฌ, search-key๋Š” 'copied'๋˜์ง€ ์•Š๊ณ  parent node๋กœ 'moved'๋จ(i.e., no duplication!)

Updates on B +^++ -Trees: Deletion

  • Record๊ฐ€ file์—์„œ already deleted(์ด๋ฏธ ์‚ญ์ œ)๋˜์—ˆ๋‹ค๊ณ  ๊ฐ€์ •. vvv๋Š” record์˜ search key value์ด๊ณ , PrP_rPrโ€‹์€ record์— ๋Œ€ํ•œ pointer
  • $ (P_r, v)$๋ฅผ leaf node์—์„œ Remove(์ œ๊ฑฐ)
  • If(๋งŒ์•ฝ) leaf node๊ฐ€ ์ œ๊ฑฐ๋กœ ์ธํ•ด too few entries(<โŒˆ(nโˆ’1)/2โŒ‰< \lceil(n-1)/2 \rceil<โŒˆ(nโˆ’1)/2โŒ‰)๋ฅผ ๊ฐ€์ง€๊ณ , node์˜ entries์™€ sibling(ํ˜•์ œ)์˜ entries๊ฐ€ single node(โ‰คnโˆ’1\le n-1โ‰คnโˆ’1)์— fit(๋งž๋Š” ๊ฒฝ์šฐ), merge siblings(ํ˜•์ œ ํ•ฉ์น˜๊ธฐ)
    • ๋‘ node์˜ ๋ชจ๋“  entries๋ฅผ left node์— Insertํ•˜๊ณ  ๋‹ค๋ฅธ node๋ฅผ delete
    • ์‚ญ์ œ๋œ node๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋Š” pointer๊ฐ€ PiP_iPiโ€‹์ธ ๊ฒฝ์šฐ, ์Œ $ (K_{i-1}, P_i)$๋ฅผ ๊ทธ parent๋กœ๋ถ€ํ„ฐ recursively(์žฌ๊ท€์ ์œผ๋กœ) delete
  • Otherwise(๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ), node๊ฐ€ ์ œ๊ฑฐ๋กœ ์ธํ•ด too few entries๋ฅผ ๊ฐ€์ง€๊ณ , node์˜ entries์™€ sibling์˜ entries๊ฐ€ single node์— fitํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ, redistribute pointers(ํฌ์ธํ„ฐ ์žฌ๋ถ„๋ฐฐ)
    • Node์™€ sibling ์‚ฌ์ด์— pointers๋ฅผ redistributeํ•˜์—ฌ ๋‘˜ ๋‹ค minimum number of entries(โ‰ฅโŒˆ(nโˆ’1)/2โŒ‰\ge \lceil(n-1)/2 \rceilโ‰ฅโŒˆ(nโˆ’1)/2โŒ‰)๋ณด๋‹ค more(๋” ๋งŽ์ด) ๊ฐ€์ง€๋„๋ก ํ•จ
    • Node์˜ parent์—์„œ corresponding search-key value(ํ•ด๋‹น search-key ๊ฐ’)๋ฅผ Update
  • Node deletions๋Š” ์‚ญ์ œ ํ›„ โŒˆn/2โŒ‰\lceil n/2 \rceilโŒˆn/2โŒ‰๊ฐœ ์ด์ƒ์˜ pointers๋ฅผ ๊ฐ€์ง„ node๋ฅผ ์ฐพ์„ ๋•Œ๊นŒ์ง€ cascade upwards(์œ„๋กœ ์ „ํŒŒ)๋  ์ˆ˜ ์žˆ์Œ(Note: internal node๋Š” at least โŒˆn/2โŒ‰\lceil n/2 \rceilโŒˆn/2โŒ‰ pointers๋ฅผ ๊ฐ€์ ธ์•ผ ํ•จ)
  • Root node๊ฐ€ ์‚ญ์ œ ํ›„ only one pointer(ํ•˜๋‚˜์˜ ํฌ์ธํ„ฐ)๋งŒ ๊ฐ€์ง€๋Š” ๊ฒฝ์šฐ, deleted๋˜๊ณ  sole child(์œ ์ผํ•œ ์ž์‹)๊ฐ€ root๊ฐ€ ๋จ

Complexity of B +^++ -Tree Updates

  • Insertion ๋ฐ deletion of a single entry์˜ Cost(I/O ์ž‘์—… ์ˆ˜ ์ธก๋ฉด์—์„œ)๋Š” ํŠธ๋ฆฌ์˜ height์— proportional(๋น„๋ก€)
  • KKK๊ฐœ์˜ entries์™€ ์ตœ๋Œ€ fanout(๋ถ„๊ธฐ์ˆ˜) nnn์ด ์žˆ๋Š” ๊ฒฝ์šฐ, entry์˜ insert/delete์˜ worst-case complexity๋Š” O(logโกโŒˆn/2โŒ‰(K))O(\log_{\lceil n/2 \rceil}(K))O(logโŒˆn/2โŒ‰โ€‹(K))
  • In practice(์‹ค์ œ๋กœ๋Š”), I/O ์ž‘์—… ์ˆ˜๋Š” less(๋” ์ ์Œ)
    • Internal nodes๋Š” buffer(๋ฒ„ํผ)์— ์žˆ๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์Œ
    • Splits/merges๋Š” rare(๋“œ๋ฌผ๊ณ ), ๋Œ€๋ถ€๋ถ„์˜ insert/delete ์ž‘์—…์€ only affect a leaf node(leaf node๋งŒ ์˜ํ–ฅ)
  • Average node occupancy(ํ‰๊ท  ๋…ธ๋“œ ์ ์œ ์œจ)๋Š” insertion order์— depends on(๋‹ฌ๋ ค ์žˆ์Œ)
    • Random insertion(๋ฌด์ž‘์œ„ ์‚ฝ์ž…) ์‹œ โ‰ฅ2/3\ge 2/3โ‰ฅ2/3
    • Sorted order(์ •๋ ฌ๋œ ์ˆœ์„œ)๋กœ ์‚ฝ์ž… ์‹œ 1/21/21/2

B +^++ -Tree File Organization

  • B +^++ -Tree 'File' Organization
    • B +^++ -tree file organization์˜ Leaf nodes๋Š” pointers ๋Œ€์‹  records(๋ ˆ์ฝ”๋“œ)๋ฅผ ์ €์žฅ
    • Insertion/deletion/updates๊ฐ€ ์žˆ์„ ๋•Œ๋„ data records๋ฅผ clustered(ํด๋Ÿฌ์Šคํ„ฐ๋ง)๋œ ์ƒํƒœ๋กœ ์œ ์ง€ํ•˜๋Š” ๋ฐ ๋„์›€ โ†’\toโ†’ data file degradation problem(overflow blocks) ํ•ด๊ฒฐ
    • Leaf nodes๋Š” ์—ฌ์ „ํžˆ half full(์ ˆ๋ฐ˜) ์ƒํƒœ๊ฐ€ ์š”๊ตฌ๋จ
    • Records๊ฐ€ pointers๋ณด๋‹ค ํฌ๊ธฐ ๋•Œ๋ฌธ์—, leaf node์— ์ €์žฅํ•  ์ˆ˜ ์žˆ๋Š” records์˜ maximum number๋Š” non-leaf node์˜ pointers ์ˆ˜๋ณด๋‹ค less(์ ์Œ)
    • Good space utilization(์ข‹์€ ๊ณต๊ฐ„ ํ™œ์šฉ)์ด ์ค‘์š”
      • Splits ๋ฐ merges ๋™์•ˆ redistribution(์žฌ๋ถ„๋ฐฐ)์— more sibling nodes(๋” ๋งŽ์€ ํ˜•์ œ ๋…ธ๋“œ)๋ฅผ ํฌํ•จ์‹œํ‚ด
      • Redistribution์— 2 siblings๋ฅผ ํฌํ•จ์‹œํ‚ค๋ฉด(split/merge๋ฅผ ํ”ผํ•˜๊ธฐ ์œ„ํ•ด), ๊ฐ node๋Š” at least โŒŠ3n/4โŒ‹\lfloor 3n/4 \rfloorโŒŠ3n/4โŒ‹ entries๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋จ
    • Insertion ๋ฐ deletion์€ B +^++ -tree index์—์„œ entries์˜ insertion ๋ฐ deletion๊ณผ ๋™์ผํ•œ ๋ฐฉ์‹์œผ๋กœ ์ฒ˜๋ฆฌ

B-Tree Index Files

  • B +^++ -tree์™€ ์œ ์‚ฌํ•˜์ง€๋งŒ, B-tree๋Š” search-key values๊ฐ€ only once(ํ•œ ๋ฒˆ๋งŒ) ๋‚˜ํƒ€๋‚˜๋„๋ก ํ—ˆ์šฉ. Redundant storage of search keys(search key์˜ ์ค‘๋ณต ์ €์žฅ)๋ฅผ ์ œ๊ฑฐ

  • Non-leaf nodes์˜ search keys๋Š” B-tree์˜ ์–ด๋””์—๋„ ๋‚˜ํƒ€๋‚˜์ง€ ์•Š์Œ. Non-leaf node์˜ ๊ฐ search key์— ๋Œ€ํ•ด additional pointer field(์ถ”๊ฐ€ ํฌ์ธํ„ฐ ํ•„๋“œ)๊ฐ€ ํฌํ•จ๋˜์–ด์•ผ ํ•จ

  • Generalized B-tree leaf node

  • Non-leaf node: pointers BiB_iBiโ€‹๋Š” bucket ๋˜๋Š” file record pointers

  • Advantages of B-Tree indices

    • Corresponding B +^++ -Tree๋ณด๋‹ค May use less tree nodes(๋” ์ ์€ ์ˆ˜์˜ ํŠธ๋ฆฌ ๋…ธ๋“œ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Œ)
    • ๋•Œ๋•Œ๋กœ leaf node์— ๋„๋‹ฌํ•˜๊ธฐ ์ „์— search-key value๋ฅผ find(์ฐพ๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅ)
  • Disadvantages of B-Tree indices

    • Only small fraction(์ž‘์€ ๋ถ€๋ถ„)์˜ search-key values๋งŒ ์ผ์ฐ ๋ฐœ๊ฒฌ๋จ
    • Non-leaf nodes๋Š” larger(๋” ํฌ๋ฏ€๋กœ), fan-out์ด reduced(๊ฐ์†Œ) โ†’\toโ†’ B-Trees๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ corresponding B +^++ -Tree๋ณด๋‹ค greater depth(๋” ํฐ ๊นŠ์ด)๋ฅผ ๊ฐ€์ง
    • Insertion ๋ฐ deletion์ด B +^++ -Trees๋ณด๋‹ค more complicated(๋” ๋ณต์žก)
    • Implementation(๊ตฌํ˜„)์ด B +^++ -Trees๋ณด๋‹ค harder(๋” ์–ด๋ ค์›€)
    • Typically, B-Trees์˜ ์žฅ์ ์€ ๋‹จ์ ๋ณด๋‹ค out weigh(ํฌ์ง€ ์•Š์Œ)

Other Issues in Indexing

  • Record relocation(๋ ˆ์ฝ”๋“œ ์žฌ๋ฐฐ์น˜) and secondary indices
    • Record๊ฐ€ moves(์ด๋™)ํ•˜๋ฉด, record pointers๋ฅผ ์ €์žฅํ•˜๋Š” all secondary indices(๋ชจ๋“  ๋ณด์กฐ ์ธ๋ฑ์Šค)๊ฐ€ have to be updated(๊ฐฑ์‹ ๋˜์–ด์•ผ ํ•จ)
    • โ†’\toโ†’ B +^++ -tree file organizations์—์„œ leaf node splits๊ฐ€ very expensive(๋งค์šฐ ๋น„์‹ธ์ง)
    • Solution: secondary index์—์„œ record pointer ๋Œ€์‹  B +^++ -tree file organization์˜ search key๋ฅผ ์‚ฌ์šฉ
      • Record๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด file organization์˜ Extra traversal(์ถ”๊ฐ€ ์ˆœํšŒ)
      • Queries์— ๋Œ€ํ•œ Higher cost(๋” ๋†’์€ ๋น„์šฉ), but node splits are cheap(๋…ธ๋“œ ๋ถ„ํ• ์€ ์ €๋ ดํ•จ)
  • Indexing strings(๋ฌธ์ž์—ด ์ธ๋ฑ์‹ฑ)
    • Variable length strings(๊ฐ€๋ณ€ ๊ธธ์ด ๋ฌธ์ž์—ด) as keys
      • Variable fanout(๊ฐ€๋ณ€ ๋ถ„๊ธฐ์ˆ˜)
      • Pointers์˜ ์ˆ˜๊ฐ€ ์•„๋‹Œ space utilization(๊ณต๊ฐ„ ํ™œ์šฉ)์„ splitting(๋ถ„ํ• )์˜ criterion(๊ธฐ์ค€)์œผ๋กœ ์‚ฌ์šฉ
      • Prefix compression(์ ‘๋‘์‚ฌ ์••์ถ•)
        • Internal nodes์˜ key values๋Š” prefixes(์ ‘๋‘์‚ฌ) of full key์ผ ์ˆ˜ ์žˆ์Œ โ†’\toโ†’ fanout increases(์ฆ๊ฐ€)
        • Key value๋กœ ๋ถ„๋ฆฌ๋œ subtrees์˜ entries๋ฅผ distinguish(๊ตฌ๋ณ„)ํ•˜๊ธฐ์— ์ถฉ๋ถ„ํ•œ ๋ฌธ์ž๋ฅผ ์œ ์ง€
        • E.g., "Silas"์™€ "Silberschatz"๋Š” "Silb"๋กœ ๋ถ„๋ฆฌ๋  ์ˆ˜ ์žˆ์Œ

Other Issues in Indexing: Bulk Loading & Bottom-Up Build

  • Entries๋ฅผ one-at-a-time(ํ•œ ๋ฒˆ์— ํ•˜๋‚˜์”ฉ) B +^++ -tree์— ์‚ฝ์ž…ํ•˜๋ฉด entry๋‹น โ‰ฅย 1ย I/O\ge~1 \text{ I/O}โ‰ฅย 1ย I/O๊ฐ€ ํ•„์š”ํ•  ์ˆ˜ ์žˆ์Œ
    • ์ตœ์•…์˜ ๊ฒฝ์šฐ, ํŠธ๋ฆฌ์˜ height์— ๋น„๋ก€
    • A large number of entries๋ฅผ ํ•œ ๋ฒˆ์— ์‚ฝ์ž…ํ•˜๋Š” ๊ฒฝ์šฐ(bulk loading(๋Œ€๋Ÿ‰ ๋กœ๋”ฉ)) very inefficient(๋งค์šฐ ๋น„ํšจ์œจ์ )
    • B +^++ -tree index๊ฐ€ large relation(ํฐ ๊ด€๊ณ„)์— ๊ตฌ์ถ•๋  ๋•Œ bulk loading์ด ํ•„์š”
  • Efficient alternative~1(ํšจ์œจ์ ์ธ ๋Œ€์•ˆ~1)
    • Efficient external sorting algorithms(ํšจ์œจ์ ์ธ ์™ธ๋ถ€ ์ •๋ ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜)์„ ์‚ฌ์šฉํ•˜์—ฌ index entries๋ฅผ Sort(์ •๋ ฌ)
    • Insert in sorted order(์ •๋ ฌ๋œ ์ˆœ์„œ๋กœ ์‚ฝ์ž…)
    • ํŠน์ • leaf node๋กœ ๊ฐ€๋Š” ๋ชจ๋“  entries๋Š” consecutively(์—ฐ์†์ ์œผ๋กœ) ๋‚˜ํƒ€๋‚จ โ†’\toโ†’ leaf node๋Š” only once(ํ•œ ๋ฒˆ๋งŒ) written out(๊ธฐ๋ก)ํ•˜๋ฉด ๋จ
    • Much improved IO performance(ํ›จ์”ฌ ํ–ฅ์ƒ๋œ IO ์„ฑ๋Šฅ), but most leaf nodes half full
  • Efficient alternative 2(ํšจ์œจ์ ์ธ ๋Œ€์•ˆ 2): Bottom-up B +^++ -tree construction(์ƒํ–ฅ์‹ B +^++ -tree ๊ตฌ์ถ•)
    • ์ด์ „๊ณผ ๊ฐ™์ด entries๋ฅผ sort
    • ๊ทธ๋ฆฌ๊ณ  leaf level๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜์—ฌ layer-by-layer(๊ณ„์ธต๋ณ„๋กœ) tree๋ฅผ create(์ƒ์„ฑ)
    • ์ •๋ ฌ๋œ entries๋ฅผ block์— fitํ•  ์ˆ˜ ์žˆ๋Š” ๋งŒํผ ๋งŽ์€ entries๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ blocks๋กœ Break up(๋‚˜๋ˆ”) โ†’\toโ†’ resulting blocks(๊ฒฐ๊ณผ ๋ธ”๋ก)์ด leaf level์„ ํ˜•์„ฑ
    • ๊ฐ block์˜ minimum value์™€ block์— ๋Œ€ํ•œ pointer๋Š” next level์˜ entries๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ
    • Most database systems(๋Œ€๋ถ€๋ถ„์˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์‹œ์Šคํ…œ)์—์„œ bulk-load utility(๋Œ€๋Ÿ‰ ๋กœ๋“œ ์œ ํ‹ธ๋ฆฌํ‹ฐ)์˜ ์ผ๋ถ€๋กœ ๊ตฌํ˜„

Hash Indices

  • Bucket: โ‰ฅย 1\ge~1โ‰ฅย 1 index entries๋ฅผ ํฌํ•จํ•˜๋Š” storage unit(์ €์žฅ ๋‹จ์œ„)(์ผ๋ฐ˜์ ์œผ๋กœ disk block)
  • Entry์˜ search-key value์—์„œ hash function์„ ์‚ฌ์šฉํ•˜์—ฌ entry์˜ bucket์„ Obtain(์–ป์Œ)
  • Hash function hhh: ๋ชจ๋“  search-key values ์ง‘ํ•ฉ KKK์—์„œ ๋ชจ๋“  bucket addresses ์ง‘ํ•ฉ BBB๋กœ์˜ ํ•จ์ˆ˜
  • Hash function์€ access, insertion, deletion์„ ์œ„ํ•œ entries๋ฅผ locate(์ฐพ๋Š” ๋ฐ) ์‚ฌ์šฉ
  • Different search-key values๋ฅผ ๊ฐ€์ง„ entries๋Š” same bucket(๊ฐ™์€ ๋ฒ„ํ‚ท)์— ๋งคํ•‘๋  ์ˆ˜ ์žˆ์Œ. ๋”ฐ๋ผ์„œ entry๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด entire bucket(์ „์ฒด ๋ฒ„ํ‚ท)์„ sequentially(์ˆœ์ฐจ์ ์œผ๋กœ) searched(๊ฒ€์ƒ‰)ํ•ด์•ผ ํ•จ

Static Hashing

  • In a hash index(ํ•ด์‹œ ์ธ๋ฑ์Šค), buckets์€ records์— ๋Œ€ํ•œ pointers๋ฅผ ๊ฐ€์ง„ entries๋ฅผ ์ €์žฅ(i.e., buckets store index entries)
  • In a hash file-organization(ํ•ด์‹œ ํŒŒ์ผ ๊ตฌ์„ฑ), buckets์€ records๋ฅผ ์ €์žฅ
  • Bucket overflow(๋ฒ„ํ‚ท ์˜ค๋ฒ„ํ”Œ๋กœ)๋Š” ๋‹ค์Œ์œผ๋กœ ์ธํ•ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Œ
    • Insufficient buckets(๋ถˆ์ถฉ๋ถ„ํ•œ ๋ฒ„ํ‚ท)
    • Records์˜ distribution(๋ถ„ํฌ)์—์„œ์˜ Skew(์™œ๊ณก). ๋‘ ๊ฐ€์ง€ ์ด์œ ๋กœ ๋ฐœ์ƒ ๊ฐ€๋Šฅ
      • Multiple records(์—ฌ๋Ÿฌ ๋ ˆ์ฝ”๋“œ)๊ฐ€ same search-key value(๋™์ผํ•œ search-key ๊ฐ’)๋ฅผ ๊ฐ€์ง
      • Chosen hash function(์„ ํƒ๋œ ํ•ด์‹œ ํ•จ์ˆ˜)์ด key values์˜ non-uniform distribution(๋น„๊ท ์ผ ๋ถ„ํฌ)๋ฅผ ์ƒ์„ฑ
  • Bucket overflow์˜ probability(ํ™•๋ฅ )๋Š” ์ค„์ผ ์ˆ˜ ์žˆ์ง€๋งŒ, eliminated(์ œ๊ฑฐ)ํ•  ์ˆ˜ ์—†์Œ. overflow buckets(์˜ค๋ฒ„ํ”Œ๋กœ ๋ฒ„ํ‚ท)์„ ์‚ฌ์šฉํ•˜์—ฌ ์ฒ˜๋ฆฌ

Hash Functions

  • Hash functions์€ uniform(๊ท ์ผ) and random(๋ฌด์ž‘์œ„)์ด ์š”๊ตฌ๋จ
  • Uniform: ๊ฐ bucket์— all theoretically possible values(๋ชจ๋“  ์ด๋ก ์ ์œผ๋กœ ๊ฐ€๋Šฅํ•œ ๊ฐ’) ์ง‘ํ•ฉ์—์„œ same number์˜ search-key values๊ฐ€ ํ• ๋‹น๋จ
  • Random: ๊ฐ bucket์€ ํŒŒ์ผ ๋‚ด search-key values์˜ actual distribution(์‹ค์ œ ๋ถ„ํฌ)์— irrespective(๊ด€๊ณ„์—†์ด), same number์˜ entries๊ฐ€ ํ• ๋‹น๋จ
  • Typical hash functions(์ผ๋ฐ˜์ ์ธ ํ•ด์‹œ ํ•จ์ˆ˜)๋Š” search-key์˜ internal binary representation(๋‚ด๋ถ€ ์ด์ง„ ํ‘œํ˜„)์— ๋Œ€ํ•œ computation(๊ณ„์‚ฐ)์„ ์ˆ˜ํ–‰
    • E.g., string search-key์˜ ๊ฒฝ์šฐ, string์˜ ๋ชจ๋“  characters(๋ฌธ์ž)์˜ binary representations๋ฅผ addedํ•˜๊ณ , sum modulo the number of buckets(ํ•ฉ๊ณ„๋ฅผ ๋ฒ„ํ‚ท ์ˆ˜๋กœ ๋‚˜๋ˆˆ ๋‚˜๋จธ์ง€)๋ฅผ returned(๋ฐ˜ํ™˜)

Handling of Bucket Overflows

  • Overflow chaining(์˜ค๋ฒ„ํ”Œ๋กœ ์ฒด์ธ): ์ฃผ์–ด์ง„ bucket์˜ overflow buckets์€ linked list(์—ฐ๊ฒฐ ๋ฆฌ์ŠคํŠธ)๋กœ chained together(ํ•จ๊ป˜ ์—ฐ๊ฒฐ)
  • Overflow chaining์„ ์‚ฌ์šฉํ•˜๋Š” Hash indexing: closed addressing(ํ์‡„ ์ฃผ์†Œ ์ง€์ •)(or closed hashing(ํ์‡„ ํ•ด์‹ฑ))
  • Overflow buckets์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” alternative(๋Œ€์•ˆ)์ธ open addressing(๊ฐœ๋ฐฉ ์ฃผ์†Œ ์ง€์ •)(or open hashing(๊ฐœ๋ฐฉ ํ•ด์‹ฑ))์€ database applications์— not suitable(์ ํ•ฉํ•˜์ง€ ์•Š์Œ)

Deficiencies of Static Hashing

  • Static hashing์—์„œ, ํ•จ์ˆ˜ hhh๋Š” search-key values๋ฅผ fixed set(BBB)์˜ bucket addresses๋กœ ๋งคํ•‘. Databases๋Š” ์‹œ๊ฐ„์ด ์ง€๋‚จ์— ๋”ฐ๋ผ grow or shrink(์ปค์ง€๊ฑฐ๋‚˜ ์ค„์–ด๋“ฆ)
    • Initial number of buckets๊ฐ€ too small์ด๊ณ , ํŒŒ์ผ์ด grows(์ปค์ง€๋ฉด), too much overflows(๋„ˆ๋ฌด ๋งŽ์€ ์˜ค๋ฒ„ํ”Œ๋กœ)๋กœ ์ธํ•ด ์„ฑ๋Šฅ์ด degrade(์ €ํ•˜)๋จ
    • Anticipated growth(์˜ˆ์ƒ๋˜๋Š” ์„ฑ์žฅ)์„ ์œ„ํ•ด space๊ฐ€ ํ• ๋‹น๋˜๋ฉด, initially(์ดˆ๊ธฐ์—) ์ƒ๋‹นํ•œ ์–‘์˜ space๊ฐ€ wasted(๋‚ญ๋น„)๋จ(and buckets will be underfull(๋œ ์ฑ„์›Œ์ง))
    • Database๊ฐ€ shrinks(์ค„์–ด๋“ค๋ฉด), again space๊ฐ€ wasted๋จ
  • One solution(ํ•˜๋‚˜์˜ ํ•ด๊ฒฐ์ฑ…)
    • New hash function์œผ๋กœ ํŒŒ์ผ์˜ Periodic re-organization(์ฃผ๊ธฐ์ ์ธ ์žฌ๊ตฌ์„ฑ)
    • Expensive(๋น„์šฉ์ด ๋งŽ์ด ๋“ค๊ณ ), disrupts normal operations(์ •์ƒ์ ์ธ ์ž‘์—…์„ ๋ฐฉํ•ด)
  • Better solution(๋” ๋‚˜์€ ํ•ด๊ฒฐ์ฑ…): Dynamic Hashing(๋™์  ํ•ด์‹ฑ)
    • Buckets์˜ number๋ฅผ dynamically(๋™์ ์œผ๋กœ) modify(์ˆ˜์ •)ํ•  ์ˆ˜ ์žˆ๋Š” Techniques(๊ธฐ์ˆ )
    • Size๊ฐ€ grows and shrinksํ•˜๋Š” database์— Good
    • Hash function์„ dynamically modifyํ•  ์ˆ˜ ์žˆ๋„๋ก Allows(ํ—ˆ์šฉ)
    • Extendable hashing(ํ™•์žฅ ๊ฐ€๋Šฅ ํ•ด์‹ฑ): ๋™์  ํ•ด์‹ฑ์˜ ํ•œ ํ˜•ํƒœ(๊ต์žฌ Section 24.5.2 ์ฐธ์กฐ)

Comparison of Ordered Indexing and Hashing

  • Periodic re-organization์˜ Cost
  • Insertions and deletions์˜ Relative frequency(์ƒ๋Œ€ ๋นˆ๋„)
  • Is it desirable to optimize average access time at the expense of worst-case access time?(์ตœ์•…์˜ ๊ฒฝ์šฐ ์ ‘๊ทผ ์‹œ๊ฐ„์„ ํฌ์ƒํ•˜์—ฌ ํ‰๊ท  ์ ‘๊ทผ ์‹œ๊ฐ„์„ ์ตœ์ ํ™”ํ•˜๋Š” ๊ฒƒ์ด ๋ฐ”๋žŒ์งํ•œ๊ฐ€?)
  • Expected type of queries(์˜ˆ์ƒ๋˜๋Š” ์ฟผ๋ฆฌ ์œ ํ˜•)
    • Hashing์€ ์ผ๋ฐ˜์ ์œผ๋กœ key์˜ specified value๋ฅผ ๊ฐ€์ง„ records๋ฅผ retrieving(๊ฒ€์ƒ‰)ํ•˜๋Š” ๋ฐ better(๋” ์ข‹์Œ)
    • Range queries๊ฐ€ common(ํ”ํ•œ ๊ฒฝ์šฐ), ordered indices๊ฐ€ preferred(์„ ํ˜ธ๋จ)
  • In practice(์‹ค์ œ๋กœ๋Š”)
    • PostgreSQL์€ hash indices๋ฅผ ์ง€์›ํ•˜์ง€๋งŒ, poor performance(์ €์กฐํ•œ ์„ฑ๋Šฅ)๋กœ ์ธํ•ด discourages use(์‚ฌ์šฉ์„ ๊ถŒ์žฅํ•˜์ง€ ์•Š์Œ)
    • Oracle์€ static hash organization์„ ์ง€์›ํ•˜์ง€๋งŒ, not hash indices(ํ•ด์‹œ ์ธ๋ฑ์Šค๋Š” ์•„๋‹˜)
    • SQLServer๋Š” only B +^++ -trees๋ฅผ ์ง€์›
์ตœ๊ทผ ์ˆ˜์ •: 25. 11. 6. ์˜คํ›„ 12:07
Contributors: kmbzn
Prev
13. Data Storage Structures
Next
15. Query Processing

BUILT WITH

CloudflareNode.jsGitHubGitVue.jsJavaScriptVSCodenpm

All trademarks and logos are property of their respective owners.
ยฉ 2025 kmbzn ยท MIT License