Log-Structured Merge-Tree (LSM Tree)

What is an LSM Tree?

The Log-Structured Merge-Tree (LSM Tree) is a data structure optimized for systems with high write volumes (like many NoSQL databases, search engines, and data warehouses). Unlike traditional B-Trees which often perform in-place updates (requiring potentially slow random disk I/O), LSM Trees buffer writes in memory and write data to disk sequentially in immutable batches.

Core Idea: Trading Read Performance for Write Performance

LSM Trees make writes very fast by absorbing them into an in-memory structure (Memtable) and appending to a log (Write-Ahead Log - WAL, not explicitly visualized) for durability. Reads, however, might need to check multiple places (memory and several disk files) to find the latest value.

Components

Operations

This read process potentially checking multiple places is called read amplification. Compaction helps reduce this by merging data and pushing it to higher levels.

Visualize the LSM Tree

Configure limits, add/delete/read keys, and trigger flush/compaction to see the data flow.

Memtable (In-Memory, Sorted)

Immutable Memtable (Flushing...)

(Empty)
Status: Idle

Disk Storage (SSTable Levels)

Log messages will appear here...