Background

In 3 - My Interview Cookbook I mentioned a handful of “low level design” questions. To be clear, the type of low level design question I’m referring to is a bit different from the “parking lot” design and the other questions in Hello Interview’s LLD section, or the ones you’ll find online.

It’s worth reading the LLD section I wrote in 3 - My Interview Cookbook first, because that gives the background on how these interviews work, in my experience. This page is a set of questions you can practice. At some point in the future, I might launch something like zyros.dev/learn (might not be a real URL, sorry) where I host these in an interactive editor in a friendlier way. That said, these interviews are heavily focused on your ability to communicate.

A number of people I’ve met don’t think these are worth practicing - that you’re just ngmi if you can’t do them cold. I don’t really believe that. Like anything, it’s a skill you can build.

Disclaimer: basically all of these questions are generated by an LLM. Yep. Sorry. I can explain how I did it, though. I started from a set of question ideas I had in mind (vague ones, like rate limiters) and described in detail what I wanted. I then had the LLM generate 80 questions and write 1-3 solutions plus alternative solutions for each. Finally, I went through that bank and hand-picked the questions I liked.

So, while it is heavily LLM-driven, I also human-reviewed it enough that it isn’t slop. It’s quite hard to find high-quality question sets for this. Maybe at some point someone will consolidate these into patterns or something, but I dunno.

Questions

Question 1 - Skip List

  • Difficulty: 9 / 10
  • Approximate lines of code: 100 LoC
  • Tags: data-structures

Description

A skip list is a probabilistic data structure that provides O(log n) expected time for search, insert, and delete operations, similar to a balanced BST, but using randomization instead of complex rebalancing logic. It consists of multiple levels of sorted linked lists stacked on top of each other. The bottom level (level 0) contains all elements. Each higher level contains a random subset of the elements below it, roughly half as many.

The key insight is the search pattern: start at the highest level and move right until you would overshoot, then drop down a level and repeat. This “express lane” approach lets you skip over many elements. The randomized level assignment (geometric distribution with p=0.5) ensures that on average, level k has half as many nodes as level k-1, giving O(log n) expected search time.
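To make the search walk concrete, here’s a minimal Python sketch (the SkipNode/forward names are mine; insert and delete, which would use random_level, are left out):

import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = next node at level i

class SkipList:
    MAX_LEVEL = 16

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel head
        self.level = 1  # current highest level in use

    def random_level(self):
        # Geometric distribution with p=0.5: each extra level is a coin flip.
        level = 1
        while random.random() < 0.5 and level < self.MAX_LEVEL:
            level += 1
        return level

    def search(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):  # top level down
            # Move right until the next node would overshoot, then drop down.
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]  # candidate on the bottom level
        return node is not None and node.key == key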

Part B: Insert

Part C: Delete

Interview comments

Code solutions

Question 2 - Request Coalescer

  • Difficulty: 6 / 10
  • Approximate lines of code: 80 LoC
  • Tags: concurrency

Description

A request coalescer deduplicates concurrent requests for the same resource. When multiple callers request the same key simultaneously, only one actual fetch occurs - all other callers wait for and share the result. This pattern is common in caching layers, API gateways, and database connection pools to prevent thundering herd problems where many identical requests overwhelm a backend service.

The core data structures are: (1) an in-flight map tracking keys currently being fetched, mapping each key to a Future/Event that waiters can await, and (2) a cache storing results with TTL. The critical insight is lock scope: you hold the lock only to check/update the in-flight map, never during the actual fetch.
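Here’s a minimal sketch of the coalescing part in Python. The dict-of-entries shape and fetch_fn name are my own choices (a real version might hand out concurrent.futures.Future objects instead); note the lock only ever guards the map:

import threading

class RequestCoalescer:
    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn          # the actual (slow) fetch
        self.in_flight = {}               # key -> entry shared by all waiters
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:                   # lock held only to check/update the map
            entry = self.in_flight.get(key)
            if entry is None:
                entry = {"event": threading.Event(), "result": None, "error": None}
                self.in_flight[key] = entry
                leader = True
            else:
                leader = False
        if leader:
            try:
                entry["result"] = self.fetch_fn(key)   # fetch happens outside the lock
            except Exception as e:
                entry["error"] = e
            finally:
                with self.lock:
                    del self.in_flight[key]
                entry["event"].set()       # wake all waiters
        else:
            entry["event"].wait()
        if entry["error"] is not None:
            raise entry["error"]           # errors propagate to every caller
        return entry["result"]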

Part A: Request Coalescing

Part B: TTL Cache

Part C: Error Propagation

Interview comments

Code solutions

Question 3 - Lock-Free Queue

  • Difficulty: 10 / 10
  • Approximate lines of code: 80 LoC
  • Tags: concurrency, data-structures

Description

A lock-free queue allows multiple threads to enqueue and dequeue concurrently without using traditional locks (mutexes). Instead of blocking threads, it uses atomic compare-and-swap (CAS) operations: “if the value is still X, change it to Y; otherwise retry.” This enables progress guarantees - at least one thread makes progress even if others stall.

The classic implementation is the Michael-Scott queue: a linked list with head and tail pointers. A sentinel (dummy) node simplifies empty-queue handling. The key insight is that CAS operations can fail when another thread modifies the data, so operations run in retry loops. The algorithm also requires “helping” - if one thread sees the tail pointer lagging behind, it advances the tail before proceeding.
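Python doesn’t expose a hardware CAS, so the sketch below fakes one with a lock (AtomicRef is my own stand-in) purely to show the shape of the Michael-Scott enqueue: the retry loop and the tail-helping step. Treat it as runnable pseudocode, not an actually lock-free queue.

import threading

class AtomicRef:
    """Stand-in for a CAS-capable word (real code needs hardware CAS)."""
    def __init__(self, value=None):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self.value is expected:
                self.value = new
                return True
            return False

class Node:
    def __init__(self, item=None):
        self.item = item
        self.next = AtomicRef(None)

class MSQueue:
    def __init__(self):
        sentinel = Node()                 # dummy node simplifies the empty case
        self.head = AtomicRef(sentinel)
        self.tail = AtomicRef(sentinel)

    def enqueue(self, item):
        node = Node(item)
        while True:                       # CAS retry loop
            tail = self.tail.value
            nxt = tail.next.value
            if nxt is None:
                # Tail really is last: try to link the new node.
                if tail.next.compare_and_swap(None, node):
                    # Swing the tail; failure is fine (someone helped us).
                    self.tail.compare_and_swap(tail, node)
                    return
            else:
                # Tail is lagging: help advance it, then retry.
                self.tail.compare_and_swap(tail, nxt)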

Part A: Basic Queue

Part B: The ABA Problem

Part C: Memory Reclamation

Interview comments

Code solutions

Question 4 - Count-Min Sketch

  • Difficulty: 8 / 10
  • Approximate lines of code: 70 LoC
  • Tags: probabilistic, data-structures

Description

A Count-Min Sketch is a probabilistic data structure for estimating the frequency of items in a data stream using sublinear space. Unlike a hash map that stores exact counts, a CMS uses a 2D array of counters with multiple hash functions. Each item hashes to one position per row, and all those positions get incremented. To query, you take the minimum across all rows - this gives an estimate that may overcount (due to hash collisions) but never undercounts.

The structure uses O(w * d) space where w is width and d is depth (number of hash functions). With proper sizing, you get: with probability >= 1 - delta, the estimate is at most true_count + epsilon * N, where N is total items seen. The formulas are w = ceil(e/epsilon) and d = ceil(ln(1/delta)).
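A minimal sketch using those sizing formulas (the per-row salted hash() is a simplification I’m assuming for brevity; a real CMS would use pairwise-independent hash functions):

import math

class CountMinSketch:
    def __init__(self, epsilon=0.001, delta=0.01):
        self.w = math.ceil(math.e / epsilon)          # width
        self.d = math.ceil(math.log(1 / delta))       # depth (number of rows)
        self.table = [[0] * self.w for _ in range(self.d)]

    def _index(self, item, row):
        # Salting hash() per row gives d "different" hash functions.
        return hash((row, item)) % self.w

    def add(self, item, count=1):
        for row in range(self.d):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Minimum across rows: may overcount due to collisions, never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.d))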

Part A: Basic Structure

Part B: Optimal Parameters

Part C: Merge and Analysis

Interview comments

Code solutions

Question 5 - Double-Entry Ledger

  • Difficulty: 4 / 10
  • Approximate lines of code: 90 LoC
  • Tags: storage

Description

Double-entry bookkeeping is the foundation of modern accounting. Every transaction affects at least two accounts: one is debited, one is credited, and the total debits must equal total credits. This invariant makes errors detectable - if your books don’t balance, something is wrong. Assets and expenses have “debit normal” balances (debits increase them), while liabilities, equity, and revenue have “credit normal” balances (credits increase them).

The core data structures are: accounts (with names and types), transactions (with descriptions and multiple entries), and a ledger that validates and stores everything. A trial balance report sums all account balances to verify the books still balance. Use Decimal for money - floating point arithmetic will eventually produce rounding errors that break the balance invariant.
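A tiny sketch of the balance invariant with Decimal (the entry tuple format is my own; a fuller solution would use Account/Transaction classes):

from decimal import Decimal

def validate_transaction(entries):
    """entries: list of (account, debit, credit) tuples with Decimal amounts."""
    total_debits = sum((e[1] for e in entries), Decimal("0"))
    total_credits = sum((e[2] for e in entries), Decimal("0"))
    if total_debits != total_credits:
        raise ValueError(f"unbalanced: debits {total_debits} != credits {total_credits}")

# A $100 cash sale: debit the asset account, credit revenue.
validate_transaction([
    ("cash",    Decimal("100.00"), Decimal("0")),
    ("revenue", Decimal("0"),      Decimal("100.00")),
])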

Part A: Accounts and Transactions

Part B: Trial Balance

Part C: Multi-Leg Transactions

Interview comments

Code solutions

Question 6 - Task DAG Executor

  • Difficulty: 9 / 10
  • Approximate lines of code: 70 LoC
  • Tags: scheduling, concurrency

Description

A Task DAG (Directed Acyclic Graph) Executor is the core of build systems like Make, Bazel, and Gradle. Tasks have dependencies: Task B depends on Task A means A must complete before B can start. The executor must find a valid execution order (topological sort) or detect if the dependencies form a cycle (impossible to execute). This is fundamentally a topological sorting problem.

Two main approaches exist: Kahn’s algorithm (BFS-based) counts incoming edges and processes nodes with zero in-degree; DFS-based post-order traversal visits dependencies recursively and appends nodes after all dependencies are visited. Both run in O(V + E) time. Internal state typically includes a task map (name -> task), an adjacency list (task -> list of dependents), and in-degree counts or visit state.
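A minimal sketch of the Kahn’s algorithm variant (I’m assuming deps maps each task to its set of prerequisites):

from collections import deque

def topo_sort(tasks, deps):
    """tasks: iterable of names; deps: dict task -> set of prerequisites."""
    dependents = {t: [] for t in tasks}   # task -> tasks that wait on it
    in_degree = {t: 0 for t in tasks}
    for task, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(task)
            in_degree[task] += 1
    ready = deque(t for t in tasks if in_degree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in dependents[t]:
            in_degree[d] -= 1
            if in_degree[d] == 0:
                ready.append(d)
    if len(order) != len(in_degree):
        raise ValueError("cycle detected")   # leftover nodes all sit on a cycle
    return order

print(topo_sort(["a", "b", "c"], {"b": {"a"}, "c": {"a", "b"}}))  # ['a', 'b', 'c']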

Part A: Basic Execution Order

Part B: Parallel Batches

Part C: Actual Parallel Execution

Interview comments

Code solutions

Question 7 - Quadtree

  • Difficulty: 7 / 10
  • Approximate lines of code: 90 LoC
  • Tags: data-structures, game/simulation

Description

A quadtree is a tree data structure where each internal node has exactly four children, used to partition a 2D space by recursively subdividing it into four quadrants. It is commonly used in collision detection (games), spatial indexing (GIS), and image compression. Each node represents a rectangular region; when a node contains too many objects, it splits into four children (NW, NE, SW, SE). The key insight is that spatial queries can skip entire subtrees whose bounding boxes do not intersect the query region.

Internal state example after inserting points (10,10), (20,20), (80,80):

Root [boundary: (0,0)-(100,100), capacity: 4]
  points: [(10,10), (20,20), (80,80)]  # Not yet subdivided

After inserting 2 more points, capacity exceeded:
Root [divided=True]
  NW [boundary: (0,50)-(50,100)] -> points: []
  NE [boundary: (50,50)-(100,100)] -> points: [(80,80)]
  SW [boundary: (0,0)-(50,50)] -> points: [(10,10), (20,20)]
  SE [boundary: (50,0)-(100,50)] -> points: []

Part A: Point Storage and Region Queries

Part B: Rectangle Storage and Collision Detection

Part C: Deletion and Rebalancing

Interview comments

Code solutions

Question 8 - Two-Phase Commit

  • Difficulty: 10 / 10
  • Approximate lines of code: 90 LoC
  • Tags: distributed-systems, concurrency

Description

Two-Phase Commit (2PC) is a protocol for coordinating distributed transactions across multiple nodes (databases, services, etc.). When a transaction spans multiple systems, you need all of them to either commit or abort together - partial commits corrupt data. The coordinator orchestrates the protocol, and participants vote on whether they can commit.

The two phases are: (1) Prepare: coordinator asks all participants “can you commit?”, participants lock resources and vote YES or NO; (2) Commit/Abort: if all voted YES, coordinator tells everyone to COMMIT, otherwise tells everyone to ABORT. The key insight: once a participant votes YES, it must be able to commit later (resources stay locked until the coordinator’s decision arrives).
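A minimal happy-path sketch of the two phases (the Participant class and state names are mine; failure handling and logging are what Parts B and C add):

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "INIT"

    def prepare(self):
        # Voting YES means holding locks until the coordinator's decision.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def finish(self, commit):
        self.state = "COMMITTED" if commit else "ABORTED"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: prepare
    decision = all(votes)                         # a single NO aborts everything
    for p in participants:                        # phase 2: commit/abort
        p.finish(decision)
    return decision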

Part A: Happy Path

Part B: Participant Failures

Part C: Write-Ahead Logging

Interview comments

Code solutions

Question 9 - Connection Pool

  • Difficulty: 8 / 10
  • Approximate lines of code: 100 LoC
  • Tags: concurrency, storage

Description

A connection pool manages reusable database connections to avoid the overhead of creating new connections for every request. The pool maintains a fixed maximum size, hands out connections on checkout, and accepts them back on checkin. Key challenges include thread-safety (multiple threads requesting connections concurrently), validation (detecting broken connections), and timeouts (what happens when the pool is exhausted).

Internally, you need: a collection of available connections, a set tracking checked-out connections, a lock for thread-safety, and a condition variable for blocking when the pool is exhausted. Connections should be validated before being returned to ensure they’re still usable.
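A minimal sketch of the blocking checkout/checkin cycle (factory and the simplified timeout handling are my assumptions - note the timeout resets after each wakeup, which a fuller solution would fix with a deadline):

import threading

class ConnectionPool:
    def __init__(self, factory, max_size):
        self.factory = factory            # callable that creates a connection
        self.max_size = max_size
        self.available = []
        self.in_use = set()
        self.cond = threading.Condition()

    def checkout(self, timeout=None):
        with self.cond:
            while True:
                if self.available:
                    conn = self.available.pop()
                elif len(self.in_use) < self.max_size:
                    conn = self.factory()
                else:
                    # Pool exhausted: block until a checkin, or time out.
                    if not self.cond.wait(timeout):
                        raise TimeoutError("pool exhausted")
                    continue
                self.in_use.add(conn)
                return conn

    def checkin(self, conn):
        with self.cond:
            self.in_use.remove(conn)
            self.available.append(conn)
            self.cond.notify()            # wake one blocked checkout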

Part A: Basic Pool

Part B: Validation

Part C: Timeout

Interview comments

Code solutions

Question 10 - Trie Autocomplete

  • Difficulty: 3 / 10
  • Approximate lines of code: 80 LoC
  • Tags: data-structures

Description

A trie (prefix tree) is a tree data structure where each node represents a character, and paths from root to nodes spell out prefixes. Each node contains a dictionary mapping characters to child nodes, plus metadata like is_end (marks complete words) and optional ranking data (frequency, recency). Tries enable O(k) prefix lookup where k is the prefix length, making them ideal for autocomplete systems.

The core operations are: (1) insert by walking/creating nodes for each character, (2) search by walking nodes and checking is_end, and (3) autocomplete by finding the prefix node then collecting all words in that subtree via DFS.
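A minimal sketch of insert plus the prefix-walk-then-DFS autocomplete:

class TrieNode:
    def __init__(self):
        self.children = {}     # char -> TrieNode
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def autocomplete(self, prefix):
        node = self.root
        for ch in prefix:               # walk to the prefix node...
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def dfs(n, path):               # ...then collect the whole subtree
            if n.is_end:
                results.append(prefix + path)
            for ch, child in n.children.items():
                dfs(child, path + ch)
        dfs(node, "")
        return results

t = Trie()
for w in ["car", "cart", "cat"]:
    t.insert(w)
print(t.autocomplete("ca"))   # ['car', 'cart', 'cat']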

Part A: Basic Trie Operations

Part B: Prefix Autocomplete

Part C: Ranking by Frequency

Interview comments

Code solutions

Question 11 - Chess Validator

  • Difficulty: 5 / 10
  • Approximate lines of code: 90 LoC
  • Tags: game/simulation

Description

A chess move validator determines whether a proposed move is legal according to the rules of chess. This involves checking piece-specific movement patterns, path blocking for sliding pieces (rook, bishop, queen), and special moves like castling and en passant. The validator does not need to track game state across moves - it validates a single move given the current board position.

The board is represented as an 8x8 grid (row 0-7, col 0-7) with pieces identified by color and type. Key data structures: a dictionary mapping positions to pieces, and optionally castling rights and en passant target square. The core logic involves computing delta (dr, dc) between start and end positions and validating against piece-specific rules.

Part A: Basic Piece Movement

Part B: Path Blocking and Captures

Part C: Special Moves

Interview comments

Code solutions

Question 12 - Consistent Hash

  • Difficulty: 10 / 10
  • Approximate lines of code: 80 LoC
  • Tags: distributed-systems, data-structures

Description

Consistent hashing is the algorithm used by distributed systems like Cassandra, DynamoDB, and memcached to distribute data across nodes with minimal remapping when nodes join or leave. The key insight: when a node is added or removed, only K/n keys need to move (where K is total keys and n is number of nodes), compared to traditional hashing where nearly all keys would remap.

The core data structure is a “hash ring” - imagine a circle from 0 to 2^32. Nodes are placed at positions determined by hashing their IDs. To find which node owns a key, hash the key and walk clockwise until you hit a node. Internal state is a sorted list of (hash_position, node_id) pairs plus a dict mapping hash positions to node names for O(1) lookup after binary search.

Note: The sortedcontainers library (SortedList, SortedDict) is available and can simplify this problem. Using SortedList instead of bisect.insort + list gives O(log n) insertions/deletions instead of O(n).
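A minimal sketch of the ring with virtual nodes, using the sorted-positions-plus-owner-dict layout described above (md5 and the vnode count are my own choices):

import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, vnodes=100):
        self.vnodes = vnodes
        self.positions = []   # sorted hash positions on the ring
        self.owners = {}      # position -> node name

    def add_node(self, node):
        for i in range(self.vnodes):   # virtual nodes smooth the distribution
            pos = _hash(f"{node}#{i}")
            bisect.insort(self.positions, pos)
            self.owners[pos] = node

    def get_node(self, key):
        if not self.positions:
            return None
        # First ring position clockwise from the key's hash (wraps to 0).
        i = bisect.bisect_right(self.positions, _hash(key))
        return self.owners[self.positions[i % len(self.positions)]]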

Part A: Basic Ring

Part B: Virtual Nodes

Part C: Efficient Lookup

Interview comments

Code solutions

Question 13 - DNS Cache

  • Difficulty: 4 / 10
  • Approximate lines of code: 70 LoC
  • Tags: storage

Description

A DNS cache stores domain-to-IP mappings to avoid repeated network lookups. Each record has a TTL (time-to-live) after which it expires. The cache must handle three things: TTL-based expiration, capacity limits (evicting old entries when full), and CNAME chains (aliases that point to other domains which may themselves be aliases).

Internal state is typically a dict mapping domain names to cache entries. Each entry contains the IP address (or CNAME target), the expiration timestamp, and optionally the record type. For LRU eviction, use an OrderedDict where accessing an entry moves it to the end, and eviction removes from the front.
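A minimal sketch combining the TTL and LRU parts with an OrderedDict (CNAME handling left out):

import time
from collections import OrderedDict

class DNSCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # domain -> (ip, expires_at)

    def put(self, domain, ip, ttl):
        self.entries[domain] = (ip, time.monotonic() + ttl)
        self.entries.move_to_end(domain)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

    def get(self, domain):
        entry = self.entries.get(domain)
        if entry is None:
            return None
        ip, expires_at = entry
        if time.monotonic() >= expires_at:     # lazily expire on read
            del self.entries[domain]
            return None
        self.entries.move_to_end(domain)       # mark as recently used
        return ip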

Part A: TTL-Based Caching

Part B: LRU Eviction

Part C: CNAME Chain Resolution

Interview comments

Code solutions

Question 14 - Pub/Sub Message Broker

  • Difficulty: 7 / 10
  • Approximate lines of code: 90 LoC
  • Tags: distributed-systems

Description

A publish-subscribe message broker decouples message producers from consumers. Publishers send messages to topics without knowing who will receive them; subscribers register interest in topics and receive matching messages. This pattern is fundamental to event-driven architectures, microservices communication, and real-time systems.

There are two main delivery models: push (broker calls subscriber callbacks immediately) and pull (subscribers request messages on demand). Each message gets a sequence number for ordering. For reliability, the broker may retain messages so disconnected subscribers can catch up. Consumer groups add load balancing - within a group, each message goes to only one member.

Part A: Basic Pub/Sub

Part B: Message History and Catch-up

Part C: Consumer Groups

Interview comments

Code solutions

Question 15 - Reader-Writer Lock

  • Difficulty: 9 / 10
  • Approximate lines of code: 60 LoC
  • Tags: concurrency

Description

A reader-writer lock (RWLock) allows concurrent read access but exclusive write access. Multiple readers can hold the lock simultaneously since reads don’t conflict, but writers need exclusive access to prevent data races. This is useful for shared data structures where reads vastly outnumber writes, like caches or configuration stores.

The basic invariants are: (1) if a writer holds the lock, no readers or other writers can hold it, and (2) if readers hold the lock, any number of additional readers can acquire it, but writers must wait. The tricky part is preventing writer starvation: without care, a steady stream of readers can indefinitely postpone waiting writers.
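A minimal sketch of a writer-preference variant (one way to address starvation - readers yield to any waiting writer):

import threading

class RWLock:
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0
        self.writer = False
        self.waiting_writers = 0   # readers yield to these to avoid writer starvation

    def acquire_read(self):
        with self.cond:
            while self.writer or self.waiting_writers > 0:
                self.cond.wait()
            self.readers += 1

    def release_read(self):
        with self.cond:
            self.readers -= 1
            if self.readers == 0:
                self.cond.notify_all()

    def acquire_write(self):
        with self.cond:
            self.waiting_writers += 1
            while self.writer or self.readers > 0:
                self.cond.wait()
            self.waiting_writers -= 1
            self.writer = True

    def release_write(self):
        with self.cond:
            self.writer = False
            self.cond.notify_all()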

Part A: Basic Lock

Part B: Starvation Prevention

Part C: Upgradable Lock

Interview comments

Code solutions

Question 16 - Meeting Scheduler

  • Difficulty: 6 / 10
  • Approximate lines of code: 100 LoC
  • Tags: scheduling, algorithms

Description

A meeting scheduler finds common free time slots across multiple attendees in different timezones. The core algorithm: (1) normalize all busy intervals to a common timezone (UTC), (2) merge overlapping busy intervals, (3) find gaps between busy periods that meet the minimum duration requirement. The key data structure is a list of intervals that gets merged via sorting and linear scan.

Example: Alice (NYC) is busy 9-10am local, Bob (LA) is busy 7-8am local. In UTC, Alice is busy 14:00-15:00, Bob is busy 15:00-16:00. After merging, the combined busy period is 14:00-16:00 UTC.
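A minimal sketch of the merge-then-find-gaps core, working in minutes-since-midnight UTC (the interval representation is my own):

def merge_busy(intervals):
    """intervals: list of (start, end) already normalized to UTC."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:          # overlaps or touches
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def free_gaps(busy, day_start, day_end, min_duration):
    gaps, cursor = [], day_start
    for start, end in merge_busy(busy):
        if start - cursor >= min_duration:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if day_end - cursor >= min_duration:
        gaps.append((cursor, day_end))
    return gaps

# Alice busy 14:00-15:00 UTC, Bob busy 15:00-16:00 UTC (as in the example above).
print(free_gaps([(14 * 60, 15 * 60), (15 * 60, 16 * 60)], 9 * 60, 17 * 60, 30))
# [(540, 840), (960, 1020)] i.e. 09:00-14:00 and 16:00-17:00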

Part A: Merge Busy Intervals and Find Gaps

Part B: Timezone Handling

Part C: Rank Slots by Preference

Interview comments

Code solutions

Question 17 - LSM Tree

  • Difficulty: 10 / 10
  • Approximate lines of code: 90 LoC
  • Tags: storage, data-structures

Description

An LSM (Log-Structured Merge) Tree is the storage engine behind modern databases like LevelDB, RocksDB, and Cassandra. It optimizes for write-heavy workloads by buffering writes in memory and periodically flushing them to disk as immutable sorted files called SSTables (Sorted String Tables). Reads check the in-memory buffer first, then search through SSTables from newest to oldest.

The key data structures are: (1) a memtable (typically a balanced tree or hash map) that holds recent writes in memory, (2) a list of SSTables on disk, each containing sorted key-value pairs, and (3) optional bloom filters to skip SSTables that definitely don’t contain a key.

Note: The sortedcontainers library (SortedDict) is available. Using SortedDict for the memtable gives sorted iteration during flush without a separate sort step.
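A minimal in-memory sketch of the read/write paths (SSTables here are just lists in memory, and the SSTable scan is linear for brevity - a real one would binary-search and sit on disk):

from sortedcontainers import SortedDict

class LSMTree:
    def __init__(self, memtable_limit=4):
        self.memtable = SortedDict()
        self.sstables = []            # newest last; each is a sorted list of pairs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # SortedDict iterates in key order, so the "file" is already sorted.
        self.sstables.append(list(self.memtable.items()))
        self.memtable = SortedDict()

    def get(self, key):
        if key in self.memtable:      # memtable first (newest data)
            return self.memtable[key]
        for table in reversed(self.sstables):   # then newest -> oldest SSTable
            for k, v in table:
                if k == key:
                    return v
        return None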

Part A: Basic Operations

Part B: Delete Support

Part C: Compaction

Interview comments

Code solutions

Question 18 - Undo/Redo

  • Difficulty: 4 / 10
  • Approximate lines of code: 100 LoC
  • Tags: data-structures

Description

An undo/redo system allows users to reverse and replay operations in applications like text editors, graphics software, or games. The key insight is that operations must be reversible - you need to store enough information to reconstruct the previous state. There are two main approaches: the command pattern (store operations with their inverses) or the memento pattern (store full state snapshots). Command pattern uses less memory but requires each operation to be invertible; memento is simpler but stores redundant data.

Internal state typically includes two stacks: an undo stack (executed commands waiting to be undone) and a redo stack (undone commands waiting to be replayed). When a new command executes, it clears the redo stack - you cannot redo after making a new change.
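A minimal command-pattern sketch showing the two stacks (only an insert command, since its inverse is a simple delete):

class Editor:
    def __init__(self):
        self.text = ""
        self.undo_stack = []   # executed commands waiting to be undone
        self.redo_stack = []   # undone commands waiting to be replayed

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        self.undo_stack.append((pos, s))
        self.redo_stack.clear()        # a new edit invalidates redo history

    def undo(self):
        if not self.undo_stack:
            return
        pos, s = self.undo_stack.pop()
        self.text = self.text[:pos] + self.text[pos + len(s):]  # inverse of insert
        self.redo_stack.append((pos, s))

    def redo(self):
        if not self.redo_stack:
            return
        pos, s = self.redo_stack.pop()
        self.text = self.text[:pos] + s + self.text[pos:]
        self.undo_stack.append((pos, s))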

Part A: Execute, Undo, Redo

Part B: Command Grouping

Part C: Memory Limits

Interview comments

Code solutions

Question 19 - Interval Tree

  • Difficulty: 9 / 10
  • Approximate lines of code: 100 LoC
  • Tags: data-structures

Description

An interval tree efficiently stores intervals (like meeting times, IP ranges, or genomic regions) and answers queries about which intervals contain a point or overlap with a query range. The naive approach of checking every interval is O(n) per query. An interval tree achieves O(log n + k) where k is the number of results by augmenting a BST with subtree metadata that enables pruning entire branches.

The key insight is storing max_end at each node: the maximum endpoint among all intervals in that subtree. When querying for point P, if P > node.max_end, you can skip the entire subtree since no interval there can contain P. The tree is ordered by interval start points (like a normal BST), with the max_end augmentation propagated up during insertions and deletions.

Note: The sortedcontainers library (SortedList) is available. If using a sorted list approach instead of a tree, SortedList gives O(log n) insertions instead of O(n) with bisect.insort.
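A minimal sketch of the pruned stabbing query on a node with the max_end augmentation (tree construction and rebalancing left out):

class IntervalNode:
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.max_end = end          # max endpoint anywhere in this subtree
        self.left = self.right = None

def stab(node, point, out):
    """Collect all intervals containing `point`."""
    if node is None or point > node.max_end:
        return                      # prune: nothing below can contain point
    stab(node.left, point, out)
    if node.start <= point <= node.end:
        out.append((node.start, node.end))
    if point >= node.start:         # right subtree only has starts >= node.start
        stab(node.right, point, out)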

Part A: Basic Structure

Part B: Overlap Queries

Part C: Deletion

Interview comments

Code solutions

Question 20 - Sparse Matrix

  • Difficulty: 3 / 10
  • Approximate lines of code: 70 LoC
  • Tags: data-structures

Description

A sparse matrix is a matrix where most elements are zero. Instead of storing all n*m elements, we only store non-zero values with their positions. Common representations include DOK (dictionary of keys) using {(row, col): value}, CSR (compressed sparse row) using arrays for values, column indices, and row pointers, and row-based dictionaries using {row: {col: value}}. The choice depends on access patterns: DOK is simple and good for random access, CSR is efficient for row operations and matrix multiplication.

The key insight for multiplication is that you only need to iterate over non-zero elements: for each non-zero A[i,k], multiply with each non-zero B[k,j] and accumulate into C[i,j]. This gives O(nnz_A * avg_row_nnz_B) instead of O(n^3).
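A minimal DOK-based sketch of that multiplication (indexing B by row first so each non-zero of A only meets the non-zeros it can pair with):

def sparse_matmul(a, b):
    """a, b: DOK dicts {(row, col): value}. Returns a DOK dict."""
    b_rows = {}
    for (k, j), v in b.items():
        b_rows.setdefault(k, []).append((j, v))
    c = {}
    for (i, k), av in a.items():
        for j, bv in b_rows.get(k, []):
            c[(i, j)] = c.get((i, j), 0) + av * bv
    return c

a = {(0, 0): 2, (1, 2): 3}
b = {(0, 1): 4, (2, 0): 5}
print(sparse_matmul(a, b))   # {(0, 1): 8, (1, 0): 15}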

Part A: Basic Get/Set Operations

Part B: Addition and Transpose

Part C: Matrix Multiplication

Interview comments

Code solutions

Question 21 - A* Pathfinding

  • Difficulty: 7 / 10
  • Approximate lines of code: 80 LoC
  • Tags: algorithms

Description

A* is the standard algorithm for finding the shortest path on a weighted graph, combining Dijkstra’s guaranteed optimality with a heuristic that guides the search toward the goal. It maintains a priority queue ordered by f(n) = g(n) + h(n), where g(n) is the actual cost from start and h(n) is the estimated cost to goal. The heuristic must be admissible (never overestimate) to guarantee the optimal path.

For grid-based pathfinding, common heuristics are Manhattan distance (for 4-directional movement) and Octile/Chebyshev distance (for 8-directional). The algorithm explores nodes in order of their f-score, updating path costs when a shorter route is found. Key data structures: a min-heap for the open set and a hash map for tracking best g-scores.
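A minimal 4-directional grid sketch with the Manhattan heuristic (returns the path cost; path reconstruction via a came_from map is the usual extension):

import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = walkable, 1 = wall; 4-directional movement."""
    rows, cols = len(grid), len(grid[0])
    def h(p):   # Manhattan distance: admissible for 4-directional moves
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]    # entries are (f, g, node)
    best_g = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue                       # stale heap entry
        r, c = node
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None   # unreachable

print(astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0)))   # 6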

Part A: Basic A*

Part B: Diagonal Movement

Part C: Weighted Terrain

Interview comments

Code solutions

Question 22 - HyperLogLog

  • Difficulty: 9 / 10
  • Approximate lines of code: 100 LoC
  • Tags: probabilistic, data-structures

Description

HyperLogLog (HLL) is a probabilistic data structure for estimating the cardinality (count of unique elements) of a set using minimal memory. Redis uses it for PFCOUNT. The key insight: if you hash values uniformly, the probability of seeing a hash with N leading zeros is 1/2^N. So the maximum leading zeros you’ve observed gives you a rough estimate of log2(cardinality). With 10 leading zeros max, you’ve probably seen around 2^10 = 1024 unique elements.

The core data structure is an array of “registers” (typically 2^p registers, where p=14 gives 16384 registers). Each register stores the maximum leading-zeros-plus-one (called rho) seen for hashes that map to that bucket. To add an element: hash it, use the first p bits to pick a register, count leading zeros in the remaining bits, update the register if this count is higher. To estimate cardinality: compute the harmonic mean of 2^(-register) across all registers and apply bias correction.
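A minimal sketch of add and the raw estimate (alpha is the standard bias-correction approximation for large register counts; I’ve left out the small/large-range corrections Part B would add):

import hashlib

class HyperLogLog:
    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        x = int(hashlib.sha1(str(item).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        bucket = x >> (64 - self.p)              # first p bits pick the register
        rest = x & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # rho = position of the leftmost 1-bit (leading zeros + 1).
        rho = (64 - self.p) - rest.bit_length() + 1
        self.registers[bucket] = max(self.registers[bucket], rho)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        harmonic = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / harmonic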

Part A: Basic Structure

Part B: Improve Accuracy

Part C: Merge and Edge Cases

Interview comments

Code solutions

Question 23 - Collision Detection

  • Difficulty: 3 / 10
  • Approximate lines of code: 80 LoC
  • Tags: game/simulation, algorithms

Description

Collision detection determines whether two geometric shapes overlap. In 2D games and physics engines, the most common shapes are axis-aligned bounding boxes (AABBs) and circles. An AABB is defined by its min/max corners (min_x, min_y, max_x, max_y), while a circle is defined by center (x, y) and radius. The key insight for efficiency is to avoid expensive operations like square roots when possible - comparing squared distances works for most circle checks.

For large numbers of objects, a two-phase approach is used: broad-phase (spatial partitioning like grids or quadtrees) quickly eliminates pairs that can’t possibly collide, then narrow-phase performs precise geometric checks on remaining candidates.
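The three narrow-phase checks are short enough to show in full - note both circle tests compare squared distances to avoid a sqrt:

def aabb_overlap(a, b):
    """a, b: (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circles_overlap(c1, c2):
    """c1, c2: (x, y, r)."""
    dx, dy = c1[0] - c2[0], c1[1] - c2[1]
    r = c1[2] + c2[2]
    return dx * dx + dy * dy <= r * r

def aabb_circle_overlap(box, circle):
    # Clamp the circle's center to the box, then do a point-vs-circle test.
    cx = min(max(circle[0], box[0]), box[2])
    cy = min(max(circle[1], box[1]), box[3])
    dx, dy = circle[0] - cx, circle[1] - cy
    return dx * dx + dy * dy <= circle[2] * circle[2]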

Part A: AABB vs AABB Collision

Part B: Circle vs Circle Collision

Part C: AABB vs Circle Collision

Interview comments

Code solutions

Question 24 - Job Scheduler

  • Difficulty: 8 / 10
  • Approximate lines of code: 110 LoC
  • Tags: scheduling

Description

A job scheduler executes tasks based on priority while respecting dependency constraints. Jobs with higher priority run first, but only after all their dependencies have completed. The core data structures are a min-heap (priority queue) for O(log n) selection of the highest-priority ready job, and a graph of dependencies to track which jobs are blocked.

Internally, each job has a status (pending, running, completed, cancelled), a priority, and a set of dependency job IDs. When a job completes, you notify all jobs that depend on it. When all dependencies are satisfied, the job becomes ready and enters the priority queue.

Note: The sortedcontainers library (SortedList) is available. Unlike heapq, SortedList supports O(log n) removal of cancelled jobs instead of lazy deletion.

Part A: Priority Queue

Part B: Dependencies

Part C: Cancellation

Interview comments

Code solutions

Question 25 - Heartbeat Monitor

  • Difficulty: 5 / 10
  • Approximate lines of code: 90 LoC
  • Tags: distributed-systems, concurrency

Description

A heartbeat monitor tracks the health of distributed nodes by receiving periodic heartbeat signals. If a node fails to send a heartbeat within a timeout period, it is marked as unhealthy. This pattern is fundamental to distributed systems for failure detection - used in Kubernetes pod health checks, ZooKeeper session management, and service mesh health monitoring.

The core data structures are: (1) a map from node ID to last heartbeat timestamp, (2) a map tracking current node states (healthy/unhealthy), and (3) optionally a min-heap of deadlines for efficient expiration checking. The key challenge is triggering state-change callbacks only on transitions (healthy -> unhealthy or vice versa), not on every check.

Note: The sortedcontainers library (SortedList) is available. Unlike heapq, SortedList supports O(log n) removal when a node’s deadline is updated, avoiding stale heap entries.

Part A: Basic Health Tracking

Part B: State Change Callbacks

Part C: Efficient Expiration Checking

Interview comments

Code solutions

Question 26 - Bitmap Index

  • Difficulty: 2 / 10
  • Approximate lines of code: 120 LoC
  • Tags: data-structures, storage

Description

A bitmap index accelerates queries on low-cardinality columns (columns with few distinct values, like “status” or “color”). For each distinct value in a column, we maintain a bitmap where bit i is 1 if row i has that value. Queries become bitwise operations: AND for intersection, OR for union, NOT for complement. This is how databases like Oracle and PostgreSQL speed up OLAP queries.

For a table with 5 rows and a “color” column with values [“red”, “blue”, “red”, “green”, “blue”], we create three bitmaps:

  • red: 10100 (rows 0, 2)
  • blue: 01001 (rows 1, 4)
  • green: 00010 (row 3)

Finding rows where color=“red” AND size=“S” becomes red_bitmap & small_bitmap, which is O(n/64) operations using 64-bit words.
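A minimal sketch using arbitrary-precision Python ints as the bitmaps (bit i = row i), so AND/OR/NOT are single operators:

class BitmapIndex:
    def __init__(self):
        self.bitmaps = {}   # (column, value) -> int used as a bitmap
        self.num_rows = 0

    def add_row(self, row):
        """row: dict of column -> value."""
        for col, val in row.items():
            key = (col, val)
            self.bitmaps[key] = self.bitmaps.get(key, 0) | (1 << self.num_rows)
        self.num_rows += 1

    def query_and(self, *conditions):
        """conditions: (column, value) pairs; returns matching row ids."""
        bits = ~0                                  # start with all rows set
        for cond in conditions:
            bits &= self.bitmaps.get(cond, 0)
        bits &= (1 << self.num_rows) - 1           # mask off invalid rows
        return [i for i in range(self.num_rows) if bits >> i & 1]

idx = BitmapIndex()
for color, size in [("red", "S"), ("blue", "M"), ("red", "S"), ("green", "S"), ("blue", "L")]:
    idx.add_row({"color": color, "size": size})
print(idx.query_and(("color", "red"), ("size", "S")))   # [0, 2]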

Part A: Basic Bitmap Operations

Part B: Multi-Condition Queries

Part C: NOT Operation and Compression Discussion

Interview comments

Code solutions

Question 27 - Rope

  • Difficulty: 8 / 10
  • Approximate lines of code: 130 LoC
  • Tags: data-structures

Description

A rope is a binary tree data structure for efficiently representing and manipulating long strings. Unlike a standard string where concatenation is O(n), a rope achieves O(1) concatenation by creating a new parent node that points to both strings. This makes ropes ideal for text editors where users frequently insert, delete, and concatenate text at arbitrary positions.

The key insight is that leaf nodes store actual string fragments, while internal nodes store only metadata. Each internal node tracks its “weight” - the total length of its left subtree - which enables O(log n) character indexing by comparing the target index against the weight to decide whether to go left or right.
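A minimal sketch of weight-based indexing (concatenation is just allocating one parent node; split and balancing left out):

class RopeNode:
    def __init__(self, text=None, left=None, right=None):
        self.text = text           # set on leaves only
        self.left, self.right = left, right
        # weight = total length of the left subtree (or own text for a leaf)
        self.weight = len(text) if text is not None else left.total_len()

    def total_len(self):
        if self.text is not None:
            return len(self.text)
        return self.left.total_len() + self.right.total_len()

    def index(self, i):
        if self.text is not None:
            return self.text[i]
        if i < self.weight:        # target lives in the left subtree
            return self.left.index(i)
        return self.right.index(i - self.weight)

# "hello" + " world" concatenated in O(1) by allocating one parent node.
rope = RopeNode(left=RopeNode("hello"), right=RopeNode(" world"))
print(rope.index(6))   # 'w'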

Part A: Structure and Concatenation

Part B: Index Access

Part C: Split

Interview comments

Code solutions

Question 28 - Spell Checker

  • Difficulty: 6 / 10
  • Approximate lines of code: 80 LoC
  • Tags: algorithms, data-structures

Description

A spell checker validates words against a dictionary and suggests corrections for misspelled words. The core algorithm uses edit distance (Levenshtein distance) to find dictionary words within 1-2 edits of the misspelled word. Suggestions are ranked by a combination of edit distance and word frequency - “the” is more likely than “thee” even if both are 1 edit away from “teh”.

The naive approach computes edit distance against every dictionary word (O(n * m^2) where n=dictionary size, m=word length). Optimizations include BK-trees for faster neighbor search, or generating all possible edits and filtering by dictionary membership.

Part A: Dictionary Lookup

Part B: Edit Distance and Suggestions

Part C: Ranking with Combined Score

Interview comments

Code solutions

Question 29 - B-Tree

  • Difficulty: 10 / 10
  • Approximate lines of code: 90 LoC
  • Tags: data-structures, storage

Description

A B-tree is a self-balancing search tree optimized for systems that read and write large blocks of data, like databases and filesystems. Unlike binary trees where each node has 2 children, B-tree nodes have many keys and children (determined by the “order” or “minimum degree”). This reduces tree height, minimizing disk I/O operations.

Each node contains sorted keys and associated values. For a B-tree of order t (minimum degree): each non-root node has between t-1 and 2t-1 keys, and between t and 2t children. A node with k keys has k+1 children. All leaves are at the same depth. The tree grows upward (from the root) rather than downward.

Part A: Structure and Search

Part B: Insertion with Splitting

Part C: Root Growth

Interview comments

Code solutions

Question 30 - Transaction Log

  • Difficulty: 4 / 10
  • Approximate lines of code: 110 LoC
  • Tags: storage

Description

A transaction log provides atomicity for a key-value store: operations either fully complete or fully roll back, with no partial states visible. The key mechanism is storing the old value before each modification, enabling rollback by replaying these saved values in reverse order. This is the foundation of ACID transactions in databases.

Internal state includes: the actual data store (a dict), a write-ahead log (WAL) of operations with their old values, and savepoint markers. When you begin(), you start recording. When you commit(), you discard the log (changes are permanent). When you rollback(), you replay the log backwards to restore the original state.
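A minimal single-level sketch of begin/commit/rollback (savepoints and durability are what Parts B and C layer on):

class TxStore:
    _MISSING = object()        # sentinel: key did not exist before the write

    def __init__(self):
        self.data = {}
        self.log = None        # list of (key, old_value) while a tx is open

    def begin(self):
        self.log = []

    def set(self, key, value):
        if self.log is not None:
            self.log.append((key, self.data.get(key, self._MISSING)))
        self.data[key] = value

    def commit(self):
        self.log = None        # discard the log: changes are permanent

    def rollback(self):
        for key, old in reversed(self.log):   # replay old values in reverse
            if old is self._MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.log = None

s = TxStore()
s.set("a", 1)
s.begin(); s.set("a", 2); s.set("b", 3); s.rollback()
print(s.data)   # {'a': 1}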

Part A: Begin, Commit, Rollback

Part B: Savepoints

Part C: Durability via Write-Ahead Log

Interview comments

Code solutions

Question 31 - Indexed Priority Queue

  • Difficulty: 9 / 10
  • Approximate lines of code: 90 LoC
  • Tags: data-structures

Description

An indexed priority queue (IPQ) combines a priority queue with O(1) key lookup, enabling efficient priority updates. This is essential for graph algorithms like Dijkstra’s shortest path, where you need to decrease the priority of a vertex when you discover a shorter path. A standard heap gives O(log n) insert and extract-min, but updating a priority requires O(n) to find the element first.

The key insight is maintaining a hash map from keys to their positions in the heap array. Every time you swap elements during bubble-up or bubble-down, you must also update both keys’ positions in the hash map. This gives O(1) lookup of any key’s heap position, enabling O(log n) priority updates.

Part A: Basic Operations

Part B: Update and Remove

Part C: Edge Cases

Interview comments

Code solutions

Question 32 - Game of Life

  • Difficulty: 2 / 10
  • Approximate lines of code: 100 LoC
  • Tags: game/simulation

Description

Conway’s Game of Life is a cellular automaton on a 2D grid where each cell is either alive or dead. Cells evolve simultaneously according to four rules based on neighbor count (the 8 adjacent cells):

  1. Underpopulation: A live cell with < 2 neighbors dies
  2. Survival: A live cell with 2-3 neighbors survives
  3. Overpopulation: A live cell with > 3 neighbors dies
  4. Reproduction: A dead cell with exactly 3 neighbors becomes alive

The critical implementation detail is that all cells update simultaneously - you cannot modify the grid in-place while iterating. You need either a second buffer or a way to encode both old and new states.
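A minimal sketch using the second-buffer approach with clamped (dead) boundaries:

def step(grid):
    """One generation; reads the old grid, writes a fresh buffer."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = sum(grid[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c))
            # Survival with 2-3 neighbors, birth with exactly 3.
            new[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
    return new

# A "blinker" oscillates between a horizontal and a vertical bar.
g = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(step(g))   # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]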

Part A: Basic Simulation

Part B: Boundary Handling

Part C: Known Patterns and Infinite Grid

Interview comments

Code solutions

Question 33 - Barrier Sync

  • Difficulty: 7 / 10
  • Approximate lines of code: 90 LoC
  • Tags: concurrency

Description

A barrier is a synchronization primitive that blocks a group of threads until all of them have reached a certain point. Once all N threads call wait(), they are all released simultaneously. This is commonly used in parallel algorithms where each phase must complete before the next begins (e.g., parallel matrix operations, simulation steps).

The key implementation challenge is making the barrier reusable - after all threads pass through, the barrier should reset for the next round. A naive implementation fails when a fast thread re-enters the barrier before a slow thread has exited, causing the count to be wrong. The solution is “generation counting” - each barrier cycle has a generation number, and threads check their generation before proceeding.
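A minimal sketch of generation counting (timeouts and callbacks left out):

import threading

class Barrier:
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.n:
                # Last thread in: release everyone and reset for reuse.
                self.generation += 1
                self.count = 0
                self.cond.notify_all()
            else:
                # Wait on our generation, not the count - a fast thread
                # re-entering the next cycle can't confuse us.
                while gen == self.generation:
                    self.cond.wait()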

Part A: Basic Barrier

Part B: Reusability

Part C: Timeout and Callback

Interview comments

Code solutions

Question 34 - Minesweeper

  • Difficulty: 3 / 10
  • Approximate lines of code: 80 LoC
  • Tags: game/simulation

Description

Minesweeper is a classic puzzle game where a grid contains hidden mines. Each non-mine cell displays the count of adjacent mines (0-8). The player reveals cells; revealing a mine loses the game, revealing all non-mine cells wins. The key data structure is a 2D grid of cells, each with state (hidden/revealed/flagged), whether it’s a mine, and its adjacent mine count.

The main algorithmic challenge is the cascade reveal: when revealing a cell with 0 adjacent mines, automatically reveal all neighbors recursively. This is a flood-fill algorithm (BFS or DFS) that stops when it hits cells with non-zero counts.

Part A: Board Setup with Adjacent Counts

Part B: Reveal with Cascade

Part C: Flags and Win Condition

Interview comments

Code solutions

Question 35 - Deadlock Detector

  • Difficulty: 10 / 10
  • Approximate lines of code: 90 LoC
  • Tags: concurrency, algorithms

Description

A deadlock occurs when threads are stuck waiting for each other in a cycle: Thread A holds Resource 1 and waits for Resource 2, while Thread B holds Resource 2 and waits for Resource 1. Neither can proceed. This is fundamentally a cycle detection problem in a directed graph called the “wait-for graph.”

The wait-for graph has threads as nodes. An edge from T1 to T2 means “T1 is waiting for a resource held by T2.” Internal state tracks two mappings: resource_owner: dict[resource_id, thread_id] (who holds each resource) and thread_waiting_for: dict[thread_id, resource_id] (what each blocked thread wants). From these, you can construct edges: if thread T waits for resource R, and R is owned by thread U, then there’s an edge T -> U. A cycle in this graph means deadlock.
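A minimal detection sketch exploiting the fact that each blocked thread waits on at most one resource, so every node has at most one outgoing edge and cycle-finding is just pointer chasing:

def find_deadlock(resource_owner, thread_waiting_for):
    """Returns a deadlocked cycle of thread ids, or None."""
    edges = {}   # T -> owner of the resource T wants
    for t, r in thread_waiting_for.items():
        if r in resource_owner:
            edges[t] = resource_owner[r]
    for start in edges:
        path, seen, t = [], set(), start
        while t in edges and t not in seen:
            seen.add(t)
            path.append(t)
            t = edges[t]
        if t in seen:
            return path[path.index(t):]   # the cycle portion of the walk
    return None

# A holds R1 and wants R2; B holds R2 and wants R1.
print(find_deadlock({"R1": "A", "R2": "B"}, {"A": "R2", "B": "R1"}))   # ['A', 'B']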

Part A: Resource Tracking

Part B: Cycle Detection

Part C: Dynamic Updates

Interview comments

Code solutions

Question 36 - Metrics Collector

  • Difficulty: 5 / 10
  • Approximate lines of code: 90 LoC
  • Tags: storage

Description

A metrics collector is the foundation of application monitoring systems like Prometheus, StatsD, or Datadog. It tracks three fundamental metric types: counters (monotonically increasing values like request counts), gauges (point-in-time values like CPU usage), and histograms (distributions of observations like latency). Each observation is timestamped to enable time-range queries.

The core data structures are: (1) timestamped value lists per metric name, and (2) aggregation logic that differs by metric type. Counters accumulate, gauges return the latest value, and histograms compute statistics over observations.

Note: The sortedcontainers library (SortedList) is available. For histogram percentile queries, using SortedList instead of bisect.insort gives O(log n) insertions.

Part A: Counters and Gauges

Part B: Histograms

Part C: Time Range Queries

Interview comments

Code solutions

Question 37 - Packet Reassembler

  • Difficulty: 4 / 10
  • Approximate lines of code: 80 LoC
  • Tags: data-structures

Description

Network packets often arrive out of order due to routing differences, retransmissions, and varying network paths. A packet reassembler buffers incoming packets and reconstructs the original data stream by ordering packets by their sequence numbers. The core data structure is a dictionary or buffer mapping sequence numbers to packet data, combined with a pointer tracking the next expected sequence number. When contiguous packets are available starting from this pointer, they can be “flushed” to produce the reassembled output.

The key insight is that you need O(1) lookup by sequence number (dictionary) while also tracking what’s missing. Gap detection becomes important when you need to request retransmissions - you need to know which sequence numbers haven’t arrived yet between your expected position and the highest seen.
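A minimal sketch of buffering, contiguous flushing, and gap reporting:

class Reassembler:
    def __init__(self):
        self.buffer = {}         # seq -> data
        self.next_seq = 0        # next sequence number we can emit

    def receive(self, seq, data):
        """Buffer a packet; return any newly contiguous data."""
        if seq < self.next_seq or seq in self.buffer:
            return ""            # duplicate - drop it
        self.buffer[seq] = data
        out = []
        while self.next_seq in self.buffer:   # flush the contiguous run
            out.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return "".join(out)

    def missing(self):
        """Gaps between next_seq and the highest sequence seen."""
        if not self.buffer:
            return []
        return [s for s in range(self.next_seq, max(self.buffer)) if s not in self.buffer]

r = Reassembler()
print(r.receive(1, "b"))   # '' (gap: still waiting on 0)
print(r.missing())         # [0]
print(r.receive(0, "a"))   # 'ab'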

Part A: Basic Buffering and Reassembly

Part B: Duplicate and Gap Detection

Part C: Timeout-Based Retransmit Requests

Interview comments

Code solutions

Question 38 - Bloom Filter

  • Difficulty: 8 / 10
  • Approximate lines of code: 70 LoC
  • Tags: probabilistic, data-structures

Description

A Bloom filter is a space-efficient probabilistic data structure for set membership testing. Instead of storing actual elements, it uses a bit array and multiple hash functions. When you add an element, you hash it with k different functions and set the corresponding k bits to 1. To check membership, you verify all k bits are set. The key insight: false negatives are impossible (if an element was added, its bits are definitely set), but false positives can occur (bits might be set by other elements).

The tradeoff is simple: more bits and more hash functions reduce false positives, but use more memory and CPU. For n expected items and desired false positive rate p, optimal bit array size is m = -n*ln(p)/(ln(2)^2) and optimal hash count is k = (m/n)*ln(2).
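A minimal sketch using those sizing formulas, with double hashing (h1 + i*h2) standing in for k independent hash functions - a common trick, but an assumption on my part rather than part of the question:

import math

class BloomFilter:
    def __init__(self, n, p):
        self.m = math.ceil(-n * math.log(p) / math.log(2) ** 2)   # bits
        self.k = max(1, round(self.m / n * math.log(2)))           # hash count
        self.bits = 0   # a Python int works as an arbitrary-size bit array

    def _positions(self, item):
        h1, h2 = hash(item), hash((item, "salt"))
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter(n=1000, p=0.01)
bf.add("hello")
print("hello" in bf, "world" in bf)   # True False (the False is only probably right)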

Part A: Basic Structure

Part B: Optimal Sizing

Part C: Analysis

Interview comments

Code solutions

Question 39 - Cron Parser

  • Difficulty: 6 / 10
  • Approximate lines of code: 80 LoC
  • Tags: scheduling

Description

Cron is a time-based job scheduler in Unix-like systems. A cron expression consists of 5 fields: minute (0-59), hour (0-23), day of month (1-31), month (1-12), and day of week (0-6, where 0=Sunday). Each field supports wildcards (*), ranges (1-5), lists (1,3,5), and steps (*/15). The parser must convert these expressions into a set of valid values for each field, then compute when a job should next run.

Key implementation detail: Python’s datetime.weekday() returns Monday=0, but cron uses Sunday=0. You must convert: cron_weekday = (python_weekday + 1) % 7.

Note: The sortedcontainers library (SortedList) is available. Using SortedList for field values enables O(log n) lookup of the next valid value via bisect_left.
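A minimal sketch of expanding one field into its value set (simplified - real cron has more syntax quirks, like N/step meaning a range from N):

def parse_field(expr, lo, hi):
    """Expand one cron field ('*', '*/15', '1-5', '1,3,5') into sorted values."""
    values = set()
    for part in expr.split(","):
        step = 1
        if "/" in part:
            part, step_s = part.split("/")
            step = int(step_s)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            start_s, end_s = part.split("-")
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)

print(parse_field("*/15", 0, 59))    # [0, 15, 30, 45]
print(parse_field("1-5,30", 0, 59))  # [1, 2, 3, 4, 5, 30]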

Part A: Parse Cron Expression

Part B: Calculate Next Run Time

Part C: Validation and Combination Syntax

Interview comments

Code solutions

Question 40 - Library System

  • Difficulty: 1 / 10
  • Approximate lines of code: 100 LoC
  • Tags: storage

Description

A library management system tracks books, users, and loans. Core entities are books (with unique IDs, titles, and status), users, and transactions. The key data structures are dictionaries mapping book_id to Book objects, with each Book containing its current status (available/borrowed/reserved), the current borrower, due date, and a waitlist queue.

The interesting complexity comes from the waitlist: when a borrowed book is returned, it should become reserved for the first person on the waitlist rather than generally available. Late fees require tracking due dates and computing days overdue.

Part A: Checkout and Return

Part B: Reservations and Waitlist

Part C: Late Fees

Interview comments

Code solutions

Question 41 - Write-Ahead Log

  • Difficulty: 10 / 10
  • Approximate lines of code: 100 LoC
  • Tags: storage, distributed-systems

Description

A Write-Ahead Log (WAL) provides durability for databases and storage systems. The core principle: before making any change to data, write a description of the change to a sequential log file and ensure it’s on disk (fsync). If the system crashes, replay the log to recover. This is faster than writing data directly because sequential writes are cheaper than random writes.

Key components: (1) log entries with sequence numbers for ordering, (2) fsync calls to guarantee durability (not just flush), (3) checksums to detect partial/corrupted writes, and (4) checkpointing to truncate old entries after they’re applied to the main data store.
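A minimal file-backed sketch showing components (1)-(3); the one-JSON-record-per-line format is my own choice, and replay stops at the first corrupt entry, treating it as a torn tail write:

import json
import os
import zlib

class WAL:
    def __init__(self, path):
        self.path = path
        self.seq = 0

    def append(self, record):
        self.seq += 1
        payload = json.dumps({"seq": self.seq, "record": record})
        checksum = zlib.crc32(payload.encode())
        with open(self.path, "a") as f:
            f.write(f"{checksum} {payload}\n")
            f.flush()
            os.fsync(f.fileno())   # flush alone is not durable - fsync is

    def replay(self):
        entries = []
        if not os.path.exists(self.path):
            return entries
        with open(self.path) as f:
            for line in f:
                checksum_s, _, payload = line.rstrip("\n").partition(" ")
                try:
                    if zlib.crc32(payload.encode()) != int(checksum_s):
                        break   # corrupt entry: stop replaying here
                except ValueError:
                    break       # torn write
                entries.append(json.loads(payload)["record"])
        return entries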

Part A: Append and Replay

Part B: Crash Recovery

Part C: Checkpointing

Interview comments

Code solutions

Question 42 - ID Generator

  • Difficulty: 2 / 10
  • Approximate lines of code: 120 LoC
  • Tags: distributed-systems

Description

A Snowflake-style ID generator produces globally unique 64-bit IDs without coordination between machines. Twitter invented this pattern to generate tweet IDs at scale. The key insight is packing three components into 64 bits: a timestamp (for ordering), a machine ID (for uniqueness across nodes), and a sequence number (for uniqueness within the same millisecond). The standard layout uses 41 bits for timestamp, 10 bits for machine ID (1024 machines), and 12 bits for sequence (4096 IDs per millisecond per machine).

Internal state consists of: the last timestamp seen, the current sequence number within that millisecond, and the machine ID. When generating an ID, if the current time equals the last timestamp, increment the sequence; if sequence overflows, wait for the next millisecond. If time has moved forward, reset sequence to 0.
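A minimal sketch of that generation loop with the standard bit layout (backwards clock handling, which Part B covers, is omitted):

import threading
import time

EPOCH_MS = 1288834974657   # Twitter's custom epoch (Nov 2010)

class SnowflakeGenerator:
    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024   # must fit in 10 bits
        self.machine_id = machine_id
        self.last_ts = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ts = int(time.time() * 1000) - EPOCH_MS
            if ts == self.last_ts:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit sequence
                if self.sequence == 0:                        # overflow: spin to next ms
                    while ts <= self.last_ts:
                        ts = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ts = ts
            # 41 bits timestamp | 10 bits machine | 12 bits sequence
            return (ts << 22) | (self.machine_id << 12) | self.sequence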

Part A: Basic ID Generation

Part B: Sequence Overflow and Clock Skew

Part C: ID Decoding and Coordination

Interview comments

Code solutions

Question 43 - Sudoku Solver

  • Difficulty: 4 / 10
  • Approximate lines of code: 80 LoC
  • Tags: algorithms, game/simulation

Description

A Sudoku solver fills in a 9x9 grid such that each row, column, and 3x3 box contains the digits 1-9 exactly once. This is a classic constraint satisfaction problem solved with backtracking: try a value, check if it violates constraints, recurse or backtrack. The naive approach tries all values 1-9 for each empty cell, but optimizations like MRV (Minimum Remaining Values) and constraint propagation dramatically reduce the search space.

The core data structures are: (1) a 9x9 grid with 0 representing empty cells, and (2) constraint tracking (which values are used in each row, column, and box). The box index for cell (r, c) is computed as 3 * (r // 3) + c // 3.
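A minimal backtracking sketch (no MRV or constraint propagation - that's Parts B and C):

def solve(grid):
    """grid: 9x9 list of lists, 0 = empty. Fills in place; returns True if solved."""
    def ok(r, c, v):
        if any(grid[r][j] == v for j in range(9)):      # row
            return False
        if any(grid[i][c] == v for i in range(9)):      # column
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)             # 3x3 box
        return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if ok(r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0                   # backtrack
                return False                             # no value fits: dead end
    return True                                          # no empty cells left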

Part A: Basic Backtracking Solver

Part B: MRV Heuristic (Minimum Remaining Values)

Part C: Find All Solutions / Constraint Propagation

Interview comments

Code solutions

Question 44 - Circuit Breaker

  • Difficulty: 9 / 10
  • Approximate lines of code: 80 LoC
  • Tags: distributed-systems

Description

The circuit breaker pattern prevents cascading failures when calling unreliable external services. Instead of repeatedly hammering a failing service, the circuit breaker “trips” after detecting failures and fails fast for a cooldown period. This protects both your system (avoiding blocked threads waiting for timeouts) and the downstream service (giving it time to recover).

The pattern uses a three-state machine: CLOSED (normal operation, tracking failures), OPEN (failing fast, rejecting all calls), and HALF-OPEN (testing if the service recovered). The key data structures are failure counters/timestamps and the current state. The state transitions lazily on each call attempt rather than via background timers.
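A minimal sketch of the lazy state machine (count-based tripping only; the threshold/cooldown defaults are arbitrary):

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open - failing fast")
            self.state = "HALF_OPEN"   # cooldown elapsed: allow one probe
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"    # trip (or re-trip after a failed probe)
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.state = "CLOSED"          # any success closes the circuit
        return result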

Part A: Basic State Machine

Part B: Rate-Based Thresholds

Part C: Thread Safety and Metrics

Interview comments

Code solutions

Question 45 - Load Balancer

  • Difficulty: 8 / 10
  • Approximate lines of code: 90 LoC
  • Tags: distributed-systems, scheduling

Description

A load balancer distributes incoming requests across multiple backend servers to improve throughput and availability. Different strategies optimize for different goals: round-robin ensures even distribution, least-connections routes to the server with the lightest current load, and weighted distribution gives more traffic to higher-capacity servers.

Internally, you track a list of servers with their health status and connection counts. The core operation is route_request() which selects a server based on the current strategy, increments its connection count, and returns it. When the request completes, release_connection() decrements the count.
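A minimal sketch of the round-robin and least-connections strategies (health checks and weights left out):

class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.connections = {s: 0 for s in self.servers}
        self.rr_index = 0

    def route_round_robin(self):
        server = self.servers[self.rr_index % len(self.servers)]
        self.rr_index += 1
        self.connections[server] += 1
        return server

    def route_least_connections(self):
        server = min(self.servers, key=lambda s: self.connections[s])
        self.connections[server] += 1
        return server

    def release_connection(self, server):
        self.connections[server] -= 1

lb = LoadBalancer(["s1", "s2", "s3"])
print([lb.route_round_robin() for _ in range(4)])   # ['s1', 's2', 's3', 's1']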

Part A: Round Robin

Part B: Least Connections

Part C: Health Checks and Weighted Distribution

Interview comments

Code solutions