Difficulty: Medium
Correct Answer: A bitmap index represents the values of low-cardinality columns using bit vectors, allowing very fast combination of conditions with bitwise operations, which is especially useful in read-intensive data warehouse queries.
Explanation:
Introduction / Context:
Bitmap indexes, also called bit-mapped indexes, are an indexing technique often used in data warehouses and analytical databases. They are particularly effective for low-cardinality columns, meaning columns with few distinct values, such as gender, status flags, or other small sets of discrete values. Interview questions about bitmap indexes test whether you understand how they work and why they improve performance for certain types of queries.
Given Data / Assumptions:
Concept / Approach:
A bitmap index uses bit vectors to record the presence or absence of a value in each row. For a low-cardinality column, the index stores a separate bit vector for each distinct value. Each position in the vector corresponds to a row in the table; a bit is set to 1 if the row has that value and 0 otherwise. Combining filters then reduces to bitwise AND, OR, or NOT operations on these vectors, which process many rows per machine instruction and therefore stay efficient on large data sets.
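The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database stores its bitmaps: the function name `build_bitmap_index` and the sample `gender` data are hypothetical, and a plain Python integer stands in for each bit vector (bit i represents row i).

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Return {distinct value: int bitmap}; bit i is 1 when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id  # set the bit for this row in the value's bitmap
    return dict(index)

# Hypothetical five-row column with two distinct values
gender = ["M", "F", "F", "M", "F"]
gender_index = build_bitmap_index(gender)

# Print each bitmap with row 0 as the leftmost character
for value, bitmap in sorted(gender_index.items()):
    bits = "".join("1" if (bitmap >> i) & 1 else "0" for i in range(len(gender)))
    print(value, bits)
# F 01101
# M 10010
```

Each row sets exactly one bit across all bitmaps of the column, so the bitmaps for M and F never overlap.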
Step-by-Step Solution:
Step 1: Define a bitmap index as an index that stores bitmaps (bit vectors) indicating which rows contain each distinct value of a column.
Step 2: Explain that for a column such as gender with values M and F, the index would maintain two bitmaps, one for M and one for F, each holding one bit per row of the table.
Step 3: Describe how a query filtering on gender = 'M' can quickly locate matching rows by scanning the bitmap for M to find positions where the bit is set to 1.
Step 4: Show that combining conditions like gender = 'M' AND region = 'East' becomes a bitwise AND between the bitmap for M and the bitmap for East, which is computationally cheap.
Step 5: Emphasize that bitmap indexes shine in data warehouses with many read queries and relatively few concurrent updates, especially on low-cardinality columns.
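Steps 3 and 4 can be sketched as follows. This is an illustrative example under stated assumptions: the helper names `build_bitmap_index` and `matching_rows` and the sample `gender` and `region` columns are hypothetical, and Python integers stand in for the bit vectors a real database would store in compressed form.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 when row i has that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """List the row positions whose bit is set to 1."""
    rows, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows

# Hypothetical six-row table with two low-cardinality columns
gender = ["M", "F", "M", "M", "F", "M"]
region = ["East", "East", "West", "East", "West", "North"]

gender_idx = build_bitmap_index(gender)
region_idx = build_bitmap_index(region)

# gender = 'M' AND region = 'East'
both = gender_idx["M"] & region_idx["East"]
# gender = 'F' OR region = 'West'
either = gender_idx["F"] | region_idx["West"]
# NOT region = 'East', masked to the table length so only real rows remain
all_rows = (1 << len(region)) - 1
not_east = ~region_idx["East"] & all_rows

print(matching_rows(both))      # [0, 3]
print(matching_rows(either))    # [1, 2, 4]
print(matching_rows(not_east))  # [2, 4, 5]
```

Only after the bitmaps are combined does the query need to touch the table itself, and then only for the rows whose bit survived.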
Verification / Alternative check:
On analytical workloads, queries that filter on several low-cardinality columns can run considerably faster with bitmap indexes than with traditional B-tree indexes, although the gain depends on the data and the database engine. Query plans show bitmap operations combining multiple predicates before the fact table is accessed, which reduces disk I/O. Vendor documentation, such as Oracle's, recommends bitmap indexes for data warehouse scenarios with infrequent updates and many complex, multidimensional filters.
Why Other Options Are Wrong:
Option B incorrectly describes bitmap indexes as mechanisms for printing documents, which is unrelated. Option C limits bitmap indexes to foreign key enforcement in OLTP databases, but they are actually poorly suited to OLTP because of their update overhead. Option D suggests storing rows as images, which misreads the term: a bitmap index uses bit vectors to index column values, not to store full records.
Common Pitfalls:
A key pitfall is using bitmap indexes on high-cardinality or rapidly changing columns in OLTP systems, which can lead to locking issues and poor performance. Another mistake is assuming bitmap indexes replace all other index types; in reality, B-tree indexes may still be better for range scans or primary key lookups. Effective index design chooses bitmap indexes where their bitwise combination strengths match the typical query patterns in a data warehouse.
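One way to see why cardinality matters is a rough size estimate. The sketch below assumes a naive, uncompressed layout with one full-length bit vector per distinct value; real databases compress their bitmaps, so actual sizes are much smaller, but the growth with the number of distinct values is the point. The function name `uncompressed_bitmap_bytes` and the row counts are hypothetical.

```python
def uncompressed_bitmap_bytes(row_count, distinct_values):
    """Bytes needed for one uncompressed bit vector per distinct value."""
    bytes_per_bitmap = (row_count + 7) // 8  # round up to whole bytes
    return bytes_per_bitmap * distinct_values

rows = 1_000_000

# Two distinct values (for example gender): about 250 KB in total
print(uncompressed_bitmap_bytes(rows, 2))     # 250000

# One distinct value per row (for example a unique ID): about 125 GB
print(uncompressed_bitmap_bytes(rows, rows))  # 125000000000
```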
Final Answer:
A bitmap index is an index that represents low-cardinality column values as bit vectors so that complex filters can be evaluated quickly using bitwise operations, making it especially useful for read-intensive data warehouse queries.