In database indexing terminology, what is meant by index cardinality and why is it important for query performance?

Difficulty: Medium

Correct Answer: Index cardinality is the number of distinct key values stored in an index, which helps the optimizer estimate selectivity and choose efficient access paths

Explanation:


Introduction / Context:
Index cardinality is an important concept in relational database tuning. Interviewers often ask about it to see whether you understand how optimizers decide when an index is useful. Cardinality affects selectivity estimates, which in turn influence the choice between index scans, index-only access, and full table scans. Knowing what index cardinality means and why it matters is essential for performance-aware database design.


Given Data / Assumptions:

  • We are working with an indexed column or set of columns in a relational database such as DB2, Oracle, or SQL Server.
  • The database maintains statistics about tables and indexes, including row counts and distinct values.
  • The optimizer uses these statistics to choose execution plans for SQL queries.


Concept / Approach:
Index cardinality refers to the number of distinct key values present in an index. For a single-column index, this is the count of unique values in that column; for a multi-column index, it is the count of distinct combinations of the key columns. High cardinality means many distinct values relative to the number of rows, while low cardinality means many rows share the same value. The optimizer uses cardinality to estimate how many rows a predicate is likely to return. Highly selective predicates, which match only a small fraction of the rows, often benefit from index access, while poorly selective predicates, which match a large fraction of the rows, may be better served by scanning the table.
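The link between cardinality and estimated row counts can be sketched in a few lines. This is only an illustration under a simplifying assumption (values are uniformly distributed, so an equality predicate matches about rows / cardinality rows); real optimizers refine this with histograms and other statistics. The function name estimated_matching_rows is hypothetical and not part of any database product.

```python
def estimated_matching_rows(row_count: int, index_cardinality: int) -> float:
    """Estimate rows matched by `column = value`.

    Assumption: values are uniformly distributed across the distinct keys,
    so each key accounts for row_count / index_cardinality rows.
    """
    if row_count < 0:
        raise ValueError("row_count must not be negative")
    if index_cardinality <= 0:
        raise ValueError("index_cardinality must be positive")
    return row_count / index_cardinality


# Low cardinality: 3 distinct values across 1 million rows.
low = estimated_matching_rows(1_000_000, 3)

# High cardinality: every row has its own key value.
high = estimated_matching_rows(1_000_000, 1_000_000)

print(f"low-cardinality estimate:  {low:,.0f} rows")
print(f"high-cardinality estimate: {high:,.0f} rows")
```

With three distinct values the estimate is roughly a third of the table, which is why a full scan can be the cheaper plan in that case.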


Step-by-Step Solution:
Step 1: Consider an index on EMPLOYEE.GRADE where GRADE can take values 1 through 10. The index cardinality is the number of distinct grade values actually present in the data.
Step 2: If the table has 1 million rows but only 3 distinct values for GRADE, the cardinality is low; many rows share each value, and an equality predicate on GRADE may still retrieve a large portion of the table.
Step 3: If instead there is an index on EMPLOYEE_ID where each employee has a unique identifier, the index cardinality is very high and equal to the number of rows; an equality predicate on EMPLOYEE_ID is highly selective.
Step 4: The optimizer uses cardinality statistics from the catalog to estimate how many rows will match a given predicate, which guides decisions about using an index versus performing a full scan.
Step 5: When statistics are stale or inaccurate, the optimizer may misjudge cardinality, choose suboptimal plans, and degrade performance, which is why keeping statistics current is important.
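The GRADE and EMPLOYEE_ID comparison in the steps above can be tried on a small scale. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for DB2, Oracle, or SQL Server, and 3,000 rows instead of 1 million. The table layout, index names, and helper functions are assumptions made for illustration.

```python
import sqlite3

ALLOWED_COLUMNS = {"employee_id", "grade"}


def build_employee_table(row_count: int = 3000) -> sqlite3.Connection:
    """Create an in-memory EMPLOYEE table with 3 grades and unique ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (employee_id INTEGER NOT NULL, grade INTEGER NOT NULL)"
    )
    # Grades cycle through 1, 2, 3 even though the domain allows 1 through 10.
    rows = [(i, 1 + i % 3) for i in range(row_count)]
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX ix_employee_grade ON employee (grade)")
    conn.execute("CREATE UNIQUE INDEX ix_employee_id ON employee (employee_id)")
    conn.commit()
    return conn


def column_cardinality(conn: sqlite3.Connection, column: str) -> int:
    """Count the distinct values actually present in the column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    sql = f"SELECT COUNT(DISTINCT {column}) FROM employee"
    return conn.execute(sql).fetchone()[0]


def fraction_matched(conn: sqlite3.Connection, column: str, value: int) -> float:
    """Fraction of the table returned by an equality predicate on the column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    total = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
    sql = f"SELECT COUNT(*) FROM employee WHERE {column} = ?"
    matched = conn.execute(sql, (value,)).fetchone()[0]
    return matched / total


conn = build_employee_table()
for column, value in (("grade", 1), ("employee_id", 42)):
    print(
        column,
        "cardinality:", column_cardinality(conn, column),
        "fraction matched:", round(fraction_matched(conn, column, value), 4),
    )
```

The low-cardinality column returns about a third of the table for a single value, while the unique column returns a single row.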


Verification / Alternative check:
You can verify index cardinality by running queries such as SELECT COUNT(DISTINCT column) FROM table and comparing the results with the statistics stored in the database catalogs. Tools that display execution plans often show estimated cardinalities for intermediate steps; these estimates are based on index and column statistics. When you update statistics and re-explain queries, you may see changes in estimated cardinality and corresponding changes in chosen access paths, demonstrating the practical impact of cardinality information.
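As a rough illustration of this check, the sketch below compares COUNT(DISTINCT ...) with catalog statistics before and after a statistics refresh. It uses SQLite through Python's sqlite3 module purely as a convenient stand-in; DB2, Oracle, and SQL Server keep their statistics in different catalog views and refresh them with different commands. In SQLite, ANALYZE populates the sqlite_stat1 table, whose stat column begins with the row count of the index followed by the average number of rows per distinct value of the first key column. The derived figure is an approximation, and the helper names are hypothetical.

```python
import sqlite3


def catalog_cardinality(conn: sqlite3.Connection, index_name: str) -> int:
    """Derive a distinct-key estimate from SQLite's sqlite_stat1 table."""
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)
    ).fetchone()
    # stat starts with: <rows in index> <avg rows per distinct first-column value>
    rows_in_index, rows_per_key = (int(part) for part in row[0].split()[:2])
    return round(rows_in_index / rows_per_key)


def actual_cardinality(conn: sqlite3.Connection) -> int:
    """Count distinct grades directly from the table data."""
    return conn.execute("SELECT COUNT(DISTINCT grade) FROM employee").fetchone()[0]


def snapshot(conn: sqlite3.Connection) -> tuple:
    """Return (actual distinct count, catalog-derived estimate)."""
    return actual_cardinality(conn), catalog_cardinality(conn, "ix_employee_grade")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, grade INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(i, 1 + i % 3) for i in range(3000)],  # grades 1 through 3 only
)
conn.execute("CREATE INDEX ix_employee_grade ON employee (grade)")
conn.commit()

conn.execute("ANALYZE")  # gather statistics
fresh = snapshot(conn)

# Large data change: 700 new rows introduce grades 4 through 10.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(3000 + i, 4 + i % 7) for i in range(700)],
)
conn.commit()
stale = snapshot(conn)  # the catalog has not been refreshed yet

conn.execute("ANALYZE")  # refresh statistics
refreshed = snapshot(conn)

print("fresh:", fresh, "stale:", stale, "refreshed:", refreshed)
```

Between the data load and the second ANALYZE, the direct count and the catalog figure disagree, which is the situation in which an optimizer can misjudge selectivity.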


Why Other Options Are Wrong:
Option B is incorrect because the total number of bytes on disk is a measure of index size, not cardinality. Option C is wrong because an index is defined on a single table (or on that table's partitions); it is not shared across multiple tables in the sense described. Option D is incorrect because there is no fixed limit on the number of rebuilds, and rebuild count is unrelated to the concept of cardinality.


Common Pitfalls:
A common pitfall is creating indexes on columns with very low cardinality, such as a gender flag or a simple status that only has a few values. Such indexes provide little selectivity and may not help performance, yet they add overhead to inserts and updates. Another issue is neglecting to refresh statistics after large data changes, leading to outdated cardinality estimates and poor plan choices. Understanding index cardinality helps you design better indexes, choose appropriate indexed columns, and explain why certain queries do or do not benefit from indexing.


Final Answer:
Index cardinality is the number of distinct key values stored in an index, and it is important because the optimizer uses this information to estimate selectivity and choose efficient access paths for SQL queries.
