Difficulty: Medium
Correct Answer: Lookup tables provide reference mappings such as codes to descriptions that ETL and reports use for translation, while aggregate tables store precomputed summarized data to speed up queries over large fact tables.
Explanation:
Introduction / Context:
Lookup tables and aggregate tables are two important supporting structures in data warehousing and ETL design. They are used to simplify transformations, standardize values, and improve query performance. Interviewers often ask about them to see whether you understand how to make data warehouses both accurate and efficient for reporting.
Given Data / Assumptions:
Concept / Approach:
Lookup tables store standardized reference data such as code-to-description mappings, country lists, currency conversion factors, or other reusable mappings. ETL jobs use them to transform raw source values into consistent warehouse values, and reporting tools use them to decode codes into human-readable labels. Aggregate tables, on the other hand, store precomputed summaries such as daily sales by product and region. Instead of recalculating totals from the full detailed fact table for every report, queries can read these smaller aggregated tables and return results faster.
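The two ideas above can be illustrated with a minimal in-memory sketch. The names COUNTRY_LOOKUP, FACT_ROWS, and build_daily_sales_aggregate, along with the sample data, are hypothetical and chosen only for illustration. In a real warehouse both structures would be database tables rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical lookup table: source country codes -> standardized names.
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany", "IN": "India"}

# Hypothetical detailed fact rows: (sale_date, product, country_code, amount).
FACT_ROWS = [
    ("2024-01-01", "Widget", "US", 100.0),
    ("2024-01-01", "Widget", "US", 50.0),
    ("2024-01-01", "Gadget", "DE", 75.0),
    ("2024-01-02", "Widget", "IN", 20.0),
]


def build_daily_sales_aggregate(fact_rows):
    """Precompute total sales per (date, product, country name)."""
    totals = defaultdict(float)
    for sale_date, product, country_code, amount in fact_rows:
        country = COUNTRY_LOOKUP[country_code]  # decode via the lookup
        totals[(sale_date, product, country)] += amount
    return dict(totals)


# Built once (for example, by a nightly ETL run), then reused by reports.
DAILY_SALES = build_daily_sales_aggregate(FACT_ROWS)

# A report reads the small summary instead of rescanning FACT_ROWS.
print(DAILY_SALES[("2024-01-01", "Widget", "United States")])
```

The lookup is used while the data is transformed, and the aggregate is read when a report runs. This is why the two structures complement each other rather than overlap.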
Step-by-Step Solution:
Step 1: Define a lookup table as a relatively small, stable table used to map input values to standardized, descriptive, or surrogate values.
Step 2: Explain that ETL processes often join to lookup tables to translate source-system codes into conformed dimension values or to derive additional attributes.
Step 3: Define an aggregate table as a table that stores pre-summarized data, for example sales_amount aggregated by date, product_category, and region.
Step 4: Describe how queries that need only summarized results can read from aggregate tables instead of scanning the granular fact table, thereby improving performance.
Step 5: Emphasize that both lookup and aggregate tables are maintained as part of ETL or data integration processes and play different but complementary roles.
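The steps above can be sketched with Python's built-in sqlite3 module. The schema, table names, and sample rows below are assumptions made for illustration, and country stands in for region. A production warehouse would use its own platform and refresh the aggregate as part of a scheduled load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: small, stable lookup table (hypothetical schema).
    CREATE TABLE country_code_lookup (
        source_code  TEXT PRIMARY KEY,
        country_name TEXT NOT NULL
    );
    INSERT INTO country_code_lookup VALUES
        ('US', 'United States'), ('USA', 'United States'), ('DE', 'Germany');

    -- Granular fact table that still holds raw source codes.
    CREATE TABLE sales_fact (
        sale_date        TEXT,
        product_category TEXT,
        source_code      TEXT,
        sales_amount     REAL
    );
    INSERT INTO sales_fact VALUES
        ('2024-01-01', 'Toys',  'US',  100.0),
        ('2024-01-01', 'Toys',  'USA',  40.0),
        ('2024-01-01', 'Books', 'DE',   25.0),
        ('2024-01-02', 'Toys',  'DE',   10.0);

    -- Steps 2 and 3: join to the lookup, then store the summarized result.
    CREATE TABLE sales_day_category_country AS
    SELECT f.sale_date, f.product_category, l.country_name,
           SUM(f.sales_amount) AS sales_amount
    FROM sales_fact AS f
    JOIN country_code_lookup AS l ON l.source_code = f.source_code
    GROUP BY f.sale_date, f.product_category, l.country_name;
""")

# Step 4: a summary query reads the aggregate table, not the fact table.
rows = conn.execute(
    "SELECT sale_date, product_category, country_name, sales_amount "
    "FROM sales_day_category_country "
    "ORDER BY sale_date, product_category, country_name"
).fetchall()
for row in rows:
    print(row)
```

In this sketch the lookup collapses the source codes 'US' and 'USA' into one standardized country name before the totals are computed, so the aggregate holds one row where the fact table held two.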
Verification / Alternative check:
Examining ETL logs and data warehouse schemas usually reveals reference tables that are shared across many jobs, for example a Country_Code_Lookup that standardizes country names across systems. BI environments also commonly contain summary tables such as Sales_Day_Product_Region that feed executive dashboards. Performance monitoring often shows that queries hitting aggregate tables run significantly faster than equivalent queries that aggregate from raw transaction-level facts, confirming the role of aggregate tables in optimization.
Why Other Options Are Wrong:
Option B mislabels lookup tables as transactional logs and aggregate tables as schema backups, neither of which reflects their actual purpose. Option C confuses lookup tables with authentication mechanisms and aggregate tables with operating system configuration, which are unrelated domains. Option D equates these tables with cluster indexes and claims they cannot be queried directly; in reality, lookup and aggregate tables are normal tables that can be queried, joined, and managed like any other.
Common Pitfalls:
A common mistake is embedding reference mappings directly in ETL code instead of using centralized lookup tables, which makes maintenance harder and increases the risk of inconsistencies. Another pitfall is creating too many aggregate tables with similar structures, leading to confusion and extra storage consumption. Good practice is to identify the shared lookup data and the most useful aggregates, then document how and when each should be used.
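The first pitfall can be made concrete with a small sketch in which two hypothetical ETL jobs share one lookup instead of each hard-coding its own mapping. STATUS_LOOKUP, decode_status, load_customers, and load_accounts are illustrative names, and the "Unknown" fallback for unmapped codes is an assumption, not a rule from the text above.

```python
# Hypothetical shared reference data. In a real warehouse this would be
# read from a lookup table rather than defined in code.
STATUS_LOOKUP = {"A": "Active", "I": "Inactive"}


def decode_status(code):
    """Translate a source status code using the single shared lookup."""
    # Assumption: unmapped codes are labeled "Unknown" instead of failing.
    return STATUS_LOOKUP.get(code, "Unknown")


def load_customers(rows):
    """Hypothetical ETL job 1: rows are (customer_id, status_code)."""
    return [{"customer_id": cid, "status": decode_status(c)} for cid, c in rows]


def load_accounts(rows):
    """Hypothetical ETL job 2: rows are (account_id, status_code)."""
    return [{"account_id": aid, "status": decode_status(c)} for aid, c in rows]


before = load_customers([(1, "S")])

# One change to the shared lookup is picked up by every job that uses it.
STATUS_LOOKUP["S"] = "Suspended"

after_customers = load_customers([(1, "S")])
after_accounts = load_accounts([(10, "S")])
print(before, after_customers, after_accounts)
```

Had each job carried its own copy of the mapping, the new code would need to be added in two places, and missing one of them would make the two loads disagree.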
Final Answer:
Lookup tables are small reference tables that provide reusable mappings such as codes to descriptions for ETL and reports, while aggregate tables are precomputed summary tables that store aggregated measures to improve the performance of analytical queries.