Difficulty: Easy
Correct Answer: Upgrading data quality before it is moved into the warehouse
Explanation:
Introduction / Context:
Data scrubbing (cleansing) removes or corrects errors, inconsistencies, and duplicates to improve reliability. Performing cleansing before loading prevents polluting the warehouse and downstream analytics.
Given Data / Assumptions:
Concept / Approach:
Scrubbing is typically part of the “T” in ETL, prior to the load step. It may leverage rules, reference tables, postal standardization, and fuzzy matching to unify entities (e.g., customers).
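As a rough illustration of the techniques named above, the following minimal sketch combines a rule-based cleanup, a reference-table lookup, and fuzzy matching. The record layout, the `STATE_CODES` table, and the 0.85 similarity threshold are assumptions made for this example, not part of the original question; real tools typically use more robust matching.

```python
from difflib import SequenceMatcher

# Hypothetical reference table mapping state spellings to a standard code.
STATE_CODES = {"california": "CA", "calif.": "CA", "ca": "CA",
               "new york": "NY", "ny": "NY"}

def standardize(record):
    """Rule-based cleanup: collapse whitespace, title-case the name,
    and map the state through the reference table."""
    name = " ".join(record["name"].split()).title()
    state_key = record["state"].strip().lower()
    state = STATE_CODES.get(state_key, record["state"].strip().upper())
    return {"name": name, "state": state}

def is_probable_duplicate(a, b, threshold=0.85):
    """Fuzzy match: same state and sufficiently similar names.
    The threshold is an illustrative assumption."""
    if a["state"] != b["state"]:
        return False
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold

def scrub(records):
    """Standardize each record, keeping only the first of any probable duplicates."""
    cleaned = []
    for raw in records:
        rec = standardize(raw)
        if not any(is_probable_duplicate(rec, kept) for kept in cleaned):
            cleaned.append(rec)
    return cleaned
```

Here `scrub` would run in the transform step, so that only standardized, de-duplicated customer records reach the load step.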
Step-by-Step Solution:
Verification / Alternative check:
Common ETL practice places data quality gates before the load step, so that failing records are rejected, repaired, or routed elsewhere instead of being loaded.
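A pre-load quality gate of this kind might look like the sketch below. The field names (`customer_id`, `amount`) and the specific rules are hypothetical, chosen only to show the reject/repair/route pattern.

```python
def quality_gate(record):
    """Return (action, record), where action is 'load', 'repair', or 'reject'.
    Field names and rules are illustrative assumptions."""
    rec = dict(record)  # work on a copy so the source row is untouched
    action = "load"
    # Rule 1: a missing key field cannot be repaired, so reject.
    if not rec.get("customer_id"):
        return "reject", rec
    amount = rec.get("amount")
    # Rule 2: try to repair amounts that arrive as text such as "1,200.50".
    if isinstance(amount, str):
        try:
            amount = float(amount.replace(",", ""))
        except ValueError:
            return "reject", rec
        rec["amount"] = amount
        action = "repair"
    # Rule 3: amounts must be non-negative numbers.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
        return "reject", rec
    return action, rec

def route(records):
    """Split records into those cleared for loading and those set aside."""
    to_load, rejected = [], []
    for r in records:
        action, rec = quality_gate(r)
        (rejected if action == "reject" else to_load).append(rec)
    return to_load, rejected
```

Only the `to_load` list would proceed to the warehouse; the `rejected` list can be logged or reviewed separately.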
Why Other Options Are Wrong:
After load: Possible but suboptimal; cleansing should stop bad data before it enters the warehouse rather than repair it afterward.
Index creation: Unrelated to cleansing.
Common Pitfalls:
Deferring quality fixes until after loading increases rework and can corrupt aggregates.
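To see how an uncleansed duplicate can distort an aggregate, consider this small sketch. The order rows and the `order_id` key are invented for illustration.

```python
# Hypothetical order rows; the second row is an accidental duplicate of the first.
rows = [
    {"order_id": 1, "amount": 100.0},
    {"order_id": 1, "amount": 100.0},
    {"order_id": 2, "amount": 50.0},
]

def total_revenue(rows):
    """Sum the amount column, as a warehouse aggregate would."""
    return sum(r["amount"] for r in rows)

def dedupe_by_order_id(rows):
    """Keep the first row seen for each order_id."""
    seen, unique = set(), []
    for r in rows:
        if r["order_id"] not in seen:
            seen.add(r["order_id"])
            unique.append(r)
    return unique
```

With the duplicate loaded, `total_revenue(rows)` gives 250.0; after `dedupe_by_order_id`, it gives 150.0. Any report built on the unscrubbed rows would overstate revenue until the data is repaired and the aggregate recomputed.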
Final Answer:
Upgrading data quality before it is moved into the warehouse