Data warehousing and ETL quality: Evaluate the statement below and choose the most accurate option.
“During Extract–Transform–Load (ETL), the role of the process is to identify erroneous data and to fix them.”
Assume a modern DW/BI pipeline with data quality rules, profiling, and stewardship.

Difficulty: Easy

Correct Answer: Incorrect

Explanation:


Introduction / Context:
ETL (Extract–Transform–Load) pipelines are central to data warehousing and business intelligence. They extract data from sources, transform it to conform to target models and quality rules, and load it into a warehouse or data lakehouse. This question probes whether ETL’s role is to both find erroneous data and always fix those errors.



Given Data / Assumptions:

  • Typical pipeline includes data profiling, cleansing, standardization, and validation checks.
  • Organizations often have data governance, stewardship, and source-system remediation processes.
  • “Fix” means altering data values to meet quality rules.


Concept / Approach:
ETL is designed to detect data quality issues (range, referential integrity, format, duplication). However, the appropriate response is not always “fix in-flight.” Many programs instead standardize values, enrich with reference data, and reject or quarantine bad records for review. Root-cause correction ideally happens in source systems via stewardship, preventing recurring issues. Therefore, saying ETL’s role is to identify and fix errors overstates its mandate.
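To make the distinction concrete, here is a minimal Python sketch of detect-and-route logic, where issues are flagged and quarantined rather than silently fixed. The field names (`customer_id`, `discount_pct`) and the rules are hypothetical illustrations, not part of any specific tool.

```python
# Minimal sketch of validate-then-route logic (illustrative only; field
# names, rules, and record structure are hypothetical assumptions).

def validate(record):
    """Return a list of rule violations instead of mutating the record."""
    violations = []
    if record.get("customer_id") is None:
        violations.append("missing customer_id")         # completeness rule
    if not (0 <= record.get("discount_pct", 0) <= 100):
        violations.append("discount_pct out of range")   # range rule
    return violations

def route(record):
    """Detect issues, but do not 'fix' them in-flight."""
    violations = validate(record)
    if violations:
        return ("quarantine", record, violations)  # hold for steward review
    return ("load", record, [])
```

The design point is that `validate` only reports violations; the decision to correct data belongs to stewardship and, ideally, to the source system.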



Step-by-Step Solution:

  1. Extract data from sources and profile for anomalies (nulls, patterns, outliers).
  2. Apply transformations: standardize codes, parse fields, conform dimensions, and validate business rules.
  3. On failure, route to error handling: reject, quarantine, or soft-correct only where safe (for example, trimming whitespace); see the sketch after this list.
  4. Log data-quality metrics and trigger stewardship tickets for source correction.
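The error-handling step (step 3) might look like the following sketch, assuming a generic Python pipeline; the logger name, record fields, and return values are illustrative placeholders rather than a prescribed interface.

```python
# Sketch of the error-handling step: safe, non-destructive standardization
# plus quarantine on rule failure (names here are hypothetical).
import logging

logger = logging.getLogger("dq_metrics")

def handle(record, violations):
    # Only non-destructive cleanup in-flight, e.g. trimming whitespace.
    cleaned = {k: v.strip() if isinstance(v, str) else v
               for k, v in record.items()}
    if violations:
        # Log the data-quality metric and escalate; do not overwrite values.
        logger.warning("DQ failure on %s: %s", cleaned.get("customer_id"), violations)
        return ("quarantine", cleaned)
    return ("load", cleaned)
```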


Verification / Alternative check:
Review governance playbooks: they distinguish non-destructive standardization from destructive “fixes,” favoring traceability and source remediation.



Why Other Options Are Wrong:

  • “Correct” implies ETL must always fix; in practice, it often flags/rejects and escalates.
  • “Applies only when a master data system exists” conflates MDM with ETL responsibilities.
  • “Valid for extract but not for load” misunderstands where validation occurs—primarily in transform/validation stages.
  • “Indeterminate without the SLA”: SLAs define timeliness/availability, not ETL’s core purpose.


Common Pitfalls:
Silent corrections that lose lineage; over-cleansing that masks source defects; lack of feedback loops to fix sources.
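As a contrast to silent correction, a lineage-preserving approach keeps the original value alongside the standardized one. In this hypothetical sketch the `country_raw` column and the code mapping are illustrative assumptions, not a standard convention.

```python
# Sketch of lineage-preserving standardization: the original value is kept
# in a separate column so the change remains auditable (names hypothetical).
def standardize_country(record):
    original = record.get("country")
    standardized = {"USA": "US", "U.S.": "US"}.get(original, original)
    return {**record, "country": standardized, "country_raw": original}
```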



Final Answer:
Incorrect
