Within ETL, what does the “extract” process typically capture from operational systems?

Difficulty: Easy

Correct Answer: A subset of data from various operational systems

Explanation:


Introduction / Context:
ETL (Extract, Transform, Load) feeds the data warehouse with curated data. Understanding “extract” clarifies scope and performance expectations in pipelines.



Given Data / Assumptions:

  • Operational systems contain transactional, current-state data.
  • Warehouses require only data relevant to analytics and compliance.
  • Extracting every table and column from every source is rare due to volume, cost, and privacy constraints.


Concept / Approach:
Extraction commonly filters to specific tables, columns, and time windows (e.g., the last N days), and is sometimes done incrementally via CDC (change data capture). It targets operational sources, not decision-support systems, which sit downstream.
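As a rough illustration of filtering by table, column, and time window, the sketch below pulls a subset from an in-memory SQLite database standing in for an operational system. The `orders` table, its columns, and the `extract_recent_orders` helper are hypothetical names chosen for this example, not part of any particular ETL tool.

```python
import sqlite3
from datetime import date, timedelta


def extract_recent_orders(conn, as_of, days=7):
    """Return selected columns of orders placed in the last `days` days."""
    # ISO-8601 date strings compare correctly as plain text.
    cutoff = (as_of - timedelta(days=days)).isoformat()
    return conn.execute(
        "SELECT order_id, customer_id, amount, order_date "
        "FROM orders WHERE order_date >= ? ORDER BY order_id",
        (cutoff,),
    ).fetchall()


# Hypothetical operational source, built in memory for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "amount REAL, order_date TEXT, card_number TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 25.0, "2024-01-01", "4111-xxxx"),
        (2, 11, 40.0, "2024-03-08", "4222-xxxx"),
        (3, 10, 15.5, "2024-03-10", "4333-xxxx"),
    ],
)

# Only recent rows are pulled, and the sensitive card_number column is left behind.
recent = extract_recent_orders(conn, as_of=date(2024, 3, 11), days=7)
print(recent)
```

The extract here touches one table, four of its five columns, and a seven-day window, which is the sense in which it captures a subset rather than all data.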



Step-by-Step Solution:

Identify source: operational systems.
Identify scope: subset aligned with analytics needs.
Therefore: “A subset of data from various operational systems.”


Verification / Alternative check:
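The common extraction patterns point the same way: even a full load is usually scoped to selected tables and columns, while incremental loads and CDC narrow the pull further to rows that have changed. Each limits extraction to the data the warehouse needs, for efficiency and compliance.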
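One simple way to picture an incremental load is a timestamp "high-water mark": each run extracts only rows modified since the previous run. The sketch below assumes a hypothetical `tickets` table with an `updated_at` column and a hypothetical `extract_since` helper. It is a timestamp-based approximation, not log-based CDC, which reads the database's change log instead.

```python
import sqlite3


def extract_since(conn, watermark):
    """Return rows changed after `watermark`, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM tickets "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the latest change seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


# Hypothetical operational source, built in memory for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        (1, "open", "2024-03-01T09:00:00"),
        (2, "open", "2024-03-02T10:30:00"),
    ],
)

# First run: everything is newer than the initial watermark.
first_batch, wm = extract_since(conn, "1970-01-01T00:00:00")

# Simulate a change in the operational system between runs.
conn.execute(
    "UPDATE tickets SET status = 'closed', updated_at = '2024-03-03T08:00:00' "
    "WHERE id = 1"
)

# Second run: only the changed row is extracted.
second_batch, wm2 = extract_since(conn, wm)
print(second_batch)
```

The second run returns a single row even though the table holds two, which is the efficiency argument behind incremental extraction.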



Why Other Options Are Wrong:
All data: Impractical and unnecessary.
Decision support systems: Not the typical extraction source; they are targets or consumers.



Common Pitfalls:
Pursuing “land everything” strategies that bloat storage and slow processing without added value.



Final Answer:
A subset of data from various operational systems
