Difficulty: Easy
Correct Answer: A subset of data from various operational systems
Explanation:
Introduction / Context:
ETL (Extract, Transform, Load) feeds the data warehouse with curated data. Understanding “extract” clarifies scope and performance expectations in pipelines.
Given Data / Assumptions:
Concept / Approach:
Extraction commonly filters to specific tables, columns, and time windows (e.g., the last N days), and is sometimes done incrementally via CDC (change data capture). It targets operational sources, not decision-support systems, which sit downstream.
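As an illustration of filtering by column and time window, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the function name `extract_recent_orders` are hypothetical and stand in for a real operational source.

```python
import sqlite3
from datetime import date, timedelta


def extract_recent_orders(conn, as_of, days):
    """Return only the needed columns for orders from the last `days` days."""
    # ISO-8601 date strings compare correctly as plain text.
    cutoff = (as_of - timedelta(days=days)).isoformat()
    return conn.execute(
        "SELECT order_id, amount, order_date FROM orders "
        "WHERE order_date >= ? ORDER BY order_id",
        (cutoff,),
    ).fetchall()


# Hypothetical operational table with more columns than the warehouse needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, amount REAL, "
    "order_date TEXT, internal_note TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, "2024-01-01", "old"),
        (2, 25.5, "2024-03-10", "recent"),
        (3, 7.25, "2024-03-14", "recent"),
    ],
)

# Extract a 7-day window; internal_note is never read from the source.
recent = extract_recent_orders(conn, date(2024, 3, 15), 7)
```

The query names its columns explicitly instead of using `SELECT *`, so the extract stays a subset even if the source table later gains columns.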
Step-by-Step Solution:
Verification / Alternative check:
ETL patterns (full load, incremental load, CDC) differ in how much they pull per run, but all of them extract only the data the warehouse needs, for efficiency and compliance.
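The difference between a full load and an incremental load can be sketched in plain Python. This is a simplified illustration, not a production pattern: the row layout, the integer `updated_at` field, and the watermark approach are assumptions made for the example, and real CDC usually reads a database change log rather than comparing timestamps.

```python
def full_load(rows, columns):
    """Extract every row, but only the requested columns."""
    return [{c: r[c] for c in columns} for r in rows]


def incremental_load(rows, columns, watermark):
    """Extract only rows changed after `watermark`; return them with the new watermark."""
    picked = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in picked), default=watermark)
    return [{c: r[c] for c in columns} for r in picked], new_watermark


# Hypothetical source rows; tax_id stands in for a sensitive column
# that the warehouse should not receive.
source = [
    {"id": 1, "name": "Ann", "tax_id": "x", "updated_at": 100},
    {"id": 2, "name": "Bob", "tax_id": "y", "updated_at": 205},
    {"id": 3, "name": "Cy", "tax_id": "z", "updated_at": 310},
]
wanted = ["id", "name"]

full = full_load(source, wanted)
delta, watermark = incremental_load(source, wanted, 200)
# A second run with the updated watermark finds nothing new.
delta_again, watermark_again = incremental_load(source, wanted, watermark)
```

Both functions drop the columns that were not requested, so even the full load remains a subset of what the operational system holds.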
Why Other Options Are Wrong:
All data: Extracting everything from every source is impractical and unnecessary; the warehouse needs only the data relevant to analysis.
Decision support systems: Not the typical extraction source; they are targets or consumers.
Common Pitfalls:
Pursuing “land everything” strategies that bloat storage and slow processing without added value.
Final Answer:
A subset of data from various operational systems