Difficulty: Easy
Correct Answer: A subset of data from various operational systems
Explanation:
Introduction / Context: ETL (Extract, Transform, Load) feeds the data warehouse with curated data. Understanding “extract” clarifies scope and performance expectations in pipelines.
Given Data / Assumptions:
Concept / Approach: Extraction commonly filters to specific tables, columns, and time windows (e.g., last N days), and is sometimes done incrementally via CDC (change data capture). It is aimed at operational sources, not decision-support systems (which are downstream).
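To make the filtering idea concrete, here is a minimal sketch of an extract step that pulls only selected columns for a recent time window. The `orders` table, its columns, and the `extract_recent_orders` helper are all hypothetical, and an in-memory SQLite database stands in for an operational system.

```python
import sqlite3
from datetime import date, timedelta


def extract_recent_orders(conn, as_of, days):
    """Extract selected columns for rows updated in the last `days` days."""
    # ISO-8601 date strings sort chronologically, so a text comparison works.
    cutoff = (as_of - timedelta(days=days)).isoformat()
    cur = conn.execute(
        "SELECT order_id, amount, updated_at FROM orders "
        "WHERE updated_at >= ? ORDER BY order_id",
        (cutoff,),
    )
    return cur.fetchall()


# Hypothetical operational source, built in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, customer_note TEXT, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "old", 10.0, "2024-01-01"),
        (2, "recent", 25.5, "2024-03-28"),
        (3, "recent", 40.0, "2024-03-30"),
    ],
)

# Only the last 7 days, and only the columns analytics needs.
rows = extract_recent_orders(conn, date(2024, 3, 31), 7)
```

The `customer_note` column and the January row never leave the source, which is the "subset" the answer refers to.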
Step-by-Step Solution:
Identify source: operational systems.
Identify scope: subset aligned with analytics needs.
Therefore: “A subset of data from various operational systems.”
Verification / Alternative check: ETL patterns (full load, incremental load, CDC) all emphasize extracting only necessary data for efficiency and compliance.
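The incremental pattern mentioned in the verification can be sketched with a simple high-water mark. This is an illustration under stated assumptions, not log-based CDC: the `extract_incremental` function, the `updated_at` field, and the in-memory `source` list are hypothetical stand-ins for an operational table.

```python
def extract_incremental(rows, watermark):
    """Return rows changed after `watermark`, plus the new watermark."""
    # ISO-8601 date strings compare correctly as plain text.
    changed = [r for r in rows if r["updated_at"] > watermark]
    # If nothing changed, keep the old watermark.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


# Hypothetical operational rows for illustration only.
source = [
    {"id": 1, "updated_at": "2024-03-01"},
    {"id": 2, "updated_at": "2024-03-05"},
]

# First run: an empty watermark behaves like a full load.
first_batch, mark = extract_incremental(source, "")

# A new row arrives; the second run extracts only that row.
source.append({"id": 3, "updated_at": "2024-03-09"})
second_batch, mark = extract_incremental(source, mark)
```

A watermark like this misses deletes and relies on a trustworthy `updated_at` value, which is one reason log-based CDC is used where those cases matter.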
Why Other Options Are Wrong:
All data: Impractical and unnecessary.
Decision support systems: Not the typical extraction source; they are targets or consumers.
Common Pitfalls: Pursuing “land everything” strategies that bloat storage and slow processing without added value.
Final Answer: A subset of data from various operational systems