In data warehousing concepts, a data warehouse is typically composed of which types of data sources?

Difficulty: Easy

Correct Answer: Historical and current data integrated from internal and external data sources.

Explanation:


Introduction / Context:
A data warehouse supports decision making and analytical processing by providing a consolidated, consistent view of data from across an organization. Unlike operational systems, which focus on current transactions, a data warehouse typically stores large volumes of historical data, sometimes combined with near-current data, integrated from multiple internal and external sources. This question asks you to identify the most accurate description of what a data warehouse is composed of.



Given Data / Assumptions:

  • We are dealing with data warehousing, not purely transactional systems.
  • The warehouse is used for reporting, business intelligence, and analytics.
  • The warehouse integrates data from various systems.
  • Historical data plays a major role in a warehouse.


Concept / Approach:
According to widely accepted definitions, a data warehouse is subject-oriented, integrated, nonvolatile, and time-variant. Time-variant means that the warehouse contains historical data, often stored at different levels of aggregation. Integrated means that data from many internal systems, and sometimes external sources, is reconciled into a consistent schema. Nonvolatile means that data, once loaded, is retained for analysis rather than overwritten by later transactions. Current data can also be present, but the emphasis is on historical trends. A correct answer must therefore reference historical and current data integrated from internal and external sources, not a single type of log file or only current transactions.
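The properties described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `WarehouseRow` layout, the field names in the two feeds, and the source labels `crm` and `external_feed` are hypothetical and do not come from any particular product. It only shows the idea of reconciling differently shaped internal and external records into one schema, dating every row, and appending instead of overwriting.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WarehouseRow:
    customer_id: str      # conformed key shared by all sources
    attribute: str        # what is being recorded
    value: float
    source: str           # which system the row came from
    snapshot_date: date   # time-variant: every row carries a date


def integrate(crm_rows, external_rows, snapshot_date):
    """Reconcile two differently shaped feeds into one schema."""
    rows = []
    for r in crm_rows:  # hypothetical internal source
        rows.append(WarehouseRow(r["cust"].strip().upper(), "balance",
                                 float(r["bal"]), "crm", snapshot_date))
    for r in external_rows:  # hypothetical external source
        rows.append(WarehouseRow(r["customer_id"].strip().upper(),
                                 "credit_score", float(r["score"]),
                                 "external_feed", snapshot_date))
    return rows


# An append-only list stands in for a nonvolatile store.
warehouse = []
warehouse += integrate([{"cust": " c1 ", "bal": "100.5"}],
                       [{"customer_id": "c1", "score": 710}],
                       date(2023, 1, 31))
warehouse += integrate([{"cust": "C1", "bal": "80"}], [], date(2024, 1, 31))

# Because older rows are kept, the balance history can be read back.
history = [(w.snapshot_date.year, w.value) for w in warehouse
           if w.customer_id == "C1" and w.attribute == "balance"]
```

Note that the second load does not replace the 2023 balance; both snapshots remain available, which is what makes trend analysis possible.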



Step-by-Step Solution:
Step 1: Recall that data warehouses store historical data, sometimes along with current snapshots, for analysis.
Step 2: Recall that they integrate data from internal systems such as ERP, CRM, and other operational applications, as well as possible external feeds such as market data.
Step 3: Examine Option A, which describes a data warehouse as historical and current data integrated from internal and external data sources. This matches standard textbook definitions.
Step 4: Examine Option B, which restricts the data warehouse to only real time sensor data with no history; this is more characteristic of a real time monitoring system.
Step 5: Examine Option C, which limits the warehouse to web logs; while logs may be included, they do not represent the full scope of a data warehouse.
Step 6: Examine Option D, which says only current online transaction data from a single system; that is more like an operational database.
Step 7: Examine Option E, which limits the warehouse to manually entered paper forms; this is not realistic in modern enterprise environments.
Step 8: Conclude that Option A is the correct description.


Verification / Alternative check:
If you review descriptions of well-known data warehouse implementations, such as those in retail, banking, or healthcare, you will see that they pull data from sales systems, inventory, customer relationship management, and sometimes external sources such as credit bureaus or demographics providers. Data is stored over many years and supports trend analysis and forecasting. This strongly supports Option A and contradicts the limited views presented in Options B through E.
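As a rough illustration of this check, the sketch below uses Python's built-in `sqlite3` module to combine several years of internal sales data with an external demographics table and compute a yearly trend. The table names, columns, and figures are invented for the example and do not describe any real implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL);
    CREATE TABLE region_demographics (region TEXT, population INTEGER);
""")

# Internal source: sales history spanning more than one year (made-up data).
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2022-03-10", "north", 120.0),
    ("2022-11-02", "north", 80.0),
    ("2023-05-21", "north", 260.0),
    ("2023-07-14", "south", 90.0),
])

# External source: demographics from a hypothetical outside provider.
conn.executemany("INSERT INTO region_demographics VALUES (?, ?)",
                 [("north", 1000), ("south", 500)])

# Trend query: yearly totals per region, enriched with the external data.
yearly = conn.execute("""
    SELECT strftime('%Y', s.sale_date) AS sale_year,
           s.region,
           SUM(s.amount) AS total,
           SUM(s.amount) / d.population AS per_capita
    FROM sales AS s
    JOIN region_demographics AS d ON d.region = s.region
    GROUP BY sale_year, s.region, d.population
    ORDER BY sale_year, s.region
""").fetchall()

for row in yearly:
    print(row)
```

A query like this cannot be answered by any of the sources in Options B through E alone, since it needs both history and more than one source.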



Why Other Options Are Wrong:

  • Option B is wrong because real-time sensor data without history does not support long-term trend analysis.
  • Option C is wrong because a warehouse is not usually limited to web logs; it contains structured and sometimes unstructured data from many systems.
  • Option D is wrong because a single operational system holding only current transactions is an online transaction processing (OLTP) system, not a warehouse.
  • Option E is wrong because relying solely on manual paper forms ignores the integration and automation that define a warehouse.


Common Pitfalls:
A common pitfall is to treat an operational database as a data warehouse just because it has many tables. Another mistake is to ignore the importance of integration and data quality; simply copying tables from multiple systems without cleaning and aligning them does not create a robust warehouse. Successful data warehousing projects involve careful design of dimensions, facts, and historical tracking mechanisms.
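One commonly used historical tracking mechanism is a Type 2 slowly changing dimension, in which a changed attribute produces a new row with validity dates instead of overwriting the old value. The sketch below is only an illustration of that idea under stated assumptions: the column names, the `OPEN_END` marker date, and the helper functions `apply_change` and `city_as_of` are hypothetical.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "still current"


def apply_change(dimension, customer_id, city, effective):
    """Close the customer's current row, then append a new version."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["valid_to"] == OPEN_END:
            if row["city"] == city:
                return dimension  # nothing changed, history stays as is
            row["valid_to"] = effective  # end the old version
    dimension.append({"customer_id": customer_id, "city": city,
                      "valid_from": effective, "valid_to": OPEN_END})
    return dimension


def city_as_of(dimension, customer_id, when):
    """Return the city that was valid for the customer on a given date."""
    for row in dimension:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= when < row["valid_to"]):
            return row["city"]
    return None


customer_dim = []
apply_change(customer_dim, "C1", "Pune", date(2021, 3, 1))
apply_change(customer_dim, "C1", "Mumbai", date(2023, 6, 15))

past = city_as_of(customer_dim, "C1", date(2022, 1, 1))     # "Pune"
current = city_as_of(customer_dim, "C1", date(2024, 1, 1))  # "Mumbai"
```

An operational database would normally keep only the latest city; retaining both versions is what lets facts recorded in 2022 be analyzed against the customer attributes that applied at that time.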



Final Answer:
The correct composition is described in Option A: Historical and current data integrated from internal and external data sources.

