Using operational data for BI

The “curse of dimensionality” most closely relates to which problem when using operational data for analytics and reporting?

Difficulty: Easy

Correct Answer: Too much data (very high-dimensional feature space).

Explanation:


Introduction / Context:
The “curse of dimensionality” refers to the challenges that arise as the number of features (dimensions) grows. In BI and data mining, high dimensionality can make distance metrics unreliable, models prone to overfitting, and queries computationally expensive. This question checks your conceptual mapping of that phrase to a typical analytics problem category.


Given Data / Assumptions:

  • Operational systems can generate many attributes per entity (customer, product, event).
  • Analysts may be tempted to carry hundreds or thousands of features into BI models.
  • We classify typical data quality/integration issues separately from dimensionality.


Concept / Approach:

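As dimensionality increases, data becomes sparse (a fixed number of rows covers an exponentially shrinking fraction of the feature space). Distances also concentrate, so a point's nearest and farthest neighbors end up at nearly the same distance. Model generalization worsens unless regularization, feature selection, or dimensionality reduction is applied. Therefore, the phrase aligns most with “too much data” in the sense of too many columns/features, not merely a large volume of rows.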
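The distance-concentration effect can be made concrete with a small experiment. The following is a minimal Python sketch (standard library only). It assumes points drawn uniformly from the unit hypercube and Euclidean distance. The helper name relative_contrast is hypothetical and does not come from any library. The sketch measures (farthest - nearest) / nearest distance from one random query point. As the dimension grows, this ratio should shrink toward 0, meaning "near" and "far" neighbors become hard to tell apart.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Return (farthest - nearest) / nearest Euclidean distance from one random
    query point to n_points drawn uniformly from the unit hypercube [0, 1]^dim.
    (Hypothetical helper for illustration.)"""
    rng = random.Random(seed)  # seeded so runs are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(f"dim={d:>5}  relative contrast={relative_contrast(d):.3f}")
```

Running it should show the contrast dropping sharply between 2 and 1000 dimensions. Exact values depend on the seed and the number of points.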


Step-by-Step Solution:

1) Interpret the “curse” as the set of problems caused by a high number of features.
2) Distinguish it from dirty or inconsistent values (data quality issues).
3) Distinguish it from non-integration (a system architecture issue).
4) Conclude that the best match is “Too much data (very high-dimensional feature space).”


Verification / Alternative check:

Standard machine learning texts use the phrase for high-dimensional spaces. In such spaces, low-dimensional geometric intuition breaks down, and feature selection or dimensionality reduction (e.g., PCA) is typically needed.
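As a rough illustration of dimensionality reduction, the following sketch finds the first principal component using power iteration on the sample covariance matrix. It then projects each row onto that component, collapsing three correlated columns into a single feature. This is a stdlib-only teaching sketch that assumes small, dense, numeric data. The function names are hypothetical. In practice, a library implementation such as scikit-learn's PCA class would normally be used instead.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(rows, iters=200, seed=0):
    """Dominant eigenvector of the covariance matrix via power iteration.
    Returned as a unit vector; its sign is arbitrary."""
    cov = covariance(center(rows))
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize each step
    return v


def project(rows, component):
    """Reduce each centered row to one coordinate along the component."""
    return [sum(a * b for a, b in zip(r, component)) for r in center(rows)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: three columns driven by one signal t plus small noise.
    rows = []
    for _ in range(500):
        t = rng.uniform(-1.0, 1.0)
        rows.append([t, 2 * t + rng.gauss(0, 0.01), -t + rng.gauss(0, 0.01)])
    pc = first_principal_component(rows)
    print("first component:", [round(x, 3) for x in pc])
    print("first 3 reduced values:", [round(x, 3) for x in project(rows, pc)[:3]])
```

On data like this, the component should point (up to sign) along (1, 2, -1) / sqrt(6). One projected column then carries nearly all the information that the three original columns held.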


Why Other Options Are Wrong:

  • Dirty/inconsistent data: quality problems, not dimensionality.
  • Non-integrated data: siloing and schema mismatch, not feature count.
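  • Sparse keys: a key-design/modeling concern. The sparsity in the “curse” refers to data points spread thinly across a high-dimensional feature space, not to sparse key values.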


Common Pitfalls:

  • Confusing “big data” (rows) with “high-dimensional data” (columns).
  • Throwing all attributes into models without regularization.
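The row-versus-column confusion can be checked with simple arithmetic. Assume, hypothetically, that each feature is bucketed into 10 bins. Then d features define 10^d grid cells, and the average number of rows per cell is rows / 10^d. The sketch below uses a hypothetical helper name to compute this ratio.

```python
def rows_per_cell(n_rows: int, bins_per_feature: int, n_features: int) -> float:
    """Average rows per cell when each of n_features is split into
    bins_per_feature buckets, giving bins_per_feature ** n_features cells.
    (Hypothetical helper for illustration.)"""
    return n_rows / bins_per_feature ** n_features


if __name__ == "__main__":
    for d in (3, 6, 10, 20):
        print(f"{d:>2} features: {rows_per_cell(1_000_000, 10, d):g} rows per cell")
```

With 1,000,000 rows, the results are:
- 3 features: 1000 rows per cell
- 6 features: 1 row per cell
- 10 features: 0.0001 rows per cell

Adding rows grows coverage only linearly, while each added column multiplies the number of cells.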


Final Answer:

Too much data (very high-dimensional feature space).
