In legacy or imported datasets, are missing values (for example, blanks or NULLs) a common design/data-quality problem that must be addressed during database design and loading?

Difficulty: Easy

Correct Answer: Applies — missing or blank values are common and must be handled

Explanation:


Introduction / Context:
Data imported from spreadsheets or legacy systems often contains missing values represented as NULLs or blanks. This affects constraint enforcement, analytics, and application behavior. The question asks if this is a common problem that must be planned for.



Given Data / Assumptions:

  • Different sources use different conventions (NULL, empty string, zero).
  • Some attributes are mandatory for business rules.
  • Loading and validation are part of the design process.



Concept / Approach:
Designers must define canonical representations and constraints. Required attributes should be NOT NULL; optional fields should be documented. ETL processes may impute, default, or reject rows with missing critical data.



Step-by-Step Solution:
Profile each column’s completeness.Declare NOT NULL and CHECK constraints where applicable.Transform blanks to NULLs consistently during load.Apply reference data lookups to replace missing codes where possible.Report residual gaps and create remediation workflows.



Verification / Alternative check:
Data quality dashboards showing completeness before/after ETL confirm improvements. Sample queries expose remaining blanks or NULLs.



Why Other Options Are Wrong:
Production data often includes missing values. The problem is not limited to text columns and is independent of foreign keys.



Common Pitfalls:
Treating empty strings as valid when the business requires a value; inconsistent NULL/blank handling across systems.



Final Answer:
Applies — missing or blank values are common and must be handled

Discussion & Comments

No comments yet. Be the first to comment!
Join Discussion