Difficulty: Easy
Correct Answer: Valid (inconsistent codes and formats are common in legacy data)
Explanation:
Introduction / Context:
When building a database from existing datasets, one of the first obstacles is data quality. Inconsistent values (multiple spellings for the same concept, irregular date formats, mixed units, or ad hoc code systems) can derail a clean relational design, so they must be identified during data profiling and corrected during cleansing.
Given Data / Assumptions:
Concept / Approach:
Recognize and standardize synonymous or conflicting values (for example, “NY”, “N.Y.”, “New York”). Create reference tables and enforce foreign keys. Apply CHECK constraints for formats and ranges. Normalize units and use consistent data types. These steps reduce heterogeneity so the model can enforce integrity reliably.
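The approach above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed implementation: the table names (state, customer), the column names, the STATE_SYNONYMS mapping, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical mapping from legacy spellings to one canonical code.
STATE_SYNONYMS = {"NY": "NY", "N.Y.": "NY", "NEW YORK": "NY"}

def standardize_state(raw: str) -> str:
    """Return the canonical code for a legacy value; raise KeyError if unknown."""
    return STATE_SYNONYMS[raw.strip().upper()]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Reference table holding the only permitted codes.
    CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer (
        id          INTEGER PRIMARY KEY,
        state_code  TEXT NOT NULL REFERENCES state(code),
        weight_kg   REAL NOT NULL CHECK (weight_kg > 0),
        signup_date TEXT NOT NULL
            CHECK (signup_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')
    );
    INSERT INTO state VALUES ('NY', 'New York');
""")

# Illustrative legacy rows: three spellings of the same state.
legacy_rows = [
    ("NY", 70.0, "2001-03-15"),
    ("N.Y.", 82.5, "1999-12-01"),
    (" new york ", 64.2, "2010-07-04"),
]
for raw_state, weight_kg, signup_date in legacy_rows:
    conn.execute(
        "INSERT INTO customer (state_code, weight_kg, signup_date) VALUES (?, ?, ?)",
        (standardize_state(raw_state), weight_kg, signup_date),
    )

distinct_states = [row[0] for row in conn.execute("SELECT DISTINCT state_code FROM customer")]
```

Standardizing values before loading keeps the foreign key and CHECK constraints simple: the database only has to recognize one spelling per concept.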
Step-by-Step Solution:
Verification / Alternative check:
After cleansing the data and adding constraints, attempts to insert inconsistent data should fail, confirming that the system now enforces uniformity. Reports should show fewer distinct values in each standardized domain.
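One way to run this check is sketched below with Python's sqlite3 module. The schema, the is_rejected helper, and the sample values are hypothetical and exist only for illustration; in SQLite, both foreign key and CHECK violations surface as sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE state (code TEXT PRIMARY KEY);
    CREATE TABLE customer (
        id         INTEGER PRIMARY KEY,
        state_code TEXT NOT NULL REFERENCES state(code),
        weight_kg  REAL NOT NULL CHECK (weight_kg > 0)
    );
    INSERT INTO state VALUES ('NY');
""")

def is_rejected(state_code: str, weight_kg: float) -> bool:
    """Try one insert; return True if a constraint blocks it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (state_code, weight_kg) VALUES (?, ?)",
                (state_code, weight_kg),
            )
    except sqlite3.IntegrityError:
        return True
    return False

bad_code_rejected = is_rejected("N.Y.", 70.0)    # not in the reference table
bad_range_rejected = is_rejected("NY", -5.0)     # violates the CHECK constraint
good_row_rejected = is_rejected("NY", 70.0)      # conforms to both rules

# Distinct-value report for the standardized domain.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT state_code) FROM customer"
).fetchone()[0]
```

Under these assumptions, the two nonconforming inserts are blocked, the conforming one succeeds, and the distinct-value report for state_code contains a single code.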
Why Other Options Are Wrong:
Common Pitfalls:
Overlooking subtle variations (case, whitespace, punctuation); failing to convert units; neglecting historical data that do not meet current rules.
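The pitfalls above can be guarded against with a few small helpers. The following is a sketch under stated assumptions: the function names (canonical_key, to_kg, split_rows), the set of known keys, and the sample rows are invented for the example. Rows that fail current rules are set aside for review rather than silently dropped.

```python
import re

LB_TO_KG = 0.45359237  # international avoirdupois pound, defined exactly in kilograms

def canonical_key(raw: str) -> str:
    """Collapse case, whitespace, and punctuation so variants compare equal."""
    no_punct = re.sub(r"[^\w\s]", "", raw)
    return " ".join(no_punct.split()).upper()

def to_kg(value: float, unit: str) -> float:
    """Convert a weight to kilograms; raise ValueError on an unknown unit."""
    unit = unit.strip().lower()
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unknown unit: {unit!r}")

# Hypothetical set of keys the current rules accept.
KNOWN_KEYS = {"NY", "NEW YORK"}

def split_rows(rows):
    """Partition (state, weight, unit) rows into cleaned rows and rows needing review."""
    clean, quarantined = [], []
    for raw_state, weight, unit in rows:
        key = canonical_key(raw_state)
        try:
            if key not in KNOWN_KEYS:
                raise ValueError(f"unknown state: {raw_state!r}")
            clean.append((key, to_kg(weight, unit)))
        except ValueError:
            quarantined.append((raw_state, weight, unit))
    return clean, quarantined

clean, quarantined = split_rows([
    ("n.y.", 70.0, "kg"),
    ("  New   York ", 150.0, "LB"),
    ("NY", 11.0, "stone"),   # historical unit the current rules do not cover
])
```

Keeping a quarantine list makes historical data that do not meet current rules visible, so someone can decide whether to extend the rules or correct the records.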
Final Answer:
Valid (inconsistent codes and formats are common in legacy data)