Difficulty: Easy
Correct Answer: Valid (inconsistent codes and formats are common in legacy data)
Explanation:
Introduction / Context: When building a database from existing datasets, one of the first obstacles is data quality. Inconsistent values—such as multiple spellings for the same concept, irregular date formats, mixed units, or ad hoc code systems—can derail a clean relational design and must be identified and corrected during profiling.
Given Data / Assumptions:
Concept / Approach: Recognize and standardize synonymous or conflicting values (for example, “NY”, “N.Y.”, “New York”). Create reference tables and enforce foreign keys. Apply CHECK constraints for formats and ranges. Normalize units and use consistent data types. These steps reduce heterogeneity so the model can enforce integrity reliably.
Step-by-Step Solution:
Profile each column for distinct values, patterns, and outliers.Map variants to canonical codes (for example, use ISO country/state codes).Introduce reference data tables and constraints to enforce consistency going forward.Backfill and correct legacy rows to meet the new standards.Verification / Alternative check: After cleansing and constraints, attempts to insert inconsistent data should fail, proving the system now enforces uniformity. Reports should show reduced distinct value counts for standardized domains.
Why Other Options Are Wrong:
Common Pitfalls: Overlooking subtle variations (case, whitespace, punctuation); failing to convert units; neglecting historical data that do not meet current rules.
Final Answer: Valid (inconsistent codes and formats are common in legacy data)
Discussion & Comments