When designing from existing data, is it a common problem to find multiple values for a single attribute stored in one cell (for example, comma-separated lists), which violates First Normal Form (1NF)?

Difficulty: Easy

Correct Answer: Applies — multi-valued cells are a common anti-pattern and violate 1NF

Explanation:


Introduction / Context:
First Normal Form requires that attribute values be atomic and that there be no repeating groups. Legacy files often pack multiple values into one cell (e.g., “red,blue,green”). This question asks whether that is a common design problem.
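A small sketch can make the anti-pattern concrete. The table and column names here (product, colors) and the sample rows are hypothetical, and SQLite through Python's built-in sqlite3 module is used only because it needs no setup. A substring search on the packed cell returns a false positive, and the comma-padding workaround is correct only while every row uses exactly the same delimiter with no stray spaces.

```python
import sqlite3

# Hypothetical legacy table: one cell packs several colors (violates 1NF).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, colors TEXT)")
conn.executemany(
    "INSERT INTO product (id, name, colors) VALUES (?, ?, ?)",
    [(1, "Mug", "red,blue,green"), (2, "Cap", "darkred,black"), (3, "Pen", "blue")],
)

# Naive substring search for 'red' also matches 'darkred' (a false positive).
naive = [row[0] for row in conn.execute(
    "SELECT name FROM product WHERE colors LIKE '%red%' ORDER BY id")]

# Delimiter-aware workaround: wrap the list in commas and search for ',red,'.
# It is only correct if every row uses ',' exactly, with no stray spaces.
padded = [row[0] for row in conn.execute(
    "SELECT name FROM product WHERE ',' || colors || ',' LIKE '%,red,%' ORDER BY id")]

print(naive)   # ['Mug', 'Cap']
print(padded)  # ['Mug']
```

Neither query can use an ordinary index on colors, because the leading wildcard forces the engine to read and parse every cell.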



Given Data / Assumptions:

  • Existing datasets frequently include delimited strings representing sets.
  • Queries must filter or join on individual values.
  • We aim to normalize the design for integrity and queryability.



Concept / Approach:
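Multi-valued cells defeat ordinary indexes and make correct matching difficult: a substring search for one value can also match a different value that contains it. Proper modeling splits the values into child rows, or uses an association table when the relationship is many-to-many. This satisfies 1NF and lets the database filter, join, and aggregate on individual values with efficient set-based operations.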
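The two modeling options can be sketched as follows. All table names are hypothetical, and SQLite is assumed for illustration; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. Option A stores the atomic values directly in a child table; option B moves the values into a lookup table and links them through an association table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Option A: child table, one atomic value per row.
CREATE TABLE product_color (
    product_id INTEGER NOT NULL REFERENCES product(id),
    color      TEXT    NOT NULL,
    PRIMARY KEY (product_id, color)
);
-- Option B: lookup table plus association table (many-to-many).
CREATE TABLE color (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE product_color_link (
    product_id INTEGER NOT NULL REFERENCES product(id),
    color_id   INTEGER NOT NULL REFERENCES color(id),
    PRIMARY KEY (product_id, color_id)
);
""")

conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Mug"), (2, "Cap")])
conn.executemany("INSERT INTO product_color VALUES (?, ?)",
                 [(1, "red"), (1, "blue"), (2, "darkred")])
conn.executemany("INSERT INTO color VALUES (?, ?)",
                 [(1, "red"), (2, "blue"), (3, "darkred")])
conn.executemany("INSERT INTO product_color_link VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3)])

# Exact-match joins replace string parsing; 'darkred' is no longer a false hit.
red_a = [r[0] for r in conn.execute("""
    SELECT p.name FROM product p
    JOIN product_color pc ON pc.product_id = p.id
    WHERE pc.color = 'red'""")]
red_b = [r[0] for r in conn.execute("""
    SELECT p.name FROM product p
    JOIN product_color_link l ON l.product_id = p.id
    JOIN color c ON c.id = l.color_id
    WHERE c.name = 'red'""")]

# The foreign key rejects a value attached to a product that does not exist.
try:
    conn.execute("INSERT INTO product_color VALUES (99, 'red')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(red_a, red_b, fk_rejected)  # ['Mug'] ['Mug'] True
```

Option A is simpler when the values are free-form; option B is preferable when the set of allowed values should itself be controlled and shared across rows.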



Step-by-Step Solution:
  1. Detect columns whose values contain delimiters or repeated patterns.
  2. Split the data into a child table with one value per row (or, for a many-to-many relationship, an association table).
  3. Create primary keys and foreign keys to enforce integrity.
  4. Backfill the new table and validate row counts to ensure no data was lost.
  5. Refactor downstream queries to use joins instead of string parsing.
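The steps can be sketched end to end in SQLite. The legacy table, its sample rows, and the target tables are hypothetical, and the sketch assumes a plain ',' delimiter with no duplicate values inside a cell. The expected count is computed independently in SQL (commas plus one per non-empty cell), so if a cell contains empty tokens such as 'red,,blue', the counts disagree and the check fails instead of silently losing or inventing data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legacy_product (id INTEGER PRIMARY KEY, name TEXT, colors TEXT);
INSERT INTO legacy_product VALUES
    (1, 'Mug', 'red,blue,green'),
    (2, 'Cap', 'black'),
    (3, 'Pen', NULL);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_color (
    product_id INTEGER NOT NULL REFERENCES product(id),
    color      TEXT    NOT NULL,
    PRIMARY KEY (product_id, color)
);
""")

# Step 1: detect cells that contain the delimiter.
multi = conn.execute(
    "SELECT COUNT(*) FROM legacy_product WHERE colors LIKE '%,%'").fetchone()[0]

# Independent expectation: number of commas + 1 for every non-empty cell.
expected = conn.execute("""
    SELECT COALESCE(SUM(LENGTH(colors) - LENGTH(REPLACE(colors, ',', '')) + 1), 0)
    FROM legacy_product WHERE colors IS NOT NULL AND colors <> ''""").fetchone()[0]

# Steps 2-3: copy the parent rows, then one child row per delimited value.
conn.execute("INSERT INTO product (id, name) SELECT id, name FROM legacy_product")
for pid, colors in conn.execute("SELECT id, colors FROM legacy_product").fetchall():
    values = [v for v in (colors or "").split(",") if v]
    conn.executemany("INSERT INTO product_color VALUES (?, ?)",
                     [(pid, v) for v in values])

# Step 4: validate that no values were lost before committing.
loaded = conn.execute("SELECT COUNT(*) FROM product_color").fetchone()[0]
if loaded != expected:
    conn.rollback()
    raise RuntimeError(f"backfill mismatch: loaded {loaded}, expected {expected}")
conn.commit()

# Step 5: downstream queries join instead of parsing strings.
blue = [r[0] for r in conn.execute("""
    SELECT p.name FROM product p
    JOIN product_color pc ON pc.product_id = p.id
    WHERE pc.color = 'blue'""")]

print(multi, expected, loaded, blue)  # 1 4 4 ['Mug']
```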



Verification / Alternative check:
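After normalization, predicates and joins on individual values become sargable: the engine can seek an index on the value column instead of scanning every row and parsing strings, so these queries perform better.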
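One way to check this is to compare query plans. The sketch below assumes SQLite and hypothetical table and index names; it prints the detail text of EXPLAIN QUERY PLAN for a wildcard search on the packed column and for an equality predicate on the normalized child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legacy_product (id INTEGER PRIMARY KEY, colors TEXT);
CREATE TABLE product_color (
    product_id INTEGER NOT NULL,
    color      TEXT    NOT NULL,
    PRIMARY KEY (product_id, color)
);
-- Index led by the value column so lookups by color can seek.
CREATE INDEX idx_product_color_color ON product_color (color, product_id);
""")

def plan(sql: str) -> str:
    # Each EXPLAIN QUERY PLAN row has the human-readable detail in column 3.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT id FROM legacy_product WHERE colors LIKE '%red%'")
after = plan("SELECT product_id FROM product_color WHERE color = 'red'")
print(before)
print(after)
```

The first plan reports a SCAN of the legacy table, because a leading wildcard cannot use an index, while the second reports a SEARCH using idx_product_color_color. The exact wording of the plan text differs between SQLite versions.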



Why Other Options Are Wrong:
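1NF prohibits storing a list in a single cell regardless of the column's data type or the database product; the fact that some NoSQL systems allow array-valued fields does not change the rules of the relational model.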



Common Pitfalls:
Leaving the old delimited column in place as the authoritative source after migration; failing to trim whitespace, normalize casing, and remove empty or duplicate tokens during backfill.
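A minimal token-cleaning helper for the backfill might look like the following. The function name split_tokens is hypothetical, and lower-casing is an assumption that suits values such as color names but would be wrong for case-sensitive codes.

```python
from __future__ import annotations


def split_tokens(cell: str | None) -> list[str]:
    """Split a comma-delimited cell into clean, de-duplicated values.

    Assumes ',' is the only delimiter and that case is not significant.
    """
    if not cell:
        return []
    seen: dict[str, None] = {}
    for raw in cell.split(","):
        token = raw.strip().lower()       # trim whitespace, fold case
        if token:                         # drop empties from ',,' or a trailing ','
            seen.setdefault(token, None)  # dict keeps first-seen order, drops repeats
    return list(seen)


print(split_tokens(" Red, blue ,RED,,green, "))  # ['red', 'blue', 'green']
```

Because duplicates collapse, row-count validation after cleaning should compare against the number of distinct cleaned tokens, not the raw comma count.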



Final Answer:
Applies — multi-valued cells are a common anti-pattern and violate 1NF
