Difficulty: Easy
Correct Answer: Applies — free-text remarks fields commonly mask real attributes
Explanation:
Introduction / Context:
Legacy datasets often include “remarks,” “notes,” or “misc” columns that hold heterogeneous text. While useful for humans, they impede data quality and analytics because business attributes get buried in unstructured text. This question asks whether such columns are a common design problem.
Given Data / Assumptions:
Concept / Approach:
General remarks columns reduce observability and break declarative constraints. The remedy is to identify recurring data elements within the text and promote them to explicit typed columns or child tables; keep the remarks field only for truly unstructured narrative.
Step-by-Step Solution:
Profile the remarks data; mine for recurring patterns (e.g., “Reason:”, “Code=”).Create new attributes with proper domains (status, reason_code).Backfill using parsing rules where feasible; otherwise, leave unmatched cases as text.Enforce constraints and foreign keys for the new attributes.Retain a remarks field for free text, but discourage embedding structured data in it.
Verification / Alternative check:
After extraction, queries become simpler and more accurate; data quality metrics (completeness, validity) improve.
Why Other Options Are Wrong:
Issues do not depend on text length or OLAP vs. OLTP; any unstructured catch-all field can hide structure.
Common Pitfalls:
Attempting to parse every legacy note perfectly; failing to provide users with new fields to capture structured values going forward.
Final Answer:
Applies — free-text remarks fields commonly mask real attributes
Discussion & Comments