Difficulty: Easy
Correct Answer: The inconsistent values problem
Explanation:
Introduction:
GROUP BY collapses rows that share exactly the same value into a single group, which makes it a powerful first-pass tool for profiling categorical data. Counting the occurrences of each distinct value makes inconsistencies such as misspellings, mixed abbreviations, or case differences immediately visible, because each variant appears as its own group.
Given Data / Assumptions:
Concept / Approach:
SELECT column, COUNT(*) FROM table GROUP BY column ORDER BY COUNT(*) DESC lists every distinct entry with its frequency. This highlights inconsistent values across records and guides standardization. GROUP BY is less effective for problems that do not manifest as repeated categorical variants (for example, free-text remarks or attributes spread across multiple columns).
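As a minimal sketch of this profiling query, the snippet below runs it against an in-memory SQLite database using Python's standard sqlite3 module. The table name customers, the column state, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample data: a "customers" table with a hand-typed "state" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), ("CA",), ("CA",), ("Calif.",), ("Calif.",),
     ("ca",), ("NY",), ("NY",), ("New York",)],
)

# Profile the column: one row per distinct value, most frequent first.
profile = conn.execute(
    """
    SELECT state, COUNT(*) AS n
    FROM customers
    GROUP BY state
    ORDER BY n DESC, state
    """
).fetchall()

for value, n in profile:
    print(f"{value!r}: {n}")
```

In this sample, 'CA', 'Calif.', and 'ca' appear as three separate groups, which is exactly the signal that the column holds inconsistent values.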
Step-by-Step Solution:
1) Write a GROUP BY query on the suspect attribute.
2) Inspect the list of distinct categories and counts.
3) Identify spelling variants, abbreviations, and case differences.
4) Propose standardization rules or reference lists to correct the data.
Verification / Alternative check:
Apply normalizing functions such as UPPER() and TRIM(), or regular-expression comparisons, and then re-run the counts. Variants that merge after normalization were only format noise; variants that remain distinct are true inconsistencies.
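The re-check can be sketched as follows, again with Python's sqlite3 module and a hypothetical customers table and state column. It compares the raw grouping with a grouping on UPPER(TRIM(state)). Note that SQLite's UPPER() only folds ASCII letters by default, so this is an illustration rather than a general-purpose normalizer.

```python
import sqlite3

# Hypothetical sample data with stray spaces and mixed case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), (" CA",), ("ca ",), ("Calif.",), ("NY",), ("ny",)],
)

# Raw profile: every distinct spelling is its own group.
raw = conn.execute(
    "SELECT state, COUNT(*) FROM customers GROUP BY state"
).fetchall()

# Normalized profile: trim whitespace and fold case before grouping.
normalized = conn.execute(
    """
    SELECT UPPER(TRIM(state)) AS s, COUNT(*) AS n
    FROM customers
    GROUP BY UPPER(TRIM(state))
    ORDER BY n DESC, s
    """
).fetchall()

print("distinct raw values:", len(raw))
print("after normalization:", normalized)
```

Here the whitespace and case variants of CA and NY merge after normalization, while 'CALIF.' still stands apart, so it needs a standardization rule rather than simple formatting cleanup.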
Why Other Options Are Wrong:
Common Pitfalls:
Assuming GROUP BY fixes data quality; it only reveals patterns. Actual remediation requires cleansing rules and possibly reference data.
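To illustrate the difference between revealing and fixing, the sketch below applies one possible cleansing rule: a lookup table that maps each known variant to a canonical value. The table state_map, its columns variant and canonical, and the sample rows are assumptions made up for this example, and a real reference list would come from the data owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), ("CA",), ("Calif.",), ("ca",), ("NY",), ("New York",)],
)

# Hypothetical reference data: variant spelling -> canonical code.
conn.execute("CREATE TABLE state_map (variant TEXT PRIMARY KEY, canonical TEXT)")
conn.executemany(
    "INSERT INTO state_map (variant, canonical) VALUES (?, ?)",
    [("Calif.", "CA"), ("ca", "CA"), ("New York", "NY")],
)

query = "SELECT state, COUNT(*) AS n FROM customers GROUP BY state ORDER BY n DESC, state"
before = conn.execute(query).fetchall()  # GROUP BY only reports the variants

# Remediation is a separate step: rewrite only the rows that match a known variant.
conn.execute(
    """
    UPDATE customers
    SET state = (SELECT canonical FROM state_map WHERE variant = customers.state)
    WHERE state IN (SELECT variant FROM state_map)
    """
)
after = conn.execute(query).fetchall()

print("before:", before)
print("after: ", after)
```

Re-running the same GROUP BY query after the update serves as the verification step: the variants should be gone and only the canonical values should remain.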
Final Answer:
The inconsistent values problem