In SQL, what do you accomplish by using a GROUP BY clause together with a HAVING clause in a SELECT statement?

Difficulty: Medium

Correct Answer: You create groups of rows based on one or more columns and then filter those groups using conditions on aggregate values such as SUM or COUNT

Explanation:


Introduction / Context:
The GROUP BY and HAVING clauses are central features of SQL when working with aggregated data. Many interview questions focus on them because developers often confuse HAVING with WHERE or misunderstand how grouping works. Knowing how GROUP BY and HAVING operate together allows you to answer typical business questions such as finding customers with totals above a threshold or departments with more than a certain number of employees.
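The first business question above can be sketched with Python's built-in sqlite3 module, used here only as a convenient SQL engine. The ORDERS table, its CUSTOMER and AMOUNT columns, the sample rows, and the threshold are all hypothetical and chosen purely for illustration.

```python
import sqlite3

# Hypothetical sample data: (CUSTOMER, AMOUNT).
ORDER_ROWS = [("alice", 120), ("alice", 90), ("bob", 50), ("bob", 40), ("carol", 300)]


def customers_above_threshold(threshold: int) -> list:
    """Return (customer, total) pairs whose summed order amount exceeds threshold."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ORDERS (CUSTOMER TEXT, AMOUNT INTEGER)")
        conn.executemany("INSERT INTO ORDERS VALUES (?, ?)", ORDER_ROWS)
        return conn.execute(
            """
            SELECT CUSTOMER, SUM(AMOUNT) AS TOTAL
            FROM ORDERS
            GROUP BY CUSTOMER          -- one group per customer
            HAVING SUM(AMOUNT) > ?     -- keep only groups above the threshold
            ORDER BY CUSTOMER
            """,
            (threshold,),
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(customers_above_threshold(100))  # [('alice', 210), ('carol', 300)]
```

With a threshold of 100, bob's total of 90 falls below the cutoff, so his whole group is dropped.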


Given Data / Assumptions:

  • We are executing a SELECT statement that may include aggregate functions such as SUM, COUNT, AVG, MIN, or MAX.
  • We want to group rows that share common values in one or more columns.
  • We need to apply conditions to the aggregated results, not just to individual rows.


Concept / Approach:
The GROUP BY clause causes the database engine to partition the rows that pass the WHERE clause (or all rows, if there is no WHERE) into groups based on the values of the specified columns. Aggregate functions then operate on each group, producing one result row per group. The HAVING clause is applied after grouping and aggregation, and it filters entire groups based on conditions that typically involve aggregate values. This differs from WHERE, which filters individual rows before grouping. Together, GROUP BY and HAVING let you summarise data and filter those summaries in a single query.
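To see the difference between row filtering and group filtering, the sketch below runs the same grouped query with and without a WHERE clause. It assumes a hypothetical EMP table holding only DEPTNO and SALARY, with made-up rows, and uses SQLite through Python's sqlite3 module; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data: (DEPTNO, SALARY).
EMP_ROWS = [
    (10, 60000), (10, 70000),
    (20, 40000), (20, 45000), (20, 50000),
    (30, 55000),
]

# No WHERE: every row takes part in grouping.
WITHOUT_WHERE = """
    SELECT DEPTNO, COUNT(*) FROM EMP
    GROUP BY DEPTNO
    HAVING COUNT(*) >= 2
    ORDER BY DEPTNO
"""

# WHERE filters individual rows first; HAVING then filters the groups.
WITH_WHERE = """
    SELECT DEPTNO, COUNT(*) FROM EMP
    WHERE SALARY > 42000
    GROUP BY DEPTNO
    HAVING COUNT(*) >= 2
    ORDER BY DEPTNO
"""


def run(sql: str) -> list:
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE EMP (DEPTNO INTEGER, SALARY INTEGER)")
        conn.executemany("INSERT INTO EMP VALUES (?, ?)", EMP_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(WITHOUT_WHERE))  # [(10, 2), (20, 3)]
    print(run(WITH_WHERE))     # [(10, 2), (20, 2)]
```

Because WHERE removes the 40000 salary before grouping, department 20 has a count of 2 instead of 3 in the second result; department 30 never passes HAVING COUNT(*) >= 2 in either query.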


Step-by-Step Solution:
Step 1: Start with a base SELECT that references a table such as EMP and columns such as DEPTNO and SALARY.
Step 2: Add GROUP BY DEPTNO to tell the database to collect all rows with the same department number into a single group for aggregation.
Step 3: Use aggregate functions in the SELECT list, for example SELECT DEPTNO, AVG(SALARY) FROM EMP GROUP BY DEPTNO, to compute one average salary per department.
Step 4: Introduce a HAVING clause to filter the groups, for example HAVING AVG(SALARY) > 50000, which keeps only departments whose average salary exceeds 50000.
Step 5: Execute the query and observe that the result contains only the department rows that satisfy the aggregate condition in HAVING.
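The steps can be run end to end as a minimal sketch, again assuming a hypothetical EMP table with made-up salaries and SQLite (via Python's sqlite3) as the engine. ORDER BY DEPTNO is an addition so the output order is predictable.

```python
import sqlite3

# Hypothetical sample data: (DEPTNO, SALARY).
EMP_ROWS = [
    (10, 60000), (10, 70000),
    (20, 40000), (20, 45000), (20, 50000),
    (30, 55000),
]

# Step 3: one average salary per department.
STEP3 = "SELECT DEPTNO, AVG(SALARY) FROM EMP GROUP BY DEPTNO ORDER BY DEPTNO"

# Step 4: the same query, keeping only groups whose average exceeds 50000.
STEP4 = (
    "SELECT DEPTNO, AVG(SALARY) FROM EMP GROUP BY DEPTNO "
    "HAVING AVG(SALARY) > 50000 ORDER BY DEPTNO"
)


def run(sql: str) -> list:
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE EMP (DEPTNO INTEGER, SALARY INTEGER)")
        conn.executemany("INSERT INTO EMP VALUES (?, ?)", EMP_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(STEP3))  # [(10, 65000.0), (20, 45000.0), (30, 55000.0)]
    print(run(STEP4))  # [(10, 65000.0), (30, 55000.0)]
```

Department 20 (average 45000) appears after Step 3 but is removed by the HAVING condition in Step 4.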


Verification / Alternative check:
You can verify this behaviour by running the same query without the HAVING clause and noting that every group appears; adding HAVING then removes entire groups from the result. Moving the aggregate condition into the WHERE clause produces an error in mainstream databases, because WHERE is evaluated on individual rows before any groups exist and therefore cannot reference aggregate functions. The logical processing order described in SQL documentation (FROM, then WHERE, then GROUP BY, then HAVING, then SELECT and ORDER BY) confirms that GROUP BY creates the groups and HAVING filters them based on their aggregated values.
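The WHERE-versus-HAVING check can be sketched as follows, under the same assumptions (hypothetical EMP rows, SQLite via Python's sqlite3). The exact error text varies by database, so the code only checks that the statement is rejected.

```python
import sqlite3

# Hypothetical sample data: (DEPTNO, SALARY).
EMP_ROWS = [
    (10, 60000), (10, 70000),
    (20, 40000), (20, 45000), (20, 50000),
    (30, 55000),
]


def connect() -> sqlite3.Connection:
    """Create an in-memory EMP table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (DEPTNO INTEGER, SALARY INTEGER)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?)", EMP_ROWS)
    return conn


def aggregate_in_where_is_rejected() -> bool:
    """Return True if the engine refuses an aggregate function inside WHERE."""
    conn = connect()
    try:
        conn.execute(
            "SELECT DEPTNO FROM EMP WHERE AVG(SALARY) > 50000 GROUP BY DEPTNO"
        ).fetchall()
        return False
    except sqlite3.Error as exc:
        print("rejected:", exc)
        return True
    finally:
        conn.close()


def aggregate_in_having() -> list:
    """The same condition, moved to HAVING, runs after grouping and succeeds."""
    conn = connect()
    try:
        return conn.execute(
            "SELECT DEPTNO FROM EMP GROUP BY DEPTNO "
            "HAVING AVG(SALARY) > 50000 ORDER BY DEPTNO"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(aggregate_in_where_is_rejected())  # True
    print(aggregate_in_having())             # [(10,), (30,)]
```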


Why Other Options Are Wrong:
Option B is incorrect because ORDER BY, not GROUP BY, controls sorting: GROUP BY does not guarantee any particular output order, and removing duplicate rows is the job of DISTINCT. Option C is wrong because GROUP BY and HAVING do not physically update or collapse records on disk; they affect only the logical result set of the query. Option D is incorrect because GROUP BY with HAVING does not define or create a new table; it produces a query result that can be materialised with CREATE TABLE AS SELECT or a similar statement, but only when you explicitly ask for it.
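To illustrate the points about Options C and D, this sketch (same hypothetical EMP data, SQLite via Python's sqlite3) shows that a grouped query leaves the EMP table and the schema unchanged, and that persisting the result needs an explicit CREATE TABLE ... AS SELECT. The table name HIGH_AVG_DEPTS is hypothetical.

```python
import sqlite3

# Hypothetical sample data: (DEPTNO, SALARY).
EMP_ROWS = [
    (10, 60000), (10, 70000),
    (20, 40000), (20, 45000), (20, 50000),
    (30, 55000),
]

GROUPED = (
    "SELECT DEPTNO, AVG(SALARY) AS AVG_SAL FROM EMP "
    "GROUP BY DEPTNO HAVING AVG(SALARY) > 50000"
)


def grouped_query_side_effects() -> tuple:
    """Run the grouped query, then materialise it explicitly."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE EMP (DEPTNO INTEGER, SALARY INTEGER)")
        conn.executemany("INSERT INTO EMP VALUES (?, ?)", EMP_ROWS)

        # A plain SELECT only returns a result set.
        conn.execute(GROUPED).fetchall()
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        emp_rows = conn.execute("SELECT COUNT(*) FROM EMP").fetchone()[0]

        # Persisting the result requires an explicit CREATE TABLE ... AS SELECT.
        conn.execute("CREATE TABLE HIGH_AVG_DEPTS AS " + GROUPED)
        saved = conn.execute(
            "SELECT DEPTNO, AVG_SAL FROM HIGH_AVG_DEPTS ORDER BY DEPTNO").fetchall()
        return tables, emp_rows, saved
    finally:
        conn.close()


if __name__ == "__main__":
    print(grouped_query_side_effects())
    # (['EMP'], 6, [(10, 65000.0), (30, 55000.0)])
```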


Common Pitfalls:
A common pitfall is putting conditions on ordinary columns in HAVING when they could be applied earlier in WHERE; filtering rows before grouping avoids aggregating data that will be discarded anyway and can make queries slower. Another mistake is selecting a non-aggregated column that is not listed in the GROUP BY clause. Most databases reject this with an error, while some engines, such as SQLite and MySQL with the ONLY_FULL_GROUP_BY mode disabled, accept it and return a value from an arbitrary row of each group. Developers also sometimes treat HAVING as a general place for any condition, instead of remembering that it is intended for group-level filters, usually based on aggregates. Understanding the logical order of operations helps you write correct, efficient queries.
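The first pitfall can be checked with a small sketch (hypothetical EMP data, SQLite via Python's sqlite3): a condition on the grouping column gives the same result whether it is written in WHERE or HAVING, but the WHERE form logically discards rows before any aggregation takes place.

```python
import sqlite3

# Hypothetical sample data: (DEPTNO, SALARY).
EMP_ROWS = [
    (10, 60000), (10, 70000),
    (20, 40000), (20, 45000), (20, 50000),
    (30, 55000),
]

# Logically, department 30 is aggregated first and only then discarded.
IN_HAVING = """
    SELECT DEPTNO, SUM(SALARY) FROM EMP
    GROUP BY DEPTNO
    HAVING DEPTNO <> 30
    ORDER BY DEPTNO
"""

# Department 30's rows are removed before grouping, so they are never summed.
IN_WHERE = """
    SELECT DEPTNO, SUM(SALARY) FROM EMP
    WHERE DEPTNO <> 30
    GROUP BY DEPTNO
    ORDER BY DEPTNO
"""


def run(sql: str) -> list:
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE EMP (DEPTNO INTEGER, SALARY INTEGER)")
        conn.executemany("INSERT INTO EMP VALUES (?, ?)", EMP_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(IN_HAVING))  # [(10, 130000), (20, 135000)]
    print(run(IN_WHERE))   # [(10, 130000), (20, 135000)]
```

Some optimizers push such HAVING predicates down into WHERE automatically, but writing the condition in WHERE states the intent directly and does not rely on that optimization.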


Final Answer:
By using GROUP BY with HAVING, you first group rows based on specified columns and then filter those groups using conditions on aggregate values such as SUM, COUNT, or AVG.
