Difficulty: Medium
Correct Answer: A correlated subquery is a subquery that refers to columns from the outer query and is evaluated once per row of the outer query, often used for row-by-row comparisons such as selecting rows that match an aggregate condition within their own group.
Explanation:
Introduction / Context:
Subqueries allow you to nest one query inside another. A special class, the correlated subquery, is tightly linked to the outer query; it can be powerful but potentially expensive. This question asks what a correlated subquery is and when you might use one, which matters both for SQL performance tuning and for writing expressive queries.
Given Data / Assumptions:
Concept / Approach:
A correlated subquery is a subquery that references one or more columns from the outer query. Because of that reference, the subquery cannot be evaluated just once; conceptually, it is re-evaluated for each row of the outer query. This design is useful for row-level comparisons such as picking employees whose salary is above the average salary in their department, or orders that are greater than the average order total for the same customer. Correlated subqueries can sometimes be rewritten as joins or analytic (window) functions for better performance, but they remain an important conceptual tool.
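The order example mentioned above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `orders` table, its columns, and the sample rows are hypothetical, and SQLite (through Python's standard `sqlite3` module) is assumed only so that the query can be run as written.

```python
import sqlite3

# In-memory database with a hypothetical orders table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO orders VALUES
        (1, 10, 100.0), (2, 10, 300.0),
        (3, 20,  50.0), (4, 20,  70.0), (5, 20, 90.0);
""")

# The inner query references o1.customer_id from the outer query,
# so the average is computed per customer, not over the whole table.
above_avg_orders = conn.execute("""
    SELECT o1.order_id
    FROM orders o1
    WHERE o1.total > (SELECT AVG(o2.total)
                      FROM orders o2
                      WHERE o2.customer_id = o1.customer_id)
    ORDER BY o1.order_id
""").fetchall()

print(above_avg_orders)  # [(2,), (5,)]
```

Order 4 is excluded because its total equals, rather than exceeds, the average for customer 20.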
Step-by-Step Solution:
Step 1: Consider an example such as selecting employees whose salary is greater than the average salary of their department.
Step 2: The outer query iterates over each employee row, exposing the DEPTNO and SAL columns.
Step 3: The inner subquery uses DEPTNO from the outer row, for example SELECT AVG(SAL) FROM EMP e2 WHERE e2.DEPTNO = e1.DEPTNO, creating a correlation between the inner and outer queries.
Step 4: For each outer row, the database evaluates the inner subquery with the appropriate DEPTNO and compares the employee's salary to that department average.
Step 5: This dependence on outer columns and per-row re-evaluation is what makes the subquery correlated, which matches the description in option A.
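The steps above can be put together into one runnable query. The sketch below assumes a small EMP table with ENAME, DEPTNO, and SAL columns; the ENAME column and the sample rows are invented for illustration, and SQLite via Python's `sqlite3` module is used only as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENAME TEXT, DEPTNO INTEGER, SAL REAL);
    INSERT INTO EMP VALUES
        ('ALLEN', 10, 1000), ('BAKER', 10, 3000),
        ('CLARK', 20, 2000), ('DAVIS', 20, 2000), ('EVANS', 20, 5000);
""")

# Outer query: one candidate row per employee (alias e1).
# Inner query: average salary for that row's department (alias e2),
# linked to the outer row through e2.DEPTNO = e1.DEPTNO.
above_dept_avg = conn.execute("""
    SELECT e1.ENAME
    FROM EMP e1
    WHERE e1.SAL > (SELECT AVG(e2.SAL)
                    FROM EMP e2
                    WHERE e2.DEPTNO = e1.DEPTNO)
    ORDER BY e1.ENAME
""").fetchall()

print(above_dept_avg)  # [('BAKER',), ('EVANS',)]
```

With this data the department averages are 2000 for department 10 and 3000 for department 20, so only BAKER and EVANS qualify.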
Verification / Alternative check:
Try running the subquery on its own. If it executes independently and returns a single result without any reference to the outer query, it is not correlated. Conversely, if the subquery relies on a column or table alias from the outer query, it cannot run standalone, and the database must conceptually re-evaluate it for each outer row, which confirms that it is correlated.
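This check can be tried directly. In the sketch below (hypothetical EMP rows, SQLite via Python's `sqlite3` assumed as the engine), the uncorrelated inner query runs on its own, while the correlated one fails standalone because the outer alias `e1` does not exist. The two outer queries also return different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENAME TEXT, DEPTNO INTEGER, SAL REAL);
    INSERT INTO EMP VALUES
        ('ALLEN', 10, 1000), ('BAKER', 10, 1500),
        ('CLARK', 20, 4000), ('DAVIS', 20, 6000);
""")

# Uncorrelated inner query: runs independently and yields one overall average.
overall_avg = conn.execute("SELECT AVG(SAL) FROM EMP").fetchone()[0]

# Correlated inner query: run alone, the outer alias e1 is undefined.
try:
    conn.execute("SELECT AVG(e2.SAL) FROM EMP e2 WHERE e2.DEPTNO = e1.DEPTNO")
    standalone_failed = False
except sqlite3.OperationalError:
    standalone_failed = True

def names(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Compares every salary to the single overall average.
uncorrelated = names("""
    SELECT ENAME FROM EMP
    WHERE SAL > (SELECT AVG(SAL) FROM EMP)
    ORDER BY ENAME
""")

# Compares every salary to the average of that employee's own department.
correlated = names("""
    SELECT e1.ENAME FROM EMP e1
    WHERE e1.SAL > (SELECT AVG(e2.SAL) FROM EMP e2 WHERE e2.DEPTNO = e1.DEPTNO)
    ORDER BY e1.ENAME
""")

print(overall_avg, standalone_failed)  # 3125.0 True
print(uncorrelated)                    # ['CLARK', 'DAVIS']
print(correlated)                      # ['BAKER', 'DAVIS']
```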
Why Other Options Are Wrong:
Option B focuses only on aggregate functions, which can appear in both simple and correlated subqueries. Option C describes a simple subquery or scalar subquery scenario, not a correlated one. Option D incorrectly restricts correlated subqueries to the SELECT list, while in practice they are most common in WHERE and HAVING clauses.
Common Pitfalls:
Correlated subqueries can be slow on large tables because they conceptually execute once per outer row. Developers sometimes write correlated subqueries where a join or analytic function would be more efficient. Another pitfall is getting the correlation condition wrong: if the predicate linking the subquery to the outer query is missing, or accidentally refers only to the inner table, the query still runs but returns incorrect results.
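Both pitfalls can be illustrated on a small table. The sketch below is one possible rewrite under stated assumptions, not the only one: it uses hypothetical EMP rows, SQLite via Python's `sqlite3`, and a join against a grouped derived table. It also shows a careless predicate in which the unqualified DEPTNO resolves to the inner table, so the subquery silently stops being correlated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENAME TEXT, DEPTNO INTEGER, SAL REAL);
    INSERT INTO EMP VALUES
        ('ALLEN', 10, 1000), ('BAKER', 10, 1500),
        ('CLARK', 20, 4000), ('DAVIS', 20, 6000);
""")

def names(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Correlated form: inner average depends on the outer row's DEPTNO.
correlated = names("""
    SELECT e1.ENAME FROM EMP e1
    WHERE e1.SAL > (SELECT AVG(e2.SAL) FROM EMP e2 WHERE e2.DEPTNO = e1.DEPTNO)
    ORDER BY e1.ENAME
""")

# Join rewrite: compute each department average once, then join to it.
join_rewrite = names("""
    SELECT e.ENAME
    FROM EMP e
    JOIN (SELECT DEPTNO, AVG(SAL) AS AVG_SAL
          FROM EMP
          GROUP BY DEPTNO) d
      ON d.DEPTNO = e.DEPTNO
    WHERE e.SAL > d.AVG_SAL
    ORDER BY e.ENAME
""")

# Careless predicate: the unqualified DEPTNO binds to e2 (the innermost
# scope), so the condition is always true and the overall average is used.
careless = names("""
    SELECT e1.ENAME FROM EMP e1
    WHERE e1.SAL > (SELECT AVG(e2.SAL) FROM EMP e2 WHERE e2.DEPTNO = DEPTNO)
    ORDER BY e1.ENAME
""")

print(correlated)    # ['BAKER', 'DAVIS']
print(join_rewrite)  # ['BAKER', 'DAVIS']
print(careless)      # ['CLARK', 'DAVIS']
```

Qualifying every column with its table alias inside the subquery is a simple way to avoid the careless form. On engines that support window functions, AVG(SAL) OVER (PARTITION BY DEPTNO) is another common way to express the same comparison.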
Final Answer:
A correlated subquery is a subquery that refers to columns from the outer query and is evaluated once per row of the outer query, often used for row-by-row comparisons such as selecting rows that match an aggregate condition within their own group.