In SQL, what is a correlated subquery and how does it differ from a simple subquery?

Difficulty: Medium

Correct Answer: A correlated subquery is a subquery that refers to columns from the outer query and is re executed for each row of the outer query.

Explanation:


Introduction / Context:
Subqueries allow one SQL query to use the result of another query. Among these, correlated subqueries form a special class where the inner query depends on values from the outer query. Understanding how correlated subqueries work is important for writing flexible SQL and for reasoning about performance, because they are evaluated differently from simple, non correlated subqueries. This is a common database interview question.


Given Data / Assumptions:

  • We are working with a relational database that supports nested queries, such as Oracle, MySQL or PostgreSQL.
  • The outer query pulls rows from one or more tables.
  • The inner subquery can use columns from the outer query in its WHERE clause.
  • The database engine may re evaluate the subquery for each row of the outer result set.


Concept / Approach:
A correlated subquery is one that cannot be evaluated independently of the outer query, because it references one or more columns from that outer query. The inner query is effectively parameterised by the current row of the outer query. As the outer query processes each row, the subquery runs again with the appropriate values. This contrasts with a simple, non correlated subquery that can be evaluated once and treated as a constant set of results for the outer query. Correlated subqueries are powerful, but if misused they can lead to performance issues due to repeated execution.


Step-by-Step Solution:
Step 1: Recall that a basic subquery is written inside another query, often in the WHERE, HAVING or FROM clause. Step 2: In a correlated subquery, the inner query includes references such as outer_table.column in its conditions. Step 3: Because of this reference, the database cannot compute the subquery result just once; it must recalculate it for each row of the outer query, using the current row values. Step 4: For example, in a query that finds employees whose salary is above the average salary in their own department, the subquery must use the department of the current employee, making it correlated. Step 5: This behaviour confirms that a correlated subquery is tied to the outer query row by row, matching the definition in option A.


Verification / Alternative check:
Consider the SQL example: SELECT e.empno, e.sal FROM emp e WHERE e.sal > (SELECT AVG(sal) FROM emp WHERE deptno = e.deptno); The subquery SELECT AVG(sal) FROM emp WHERE deptno = e.deptno refers to e.deptno from the outer query. For each employee e, the database computes the average salary for that employee department and compares e.sal to that average. This computation is repeated for each row, which is characteristic of a correlated subquery. If the subquery did not refer to e.deptno and simply returned a single average for all employees, it could be evaluated once and reused, and would not be correlated.


Why Other Options Are Wrong:
Option B describes a non correlated subquery that can run independently and is typically evaluated once, not a correlated one. Option C focuses on aggregate functions, but correlation is about references to outer query columns, not the presence of aggregates. Option D restricts correlated subqueries to the SELECT clause, which is not correct because they often appear in WHERE or HAVING clauses. Option E claims that a correlated subquery always returns exactly one row, which is not required; correlated subqueries can return one or many rows depending on the logic and operators used. Only option A accurately describes a correlated subquery as one that references outer query columns and is re executed per outer row.


Common Pitfalls:
Developers sometimes use correlated subqueries where a join or a derived table would be more efficient, leading to unnecessary repeated computation. Another pitfall is misunderstanding the scope of column references and accidentally creating a correlated subquery without intending to. Good practice is to recognise when you are referencing outer query columns and to be aware that the database may execute the inner query many times. In interviews, explaining both the definition and performance implications of correlated subqueries shows a deeper understanding of SQL behaviour.


Final Answer:
A correlated subquery is a subquery that refers to columns from the outer query and therefore is evaluated repeatedly, once for each row processed by the outer query.

Discussion & Comments

No comments yet. Be the first to comment!
Join Discussion