Difficulty: Medium
Correct Answer: Sends only the join attributes to a remote site and then returns only the required matching rows.
Explanation:
Introduction / Context:
Network I/O is a major cost in distributed joins. A classic optimization is the semijoin, which uses projections on join attributes to filter remote relations before transferring full tuples, thereby cutting down data shipped across the network.
Given Data / Assumptions:
Concept / Approach:
In a semijoin, the initiating site sends only the distinct join attribute values (π_k(R)) to the remote site holding S. The remote site filters S to S’ = σ_{k ∈ π_k(R)}(S) and returns only the matching rows (or sometimes just their keys). This preselection step avoids shipping irrelevant tuples from S that would not contribute to the final join.
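The idea above can be illustrated with a minimal Python sketch. It simulates both sites in a single process: the relations are plain lists of dictionaries, and the function names (project_keys, remote_filter, semijoin_then_join) and the sample rows are hypothetical, chosen only for illustration. It is not how any particular database system implements the operation.

```python
from typing import Dict, Hashable, List, Set

Row = Dict[str, Hashable]


def project_keys(rows: List[Row], key: str) -> Set[Hashable]:
    """Initiating site: compute the distinct join attribute values, pi_k(R)."""
    return {row[key] for row in rows}


def remote_filter(remote_rows: List[Row], key: str, shipped_keys: Set[Hashable]) -> List[Row]:
    """Remote site: keep only rows of S whose key appears in the shipped set."""
    return [row for row in remote_rows if row[key] in shipped_keys]


def semijoin_then_join(local_rows: List[Row], remote_rows: List[Row], key: str) -> List[Row]:
    """Reduce S with the shipped keys, then join the reduced S' with R locally."""
    shipped_keys = project_keys(local_rows, key)              # sent to the remote site
    reduced = remote_filter(remote_rows, key, shipped_keys)   # sent back to the initiating site

    # Index the reduced relation by key so the local join is a simple lookup.
    by_key: Dict[Hashable, List[Row]] = {}
    for s_row in reduced:
        by_key.setdefault(s_row[key], []).append(s_row)

    return [{**r_row, **s_row} for r_row in local_rows for s_row in by_key.get(r_row[key], [])]


# Hypothetical sample data: R lives at the initiating site, S at the remote site.
R = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}, {"k": 2, "a": "z"}]
S = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}, {"k": 4, "b": "r"}]

reduced_S = remote_filter(S, "k", project_keys(R, "k"))
joined = semijoin_then_join(R, S, "k")
```

In this sample only one of the three rows of S survives the filter, so only that row would have to travel back across the network, yet the final join result is the same as joining R with the full S.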
Step-by-Step Solution:
1) Project join attributes from the first relation.
2) Ship this small set of key values to the remote site.
3) Filter the remote relation by those keys to obtain only relevant rows.
4) Return the reduced set (or keys) to complete the join with the local relation.
Verification / Alternative check:
Semijoin-based query plans are widely cited in distributed query optimization. They pay off most when the join is highly selective (only a small fraction of the remote relation's rows match) and the join attribute values are much smaller than full tuples.
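A rough back-of-the-envelope check of this condition can be written as a short Python sketch. The cost model (bytes shipped only, ignoring latency and CPU cost) and all of the sizes below are assumptions made up for illustration, not measurements.

```python
def full_ship_bytes(remote_rows: int, row_bytes: int) -> int:
    """Bytes moved if the whole remote relation S is shipped to the initiating site."""
    return remote_rows * row_bytes


def semijoin_bytes(distinct_keys: int, key_bytes: int, matching_rows: int, row_bytes: int) -> int:
    """Bytes moved by a semijoin: keys sent out, plus matching rows sent back."""
    return distinct_keys * key_bytes + matching_rows * row_bytes


# Hypothetical sizes: S has 100,000 rows of 200 bytes each;
# R contributes 1,000 distinct 8-byte keys.
full_cost = full_ship_bytes(remote_rows=100_000, row_bytes=200)

# Selective join: only 5,000 rows of S match.
selective_cost = semijoin_bytes(distinct_keys=1_000, key_bytes=8,
                                matching_rows=5_000, row_bytes=200)

# Non-selective join: every row of S matches, so the shipped keys are pure overhead.
non_selective_cost = semijoin_bytes(distinct_keys=1_000, key_bytes=8,
                                    matching_rows=100_000, row_bytes=200)

print(full_cost, selective_cost, non_selective_cost)
```

Under these assumed numbers the semijoin ships about 1 MB instead of 20 MB when few rows match, but it ships slightly more than the full relation when every row matches, which is why the benefit depends on how selective the join is.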
Why Other Options Are Wrong:
Common Pitfalls:
Final Answer:
Sends only the join attributes to a remote site and then returns only the required matching rows.