Difficulty: Easy
Correct Answer: Applies — projecting join keys first reduces network traffic
Explanation:
Introduction / Context:
Semijoins are a classic technique in distributed query optimization for reducing the amount of data shipped over the network during join processing.
Given Data / Assumptions:
Concept / Approach:
In a semijoin, site A sends the projection of its join attribute(s) to site B. Site B uses those values to select only its matching rows and returns just those rows to site A. Tuples that cannot take part in the join never cross the network.
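The rewrite that makes this safe can be stated in standard relational algebra. Here R (at site A), S (at site B) and the join attribute set J are illustrative symbols, not notation from the question. Because tuples of S with no partner in R contribute nothing to the join, S can be reduced first without changing the result:

```latex
S \ltimes R = \pi_{\mathrm{attr}(S)}\bigl(S \bowtie \pi_J(R)\bigr)
\qquad\text{and}\qquad
R \bowtie S = R \bowtie (S \ltimes R)
```

Only \pi_J(R) travels from A to B, and only S \ltimes R travels back.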
Step-by-Step Solution:
1. Project join attributes from the initiating site.
2. Transmit the compact set of keys to the remote site.
3. Filter remote tuples and return only matches.
4. Complete the join locally with greatly reduced data movement.
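The four steps can be sketched in memory, with Python lists standing in for the two sites. This is a minimal illustration under stated assumptions. The function name `semijoin_then_join`, the sample columns `cust_id`, `name` and `amount`, and the choice to count shipped values as a proxy for network traffic are all hypothetical and do not come from a real distributed engine.

```python
from typing import Any

Row = dict[str, Any]


def semijoin_then_join(r_rows: list[Row], s_rows: list[Row], key: str) -> tuple[list[Row], int]:
    """Hypothetical helper: join R (site A) with S (site B) via a semijoin.

    Returns the joined rows and a crude count of values "shipped"
    (keys sent A->B plus rows sent B->A). That count is an assumed
    stand-in for network traffic, not a real cost model.
    """
    # Step 1 (site A): project the join attribute; the set drops duplicate keys.
    r_keys = {row[key] for row in r_rows}
    # Step 2: ship the compact key set to site B.
    shipped = len(r_keys)
    # Step 3 (site B): keep only S rows whose key appears in the shipped set
    # (this is S semijoin R) and ship just those rows back to site A.
    s_reduced = [row for row in s_rows if row[key] in r_keys]
    shipped += len(s_reduced)
    # Step 4 (site A): finish the join locally with a hash join on reduced S.
    index: dict[Any, list[Row]] = {}
    for s in s_reduced:
        index.setdefault(s[key], []).append(s)
    result = [{**r, **s} for r in r_rows for s in index.get(r[key], [])]
    return result, shipped


# Hypothetical sample data: R (customers) at site A, S (orders) at site B.
R = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Bo"}]
S = [{"cust_id": i % 100, "amount": i} for i in range(1000)]
joined, shipped = semijoin_then_join(R, S, "cust_id")
print(len(joined), shipped)  # 20 joined rows; 2 keys + 20 rows shipped instead of all 1000 S rows
```

The local step uses a hash join on the reduced S because it is the simplest correct choice for an equijoin; any local join algorithm would give the same rows, since the semijoin only removes tuples of S that have no partner in R.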
Verification / Alternative check:
Cost models consistently favor semijoins when the join is highly selective (only a small fraction of the remote tuples match) and the projected key set is small relative to the full table.
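A simplified, transfer-only version of that comparison is shown below. It assumes that network bytes dominate and ignores local CPU and I/O, per-message latency, and the cost of shipping the final result, all of which real optimizers also weigh. R sits at the initiating site A, S at site B, and J is the join attribute set; the sizes are hypothetical. The semijoin pays off when shipping the key set plus the reduced S is cheaper than shipping all of S:

```latex
\operatorname{size}\bigl(\pi_J(R)\bigr) + \operatorname{size}(S \ltimes R) < \operatorname{size}(S)

% Hypothetical numbers: 10{,}000 distinct 8-byte keys in R;
% S has 1{,}000{,}000 rows of 200 bytes; 2\% of S matches.
80\,\text{KB} + 20{,}000 \times 200\,\text{B} = 0.08\,\text{MB} + 4\,\text{MB} = 4.08\,\text{MB} < 200\,\text{MB}
```

When nearly every row of S matches, \operatorname{size}(S \ltimes R) approaches \operatorname{size}(S) and the left-hand side exceeds the right, because the key set is shipped on top of almost the whole table.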
Why Other Options Are Wrong:
Describing a semijoin as one that “ships full tables” contradicts its purpose. Neither the synchronization mode nor the partitioning scheme defines semijoin behavior.
Common Pitfalls:
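Using a semijoin when the join is not selective (most remote tuples match, so little is filtered out) adds the cost of shipping the key set and an extra round trip without meaningfully reducing traffic.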
Final Answer:
Applies — projecting join keys first reduces network traffic