Distributed query processing: In a semijoin, is only the join attribute sent to the remote site, after which only the matching rows are returned?

Difficulty: Easy

Correct Answer: Applies — projecting join keys first reduces network traffic

Explanation:


Introduction / Context:
The semijoin is a classic technique in distributed query optimization for reducing the amount of data shipped over the network during join processing.


Given Data / Assumptions:

  • Relations reside at different sites.
  • Joins can be expensive if full tables are transmitted.
  • Join keys can be projected and used to filter remote rows.


Concept / Approach:
In a semijoin, site A sends the projection of its join attribute(s) to site B. Site B uses those values to select only matching rows and returns just those rows (or just their keys). This avoids transferring irrelevant tuples.


Step-by-Step Solution:
  • Project join attributes from the initiating site.
  • Transmit the compact set of keys to the remote site.
  • Filter remote tuples and return only matches.
  • Complete the join locally with greatly reduced data movement.
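The steps above can be sketched in a few lines of Python. This is only an illustrative, single-process simulation: the relation names (orders_at_a, customers_at_b), the column cust_id, and the helper functions are hypothetical, and the "transmission" between sites is just a function call.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def project_keys(rows: List[Row], key: str) -> Set[object]:
    """Step 1 (site A): project the distinct join-attribute values."""
    return {row[key] for row in rows}


def filter_remote(remote_rows: List[Row], key: str, keys: Set[object]) -> List[Row]:
    """Step 3 (site B): keep only rows whose join value was shipped from A."""
    return [row for row in remote_rows if row[key] in keys]


def local_join(local_rows: List[Row], reduced_remote: List[Row], key: str) -> List[Row]:
    """Step 4 (site A): join local rows with the reduced remote relation."""
    by_key: Dict[object, List[Row]] = {}
    for r in reduced_remote:
        by_key.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in local_rows for r in by_key.get(l[key], [])]


# Hypothetical data: orders live at site A, customers at site B.
orders_at_a = [
    {"cust_id": 1, "order_id": 100},
    {"cust_id": 3, "order_id": 101},
]
customers_at_b = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
    {"cust_id": 4, "name": "Di"},
]

keys = project_keys(orders_at_a, "cust_id")               # step 2: only these keys are "sent" to B
reduced = filter_remote(customers_at_b, "cust_id", keys)  # B returns matching rows only
result = local_join(orders_at_a, reduced, "cust_id")
```

In this toy case only two of the four customer rows travel back to site A; the other two never leave site B.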


Verification / Alternative check:
Cost models generally favor semijoins when the join is highly selective (only a small fraction of remote rows match) and the shipped key set is small relative to the full remote table.
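A rough way to check this is to compare bytes shipped under each strategy. The sketch below assumes a deliberately simplified cost model that counts only transferred bytes and ignores message setup cost, latency, and local processing; the function names and all the numbers are hypothetical.

```python
def full_ship_cost(remote_rows: int, row_bytes: int) -> int:
    """Bytes moved if the whole remote table is shipped to the joining site."""
    return remote_rows * row_bytes


def semijoin_cost(distinct_keys: int, key_bytes: int,
                  matching_rows: int, row_bytes: int) -> int:
    """Bytes moved by a semijoin: keys sent out plus matching rows sent back."""
    return distinct_keys * key_bytes + matching_rows * row_bytes


# Hypothetical remote table: 100,000 rows of 200 bytes each.
full = full_ship_cost(100_000, 200)

# Selective join with a small key set: 1,000 keys of 8 bytes, 1,000 rows match.
selective = semijoin_cost(1_000, 8, 1_000, 200)

# Non-selective join with a large key set: 50,000 keys, 99,000 rows match.
non_selective = semijoin_cost(50_000, 8, 99_000, 200)
```

Under these assumed numbers the selective case ships about 1% of the full-table volume, while the non-selective case ships slightly more than the full table because the key transfer is not paid back by filtering.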


Why Other Options Are Wrong:
Saying semijoin “ships full tables” negates its purpose. Synchronization mode or partitioning does not define semijoin behavior.


Common Pitfalls:
Using a semijoin when the join is not selective (most remote rows match, so little is filtered out) adds the cost of shipping keys without a compensating reduction in returned data.


Final Answer:
Applies — projecting join keys first reduces network traffic
