Data Join in AWS Redshift

In “Amazon Redshift Database Developer Guide“, there is an explanation for data join:
“HASH JOIN and HASH are used when joining tables where the join columns are not both distribution keys and sort keys.
MERGE JOIN is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted.”

Let’s take ‘salary’ and ’employee’ for example.

Firstly, we EXPLAIN the join of ‘salary’ and ’employee’, and it shows “Hash Join”:



Then we create two new tables:

Currently, the join column is both distkey and sortkey. Hence EXPLAIN shows “Merge Join”:



Leave a Reply

Your email address will not be published. Required fields are marked *

*
*