Data Join in AWS Redshift

In “Amazon Redshift Database Developer Guide“, there is an explanation for data join:
“HASH JOIN and HASH are used when joining tables where the join columns are not both distribution keys and sort keys.
MERGE JOIN is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted.”
Let’s take ‘salary’ and ’employee’ for example.
Firstly, we EXPLAIN the join of ‘salary’ and ’employee’, and it shows “Hash Join”:

Then we create two new tables:

CREATE TABLE salary_new
distkey (employee_id)
sortkey (employee_id)
AS SELECT * FROM salary;
CREATE TABLE employee_new
distkey (employee_id)
sortkey (employee_id)
AS SELECT * FROM employee;

Currently, the join column is both distkey and sortkey. Hence EXPLAIN shows “Merge Join”:

Robin on Linux

Data Join in AWS Redshift

Leave a Reply Cancel reply

Robin on Linux

Related Posts

Leave a Reply Cancel reply