Merge Join Transformation:
Just like Merge and Union All, even this transformation is used for merging two SORTED datasets. Feature that makes this transformation standout from Merge and Union All is the ability to merge two datasets based on a JOIN condition. Even Lookup transformation uses the Join feature using which it provides the matching record from the lookup dataset, but lookup provides only first matching record when it has multiple records matching whereas, Merge Join provides all the matching records from the other dataset which we are merging.
Using the below
sample datasets, let’s explore the different JOIN’s provided by MERGE JOIN
Transformation:
Dataset 1:
Col A
|
Col B
|
1
|
1.a
|
2
|
1.b
|
3
|
1.c1
|
3
|
1.c2
|
4
|
1.d
|
5
|
1.e
|
Dataset 2:
Col A
|
Col B
|
1
|
2.a
|
2
|
2.b1
|
2
|
2.b2
|
3
|
2.c
|
6
|
2.d
|
7
|
2.e
|
As shown below, configure and SORT the above datasets using ‘Excel Source’ and ‘SORT Transform’, drag ‘MERGE JOIN’ from Data Flow Transformations. First map the Output of SORT_Dataset_1 to Merge Join and select ‘Input’ as ‘Merge Join Left Input’:
Now, map the output of Sort_Dataset_2 to Merge Join:

Open ‘Merge Join Editor’ to configure it:

As shown above, we can see by default ‘Col A’ is selected as ‘Join Key’, if we want to change it click on the arrow joining the JOIN Keys and press DELETE. Now you can set other column as the Join Key. In this example we will keep the join key as ‘Col A’.
1 Inner Join: This is an equi join in which only the matching records from both datasets are provided as Output.
In the ‘Merge Join Transformation Editor’, from ‘Join Type’ drop-down select ‘Inner join’. ‘Swap Inputs’ button will remain disabled. Select all the columns from both datasets and give them the ‘Output Alias’ as shown below:

Output:

In the above output, it can be seen only those records which are present in both datasets are given as output. Also it can be seen, dataset 1 Col_A found two matches for value 3 in dataset 2 and dataset 2 Col_A found two matches for value 2 in dataset 1, due two which we have two output rows with value 2 and other two with value 3. This is the features in which differentiate Merge Join from lookup in which only first matching row is given as output.
2. Left Outer Join: In this all the records present in Left input and only the matching records from right input are provided as output. In case when there's no matching record in Right input, all the selected Right input columns will hold NULL. This transformation does not provide Right Outer join as Left Outer join can be used as Right Outer join just by clicking button "Swap Inputs".
Open the Merge Join Transformation Editor, from ‘Join type’ select ‘Left outer join’. Now we can see ‘Swap Inputs’ button is been Enabled.

Over here, dataset_1 is ‘Left dataset’ and dataset_2 is ‘Right dataset’. Click ‘OK’ and re-execute the task.
Output:

In the output, it can be seen all the records from Left dataset (dataset 1) and only matching records from Right dataset (dataset 2). For the records where no match was found, Right dataset columns hold NULL.
Using the Left-Outer join as a Right-Outer join:
Open ‘Merge Join Transformation Editor’, click on ‘Swap Inputs’, now you can see both the datasets will be swapped, now Dataset ‘Sort_Dataset_2’ will be Left input and ‘Sort_Dataset_1’ will be Right input:

Click ‘OK’ and re-execute the task.
Output:

In the output, all the records from Dataset 2 (Right table) and only matching records from Dataset 1 (Left table) are given in Output.
3. Full Outer join: In this join all the records present in both datasets are provided as output. In case when the Left dataset does not find any matching record in Right dataset based on the join condition, then in that case in output the right dataset columns will hold NULL and the vice-versa.
Open ‘Merge Join Transformation Editor’, from ‘Join type’ select ‘Full Outer join’.

Click ‘OK’. Re-execute the task.
Output:

In the output, all the records from Dataset 1 and Dataset 2 are given in Output, and where ever match is not found, for that NULL is given in the corresponding Dataset columns.
Like Merge even for this transformation:
- Sorted datasets are mandatory.
- Join condition columns must matching data types.
- Requires two inputs and provides only one output.
- Does not support error handling.
- it’s a partially blocking Asynchronous transformation.
This transformation does not support cross join, but it’s possible to create the same just by adding a Dummy column in Derived Column Transformation in both the flows with value TRUE or 1. Place this transformation before Sort transformation in both the flows, sort the datasets on this column and use it as join condition as well in the merge join. Output of this will be a Cross join.
For this, in the above created DFT, add ‘Derived Column Transformation’ before SORT in the both the flows:

SORT both datasets on this newly added ‘Dummy_Column’:

Output:
No comments:
Post a Comment