This article “Seven types of table joins” was written in preparation for a user training and I don’t want to keep it from you. At the very end, I will also show two ways of circumventing case sensitivity.
Table of Contents
- What types are there?
- What needs to be considered?
- How to merge?
- Advanced example – case-insensitive text merge
- Special Case – Cross Join
1. What types are there?
Tables that have columns with the same contents can easily be merged via the user interface. Several or only one column can be defined as a merge criterion. A total of six types of table merging are available via the user interface.
- Left outer Join
- Right outer Join
- Full outer Join
- Inner Join
- Left Anti Join
- Right Anti Join
Graphical representation of the join types
2. What needs to be considered?
Each of the tables to be merged must have the same columns, although these can also have different names. However, the data types must be the same.
Note that columns with the data type integer are merged more efficiently than columns with the data type text. If, however, and in exceptional cases, the merge is to take place using text columns, you must also pay attention to upper and lower case.
For example, Power Query does not recognize the words “House” and “house” as the same word.
The fact table (the one containing the transactions) is selected as the first table and the dimension table as the second. When merging pure dimension tables (e.g. “Products” and “Product groups”), the more detailed dimension is selected first.
If the number of rows resulting from the merge is greater than the number before the merge, check whether the correct join type has been selected and, if necessary, adjust it.
3. How to merge?
The merge is done via the user interface in the menu item “Home” under “Combine“.
Fact table A contains various transactions. Dimension table B lists the storage locations of the products. Since both tables each have the column “ProductID” in the data type “integer”, we can merge them using this column.
The two tables must be combined in all six possible ways. First, import the data from both tables into Power Query and rename the Queries to “Table A” and “Table B”.
4.1 Left Outer Join
The left outer join returns all rows from the first table and finds the matching rows from the second table. We can define which column should be returned. This function corresponds to the “VLookup” command in Excel.
In the opened dialog, mark the column “ProductID” in both tables and select “Left Outer” as the join type.
Next, expand the new column by clicking on the symbol as usual and select the column to be returned. I selected all columns. If required, you can also define a column prefix.
I loaded the result into Excel to visualize it. As desired, all products of Table A were supplemented with the corresponding columns from Table B. If there is no match, empty fields are returned.
4.2 Right Outer Join
The right outer join returns all rows from the second table and selects the matching rows from the first table. Exactly the opposite of the first variant.
The result looks like this and contains all products from table B and the corresponding columns from table A:
4.3 Full Outer Join
The full outer join retrieves all rows from both tables.
As a result, we received a table combined from Tables A and B with all products.
4.4 Inner Join
Unlike the outer join, this join type returns only the matching rows.
4.5 Left Anti Join
This join type returns those rows that are only contained in the first table.
4.6 Right Anti Join
This join type only returns those rows that are only contained in the second table.
In contrast to the left anti-join, here we see those products which are only contained in Table B, but not in Table A.
4.7 Recap – six types of table joins
I compare the existing six types of table merging again.
5. Advanced example –
case-insensitive text merge
We have executed the above examples exclusively via the user interface and using columns of the data type “integer”. As mentioned above, it is also possible to merge text columns using upper and lower case.
The two tables below serve as the data bases:
It is immediately noticeable that the spellings of the names are completely different. The result of a normal join (left external join) is as follows:
A match was found for a single name. Due to the different spelling, Power Query has classified the remaining words as inconsistent.
The desired result should look like this:
1. Approach: New columns with capitalization
A simple approach may be to add a new column in each table, which converts the names to uppercase.
Of course, the two new columns must be deleted afterwards.
2. Approach: Write “M” Code
The following code can also be used to get around upper and lower case. But just a word of warning – with large data sets the code execution will be slow!
Instead of a join, the data from the second table is added as a new column called “Merge” (Table.AddColumn). The link is made via the “Trans” table.
The function (case_insensitive) selects all rows from the “Dept” table and compares them with the rows from the first table.
The comparison is done by changing the spelling of all words in the two columns (“Name”) to upper (Text.Upper).
Last but not least, the new column “Merge” has to be extended. Since this is a recursive function, I recommend the application only in small data sets!
6. Special Case – Cross Join
The cross join is a special case. It forms a Cartesian product from two tables and there are no columns that serve as join conditions. All rows of the left table are merged with all rows of the right table.
For this example, I created two new tables:
The cross join is initiated in Power Query using a new column to be added. In the user dialog you only have to refer to the other table.
As a second step, the column is extended and we get the following table:
The 3 values of the first table are combined with the 3 values of the second table. This results in 9 rows, which represent all possible combinations.
This join type can very quickly lead to huge results with large tables. The execution of the cross join can be very time-consuming and should only be used in exceptional cases.
Dear readers, I write my articles for you. It is my ambition to provide you with qualitatively appealing content. Please take a minute and give me a review or comment below. Thank you very much.
More articles about Power Query can be found in the Power Query Online Buch.
|Post provides benefit||5|
|Post is understandable||5|
|Appropriate post length||5|