This post is also available in:
This article “Seven types of table joins” was written in preparation for a user training and I don’t want to keep it from you. At the very end, I will also show two ways of circumventing case sensitivity.
Table of Contents
- What types are there?
- What needs to be considered?
- How to merge?
- Advanced example – case-insensitive text merge
- Special Case – Cross Join
1. What types are there?
Tables that have columns with the same contents can easily be merged via the user interface. Several or only one column can be defined as a merge criterion. A total of six types of table merging are available via the user interface.
- Left outer Join
- Right outer Join
- Full outer Join
- Inner Join
- Left Anti Join
- Right Anti Join
Graphical representation of the join types
2. What needs to be considered?
Each of the tables to be merged must have the same columns, although these can also have different names. However, the data types must be the same.
Note that columns with the data type integer are merged more efficiently than columns with the data type text. If, however, and in exceptional cases, the merge is to take place using text columns, you must also pay attention to upper and lower case.
For example, Power Query does not recognize the words “House” and “house” as the same word.
The fact table (the one containing the transactions) is selected as the first table and the dimension table as the second. When merging pure dimension tables (e.g. “Products” and “Product groups”), the more detailed dimension is selected first.
If the number of rows resulting from the merge is greater than the number before the merge, check whether the correct join type has been selected and, if necessary, adjust it.
3. How to merge?
The merge is done via the user interface in the menu item “Home” under “Combine“.
The result can either be output to a new table (“Merge Queries as New”) or to the first table (“Merge Queries”).
Fact table A contains various transactions. Dimension table B lists the storage locations of the products. Since both tables each have the column “ProductID” in the data type “integer”, we can merge them using this column.
The two tables must be combined in all six possible ways. First, import the data from both tables into Power Query and rename the Queries to “Table A” and “Table B”.
4.1 Left Outer Join
The left outer join returns all rows from the first table and finds the matching rows from the second table. We can define which column should be returned. This function corresponds to the “VLookup” command in Excel.
The first step is to load both tables using Power Query. Then we merge them as a new query.
In the opened dialog, mark the column “ProductID” in both tables and select “Left Outer” as the join type.
Next, expand the new column by clicking on the symbol as usual and select the column to be returned. I selected all columns. If required, you can also define a column prefix.
I loaded the result into Excel to visualize it. As desired, all products of Table A were supplemented with the corresponding columns from Table B. If there is no match, empty fields are returned.
4.2 Right Outer Join
The right outer join returns all rows from the second table and selects the matching rows from the first table. Exactly the opposite of the first variant.
The procedure is exactly the same and does not change in the other examples. Only the join type has to be changed.
The result looks like this and contains all products from table B and the corresponding columns from table A:
4.3 Full Outer Join
The full outer join retrieves all rows from both tables.
This join type is the third item in the drop-down menu.
As a result, we received a table combined from Tables A and B with all products.
4.4 Inner Join
Unlike the outer join, this join type returns only the matching rows.
This join type is the fourth item in the drop-down menu.Result
The result includes only products that are included in both tables.
4.5 Left Anti Join
This join type returns those rows that are only contained in the first table.
This join type is the fifth item in the drop-down menu.Result
The result shows which products are only included in Table A, but not in Table B.
4.6 Right Anti Join
This join type only returns those rows that are only contained in the second table.
This join type is the sixth item in the drop-down menu.
In contrast to the left anti-join, here we see those products which are only contained in Table B, but not in Table A.
4.7 Recap – six types of table joins
I compare the existing six types of table merging again.
5. Advanced example –
case-insensitive text merge
We have executed the above examples exclusively via the user interface and using columns of the data type “integer”. As mentioned above, it is also possible to merge text columns using upper and lower case.
The two tables below serve as the data bases:
It is immediately noticeable that the spellings of the names are completely different. The result of a normal join (left external join) is as follows:
A match was found for a single name. Due to the different spelling, Power Query has classified the remaining words as inconsistent.
The desired result should look like this:
1. Approach: New columns with capitalization
A simple approach may be to add a new column in each table, which converts the names to uppercase.
Of course, the two new columns must be deleted afterwards.
2. Approach: Write “M” Code
The following code can also be used to get around upper and lower case. But just a word of warning – with large data sets the code execution will be slow!
Instead of a join, the data from the second table is added as a new column called “Merge” (Table.AddColumn). The link is made via the “Trans” table.
The function (case_insensitive) selects all rows from the “Dept” table and compares them with the rows from the first table.
The comparison is done by changing the spelling of all words in the two columns (“Name”) to upper (Text.Upper).
Last but not least, the new column “Merge” has to be extended. Since this is a recursive function, I recommend the application only in small data sets!
6. Special Case – Cross Join
The cross join is a special case. It forms a Cartesian product from two tables and there are no columns that serve as join conditions. All rows of the left table are merged with all rows of the right table.
For this example, I created two new tables:
The cross join is initiated in Power Query using a new column to be added. In the user dialog you only have to refer to the other table.
As a second step, the column is extended and we get the following table:
The 3 values of the first table are combined with the 3 values of the second table. This results in 9 rows, which represent all possible combinations.
This join type can very quickly lead to huge results with large tables. The execution of the cross join can be very time-consuming and should only be used in exceptional cases.
Dear readers, I write my articles for you. It is my ambition to provide you with qualitatively appealing content. Please take a minute and give me a review or comment below. Thank you very much.
More articles about Power Query can be found in the Power Query Online Buch.
7 thoughts on “Power BI – Seven Types of Table Joins”
When I merge to tables together with Power BI, I get an Inner Join that gives me the content of both tables.You define an inner join as only what is in both tables. As far as I can see, you switch inner and outer joins.
Thanks for the help, it’s really useful to have an illustrated review about the types of merging.
Thank you Leon – I write to help people. I’m glad you like it 😉
Thank you Andre – I’m glad you like the post. 😉
The cross join in Power Query showed expected results (Table A 5 records duplicate IDs, Table B 4 Records duplicate IDs)
5×4 = 20 records, but when you create table with the visuals it count correctly (20) and it show 15 only (elimination of duplication from table B)
See Example Table A Id = 1,2,3,4,4 And Table B ID = 2,4,4,5
How do I show all records?
In your example there are double value pairs (1/4, 2/4, 3/4, 4/4 twice).
Therefore, the Power BI visuals display the double pairs only once,
which corresponds to the standard behavior of Power BI.
So I don’t see how this can be avoided until the values are unique. Sorry, that I can’t help you.