To be honest, I am once again really excited about the latest possibilities in Power BI! Something great has happened: dataflows are here!
Today, 6 November 2018, the new Power BI dataflows were released for public preview. They extend the data storage options for Power BI. This article is based on the experience gained so far (i.e. today…) and is certainly not complete yet. I am also convinced that the current functionality will be massively expanded in the coming months.
Overview of dataflows
What are dataflows?
Dataflows supplement the existing Power BI offering with the possibility to store data from various sources in the cloud. This means that data no longer has to be stored first in a (possibly expensive) database, an Access file, a crash-prone Excel file or similar.
With dataflows, data for a workspace can be imported, edited and stored via the user interface in the Power BI Service.
To make life easier for the user, i.e. me and you, Power Query is used for data import and transformation. Although the number of available transformations is (still?) limited, the most important steps can already be performed.
As known from Power BI Desktop, each query forms its own table. Nothing new so far, except that this happens in the Power BI Service and not in Power BI Desktop.
What data sources are available?
Currently, the number of available data sources is still limited, as shown in the figure below.
But here too, I believe that the most important sources are already covered from the start.
Where is the data of a dataflow stored?
The data produced by the queries is stored in Azure Data Lake Storage Gen2, i.e. in the cloud, in so-called entities. To put it simply, an entity corresponds to a table or the result of a Power Query query. There are already predefined entities in the Power BI Service that can be filled with the results of the queries. It is also possible to create your own entities.
What are the conditions for using dataflows?
As of today, dataflows are available for both the Pro and Premium versions of Power BI. Data storage is already included in the license price, so there are no additional costs.
How can I use and evaluate the data from dataflows?
The stored data can be used and further processed in Power BI Desktop via the new connector “Power BI Dataflows”.
It is also possible to combine and enrich the data with other data sources. The data can be further processed with Power Query, and the creation of measures also works without restrictions.
Benefits of Dataflows
Free data storage!
The new Power BI dataflows provide a free way to store data in Azure and use it for reporting with Power BI. There are no costs for new hardware or cloud storage services.
Dataflows can be a replacement for databases
The data is stored in the dataflows as entities.
You can think of an entity as a table stored per workspace. Technically speaking, each entity is more like a set of CSV files (plus one JSON file for the metadata) in one folder (analogous to the Common Data Model), but for the user it behaves like a database table.
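To make the "folder of CSV files plus one JSON file" idea a bit more tangible, here is a small Python sketch. It is only an illustration under my own assumptions: the file names, the entity name `Customers` and the layout of the metadata file are made up and much simpler than the real Common Data Model format. The point is merely that the data files carry no column names and only become a "table" once they are combined with the metadata.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_entity_folder(folder):
    """Create a toy entity folder: one metadata JSON file plus headerless CSV files."""
    metadata = {
        "name": "Customers",  # hypothetical entity name
        "attributes": ["CustomerId", "Region"],  # column names live in the metadata only
        "partitions": ["part-0.csv", "part-1.csv"],  # hypothetical file names
    }
    (folder / "metadata.json").write_text(json.dumps(metadata), encoding="utf-8")
    (folder / "part-0.csv").write_text("1,North\n2,South\n", encoding="utf-8")
    (folder / "part-1.csv").write_text("3,West\n", encoding="utf-8")


def read_entity(folder):
    """Read every CSV partition and attach the column names from the metadata file."""
    metadata = json.loads((folder / "metadata.json").read_text(encoding="utf-8"))
    rows = []
    for partition in metadata["partitions"]:
        with open(folder / partition, newline="", encoding="utf-8") as handle:
            for record in csv.reader(handle):
                rows.append(dict(zip(metadata["attributes"], record)))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    write_entity_folder(folder)
    customers = read_entity(folder)

print(customers)
```

From the outside, `customers` looks like one table with three rows, although it was assembled from two separate files.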
Enhancement of existing Power BI data
Users can quickly create new reports from the dataflows and enrich existing reports with additional data without having to resort to the IT department.
Even before the introduction of dataflows, existing Power BI data could be reused, but without the possibility of enrichment. For this purpose, an existing Power BI dataset was connected as a data source.
However, as already mentioned, there was no possibility to supplement and enrich this data with further data. This limitation no longer exists with dataflows.
There are also some restrictions.
Dataflows not available in personal workspace
Dataflows are not available in the personal workspace.
But this also makes sense, because the personal workspace is, well, personal.
Data “only” per workspace
If you do not have the Premium version, you can only use the data within its own workspace. In the Premium version, it is also possible to link and use data across multiple workspaces.
Incremental or full loading
With an incremental load, only new or changed data is considered in the load process. In the Pro version, however, the data can only be loaded in full, as in Power BI Desktop: all data is loaded from the source and the existing records are overwritten. Incremental loading is only possible with the Premium version.
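The difference between the two loading modes can be sketched in a few lines of Python. This is only my illustration of the general idea, not how Power BI implements it; the `modified` column, the sample rows and the function names are assumptions made for the example.

```python
from datetime import date


def full_load(source_rows):
    """Full load: discard whatever was stored and take a fresh copy of the source."""
    return {row["id"]: dict(row) for row in source_rows}


def incremental_load(stored, source_rows, last_refresh):
    """Incremental load: only process rows modified after the last refresh."""
    result = dict(stored)
    changed = [row for row in source_rows if row["modified"] > last_refresh]
    for row in changed:
        result[row["id"]] = dict(row)  # insert new rows, overwrite changed ones
    return result, len(changed)


# Made-up sample data: what is already stored and what the source looks like today.
stored = {
    1: {"id": 1, "amount": 100, "modified": date(2018, 10, 1)},
    2: {"id": 2, "amount": 250, "modified": date(2018, 10, 15)},
}
source = [
    {"id": 1, "amount": 100, "modified": date(2018, 10, 1)},  # unchanged
    {"id": 2, "amount": 300, "modified": date(2018, 11, 5)},  # changed
    {"id": 3, "amount": 75, "modified": date(2018, 11, 6)},   # new
]

after_incremental, rows_processed = incremental_load(stored, source, date(2018, 11, 1))
after_full = full_load(source)

print(rows_processed, after_incremental == after_full)
```

Both variants end up with the same stored data. The incremental variant simply gets there by touching only the two rows that are new or changed, which is what makes it attractive for large sources.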
Dataflows are no substitute for a data warehouse
I have often heard that dataflows mean death for classic data warehouses (DWH). I am rather sceptical for now!
As we have seen, incremental loading is not possible in the Pro version. For me, this means that dataflows are currently no replacement for a data warehouse (DWH).
In a DWH the data is never overwritten. New records are added, and old records that are no longer valid are “historicized”.
Why does that matter?
Let’s take a report that spans several time periods and shows the turnover per customer region.
In a DWH, when a customer relocates, the previous residential address is retained but set to inactive. The new address is also saved and marked as active. This means that geographical analyses of past periods remain possible, since it is still known where customer X lived before the move.
This kind of historicizing (also known as Slowly Changing Dimensions) is not possible with dataflows, because all data is always overwritten and the change history is therefore lost.
Whether existing data can be set to inactive, i.e. changed, in the Premium version, I cannot say, as I have had no opportunity to test it. If you know anything about this, please let us know in the comments.
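For anyone who has not worked with historicized data, here is a minimal Python sketch of the idea. It is an illustration only: the column names, the customer and the regions are made up, and a real DWH would of course do this in the database rather than in Python lists. It contrasts a plain overwrite with the approach in which the old row is closed and a new active row is added.

```python
from datetime import date


def overwrite_address(customers, customer_id, new_region):
    """Plain overwrite: the previous value is gone afterwards."""
    updated = dict(customers)
    updated[customer_id] = new_region
    return updated


def historicize_address(history, customer_id, new_region, valid_from):
    """Historicized change: close the active row and append a new active row."""
    result = []
    for row in history:
        if row["customer_id"] == customer_id and row["active"]:
            row = {**row, "active": False, "valid_to": valid_from}
        result.append(row)
    result.append({
        "customer_id": customer_id,
        "region": new_region,
        "valid_from": valid_from,
        "valid_to": None,
        "active": True,
    })
    return result


def region_on(history, customer_id, day):
    """Look up the region in which a customer lived on a given day."""
    for row in history:
        if row["customer_id"] != customer_id:
            continue
        if row["valid_from"] <= day and (row["valid_to"] is None or day < row["valid_to"]):
            return row["region"]
    return None


# Made-up example: customer X moves from Zurich to Bern on 1 June 2018.
history = [{"customer_id": "X", "region": "Zurich", "valid_from": date(2015, 1, 1),
            "valid_to": None, "active": True}]
history = historicize_address(history, "X", "Bern", date(2018, 6, 1))

print(region_on(history, "X", date(2017, 3, 1)))
print(region_on(history, "X", date(2018, 11, 6)))
```

With the historicized table, turnover from 2017 can still be assigned to the region where the customer lived at the time. With `overwrite_address`, only the current region survives.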
GDPR and data protection….
I will be careful not to go out on a limb here. But it is clear that, despite all the enthusiasm, the requirements of data protection, the GDPR and the respective national regulations must be met. This question must be clarified in advance, however, independently of the use of dataflows.
(Preliminary) closing remarks
Once again, I am thrilled – all the data usable in Power BI can now also be imported, edited and stored in the cloud.
And this with the familiar functionalities of Power Query. The whole thing has been built in a very user-friendly way.
I think that with the introduction of dataflows we are at the beginning of a development whose implications are not quite clear to me yet. However, my gut tells me that a game changer has been launched here.
I will look into the topic more closely in the near future and report back to you.
What do you think of the Dataflows? What are your thoughts? Please let us know in the comments section.