Today’s blogpost is devoted to Power BI, Microsoft’s robust business analytics service. Specifically, we’ll describe the impact scheduled data refreshes can have on report development. Let’s get started!
When initiating a Power BI project – regardless of its size and complexity – we always ask the same kick-off questions. The answers to these three important questions help us gain an understanding of the project requirements. The questions are:
- What is the data source(s)? This gives us clarity in terms of online vs. on-premises, application(s), database(s), data models, data volume, etc.
- What do the reports look like? This helps us understand the types of visuals, page layout, navigations, data modeling, calculations, RLS, etc.
- Where will users view the reports? We need to know if it’s Power BI Service, Dynamics 365, Portals, 3rd party applications, Websites, SharePoint, PowerApps, Mobile Apps, etc.
This part of the discussion is from the developer’s perspective, though management and customers are generally interested in the running costs and maintenance expenses associated with running Power BI reports. In this way, licenses often become a big part of the discussion.
End users are always excited about the dynamic and interactive Power BI visualizations for their operations and business decision-making. However, if the performance of reports is poor, the user experience quickly turns sour, leading to failed user adoption.
The keys to Power BI project success are basic:
- Low Cost
- High Performance
The challenge is in striking a balance between Low Cost and High Performance. Finding that balance is critical, and yet there is an important and oft-forgotten issue that must be addressed up front: Data Refresh.
Data Refresh is often discussed later in projects, but ideally it is confronted early on, so all parties have a firm grasp of the topic. A quick overview:
Power BI has two distinct data refreshes: Power BI Desktop refresh and Power BI Service refresh. The purpose of the refresh is the same between Power BI Desktop and Power BI Service. However, there is a very important element of the refresh that we must understand: “pbix” file size limitation. This is applied when we publish pbix files from Power BI Desktop to Power BI Service and when we set a scheduled data refresh.
Why is it important? Because the pbix file size limitation could affect the design of the data model, the performance, and the cost – unless it is discussed and assessed accurately in the project discovery phase. The worst-case scenario is having Power BI reports ready for production but not being able to use them because the data refresh could not be completed in Power BI Service due to file size limitations.
Since Power BI Desktop has no pbix file size limitation, it is easy and relatively common to assume Power BI Service doesn’t either. But now we know!
Please note: the file size limitation we are referring to is not file Storage size limitation in Power BI Service Workspaces.
Here, we need to discuss Power BI licenses to find a cost associated to the file size limitation for data refresh. Why? Because the number of users determines the license cost and – at the same time – the pbix file size limitation impacts the cost, as well.
In the discovery phase, we should already know the approximate number of users. Additionally, we should measure initial pbix file size and estimate incremental data volume for the future when selecting licenses. The table below shows an approximate cost per license type and per pbix file size limitation for data refresh.
Note that the pbix file size limitation is not applied to the size of each pbix file but rather is a sum of pbix file size per license. For example, if we have 3 pbix files for a project and the sum of the 3 pbix files is 600 MB (less than 1GB), the Pro license is applicable.
The pbix file size depends on the data volume in the data source, the number of datasets and visuals in Power BI, and more. Let’s look at two examples that illustrate how to choose licenses based on data volume and pbix file size.
1. pbix File Size: 250 MB
- Dataset A: 600,000 rows with 30 columns
- Dataset B: 4.2 million rows with 3 columns
- Total Rows: 4.8 million and 33 columns
- Data Refresh in Power BI Service: ~30 minutes
- Data Source: Dynamics 365 Online Web API v9.0
- License: Pro License (Shared Capacity)
Criteria: Pro License Selected
- A small number of users: 50
- Initial pbix file size is less than or equal to 25% of 1 GB limitation
- Refresh time in Power BI Service: 30 minutes – within 1 hour
2. pbix File Size: 1.25 GB
- Dataset A: 13 million rows with 80 columns
- Dataset B: 100 million rows with 15 columns
- Total Rows: 113 million and 95 columns
- Data Refresh in Power BI Service: ~50 minutes
- Data Source: SQL Database On-premises with Data Gateway
- License: Premium (P1 capacity)
Criteria: Premium License P1 Selected
- Number of users to be scalable from 300 – 1,000
- Initial pbix file size is greater than 1 GB limitation
- Refresh time in Power BI Service: ~50 minutes and expected to exceed over an hour due to increase in data volume
There is one more element to consider when it comes to data refresh as described in the list above: it is a time window to complete refresh. While data is refreshed in Power BI Service, a session is open and the session times out in one hour for Pro license and in 4 hours for Premium license. Therefore, we must consider refresh time duration as well as pbix file size limitation. The refresh time duration is associated to pbix file size and can be improved by data modeling in Power BI, network environment, data source connection, gateway configuration, etc.
The question we must ask is how we can reduce the size of pbix files when designing Power BI reports. We can change data connection from Import mode to DirectQuery mode so that the file size would be reduced dramatically. However, there are some downsides. Performance is affected when dealing with a large set of records – e.g., 12 million records in a dataset. A slow performance appears clearly when filtering and highlighting data in chart visuals with DirectQuery mode. Import mode allows users a great filter experience with 3 – 5 seconds per click. In our project example, we experienced 10 – 30 seconds responses depending on charts with DirectQuery, which decreased a sense of dynamic user experiences. There is an option to create aggregation tables and balance the performance and file size. In our example, it worked well; yet when we needed drilldown data, we had the same performance issue with DirectQuery mode. In order to keep the best performance, import mode is a best option and we created aggregation and transaction tables/views in SQL database and used as data sources. It did not reduce the file size compared to DirectQuery mode, but it retained the performance and great user experiences.
The remaining challenge is that if we choose import mode for best performance, the pbix file size and refresh time both increase as data volume increases in data sources. The pbix file size would reach a threshold of the Power BI license SKUs and increase the running cost over time.
The conclusion is that pbix file size tells us license options (cost) and dictate the way we need to design data sources and data models in Power BI to minimize the file size and optimize report and data refresh performances. The following list is a quick reference to license options and data refresh performance based on pbix file limitations. Hopefully, this limitation will be improved in upcoming Power BI releases.
- Pbix file size is less than 1GB
- Pro License
- Design Power BI reports for refresh to complete within 1 hour
- Pbix file size is greater than 1GB
- Premium License
- Design Power BI reports for refresh to complete within 4 hours
That’s a lot of information, but hopefully we’ve illustrated how file size, licenses, and refresh times are all interrelated and must be considered when planning a Power BI project.
Happy Power BI-ing!