Working with Averages in DAX using AVERAGEX, SUMMARIZE and DATESINPERIOD

In this blog post, we are going to dive into the world of averages and how to solve problems when calculating averages by using CALCULATE, SUMMARIZE and AVERAGEX. Specifically, we look at creating an average calculation across months, and as you will see in the video below, this can be a challenge on multiple levels!

Working with SUMMARIZE and SUMMARIZECOLUMNS in DAX

Our goal in this video is to return the average sales for the last 3 months; see Figure 1 below.


Figure 1 – Average of last 3 months of sales

Our first attempt at solving this problem might be to simply use the expression AVERAGE([Sales Amount]), as seen in Figure 2 below. However, the AVERAGE function averages the individual transactions within each month, so the results of the following calculation are not quite what we expect:


Figure 2 – Average of Sales Amount, incorrect calculation.
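In measure form, that first attempt looks roughly like this (a sketch; the table and measure names are assumptions):

```dax
-- Sketch of the first attempt; table and measure names are assumptions
Average Sales = AVERAGE ( Sales[Sales Amount] )
```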

The result of the expression in Figure 2 can be seen in Figure 3 below. Ultimately, AVERAGE is returning the sum of the Sales Amount divided by the number of transactions for each month.


Figure 3 – Average of Sales Amount

Now that we understand this behavior, we can take a direct approach to solving the problem. First, we need to create a virtual table that contains the months along with their total sales. To create this aggregated table we can use SUMMARIZE or SUMMARIZECOLUMNS. In this scenario I will use SUMMARIZE, because SUMMARIZECOLUMNS does not support being evaluated within a modified filter context, and we will need to change the default filter context to get 3 months into our aggregate table.

The following expression will return an aggregate table with the Year, Month and Total Sales.


Figure 4 – Creating a virtual, aggregated table with SUMMARIZE
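As a sketch, the expression in Figure 4 follows this pattern (the table and column names are assumptions):

```dax
-- Sketch: group by year and month, then add a total sales column
-- Table and column names are assumptions
SUMMARIZE (
    Sales,
    'Date'[Year],
    'Date'[Month],
    "Total Sales", SUM ( Sales[Sales Amount] )
)
```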

SUMMARIZE is a function that returns a table expression, so you would get an error message if you tried to put this expression directly into a measure. However, you can test or “debug” this expression by putting it in a calculated table or by running it from DAX Studio. See the results of this expression from DAX Studio below in Figure 5:


Figure 5 – A view of the virtualized table created by SUMMARIZE
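For example, in DAX Studio you can wrap the table expression in EVALUATE to inspect the virtual table (same assumed names as above):

```dax
-- Run in DAX Studio to inspect the virtual table (assumed names)
EVALUATE
SUMMARIZE (
    Sales,
    'Date'[Year],
    'Date'[Month],
    "Total Sales", SUM ( Sales[Sales Amount] )
)
```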

Working with AVERAGEX, CALCULATE and DATESINPERIOD

There are still a few challenges left to solve to get our desired result. First, you can't use the AVERAGE function on a table or table expression; you must provide a column, so the result produced by SUMMARIZE cannot be passed to AVERAGE. However, iterator (“X”) functions like AVERAGEX accept a table or table expression as their first parameter and therefore work perfectly with this virtual table.

The next challenge is that the virtual table is evaluated within the current filter context, so we would always be returning the average of the current month and only the current month, unless we use the CALCULATE function to evaluate the expression within a modified filter context, which is exactly what we do! DATESINPERIOD supplies that modified filter context by extending the date filter to cover the last 3 months. The calculation in Figure 6 below is the expression that returns the desired result seen in Figure 1.


Figure 6 – Final calculation
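As a sketch, the final measure follows this pattern (the table, column, and measure names, and the exact date column used by DATESINPERIOD, are assumptions):

```dax
-- Sketch of the final measure; names and the date column are assumptions
Avg Sales Last 3 Months =
CALCULATE (
    AVERAGEX (
        SUMMARIZE (
            Sales,
            'Date'[Year],
            'Date'[Month],
            "Monthly Sales", SUM ( Sales[Sales Amount] )
        ),
        [Monthly Sales]
    ),
    DATESINPERIOD ( 'Date'[Date], MAX ( 'Date'[Date] ), -3, MONTH )
)
```

DATESINPERIOD widens the filter context to the last 3 months, SUMMARIZE produces one row per month inside that window, and AVERAGEX averages the monthly totals.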

Managed Virtual Networks and Private Endpoints in Azure Synapse and Azure Data Factory

Hey everyone, back in October I did a 3-hour live stream on YouTube introducing Azure. A big part of those 3 hours focused on Azure Data Factory. In this post, I am responding to one of the questions I received during that live stream, with a blog accompanied by a YouTube video.

Is there a way to create a secure connection between Azure Data Factory and Azure SQL DB?

Check out my YouTube video showing how to set up Managed Virtual Networks and Private Endpoints:

Let’s first take a look at the two methods I discussed in the live stream. I showed how to add the IP address of the Azure VM that was making the connection from Azure Data Factory. The method of using the IP address is problematic because the IP address is not static and will change, so adding the IP address is not a permanent fix. The second method I showed was that you could turn on Allow Azure Services. This will work, but many companies consider it a bit of a security risk.

When Allow Azure Services is enabled, any Azure resource can try to authenticate to your Azure SQL DB, and that’s a problem for many organizations.

Managed Virtual Network (VNet) connections and Private Endpoints in Azure Data Factory

Creating private endpoints to your services in Azure is recommended as a best practice, so we will cover the necessary steps here.


Creating a secure connection between your Azure services is a three-step process.

  1. Create an Azure Integration runtime and enable Virtual Network Configuration
  2. Create a Managed Private Endpoint to the Azure service (Azure SQL DB, Azure Storage, etc.)
  3. Approve the private endpoint request through the Private Link Center

Azure Integration Runtime with Managed VNET in ADF and Synapse

Integration runtimes are the compute used to move your data. You are billed based on the number of Data Integration Units (DIUs) used during the data movement process. To securely move your data through a managed virtual network, you first need to make sure that your Azure integration runtime is created within a managed virtual network. This can be configured when you are provisioning the Data Factory for the first time, or later from the Manage tab.

Note: At the time of this writing/video, Azure Synapse workspaces require that you configure this property when you are provisioning the Synapse resource. If you create your Synapse workspace and do not enable the virtual network configuration, you will not be able to enable it after the fact; this is called out in the Microsoft documentation.

Here are the steps to create an Integration Runtime within a Managed Virtual Network.

  1. Select the Manage tab in Data Factory / Synapse.
  2. Click on + New.
  3. Select Azure when prompted.
  4. On the next screen, name your integration runtime and enable Virtual Network Configuration.
  5. Click Create.


How to create Managed Private Endpoints

Once the integration runtime with the managed virtual network has been created, you need to create managed private endpoints. A managed private endpoint is a private IP address, within the managed virtual network, that connects your ADF and Synapse pipelines to a specific resource. Therefore, you will create a private endpoint for each data store (Blob, ADLS, Azure SQL DB) that you wish to securely connect to.

To create a managed private endpoint in Azure Data Factory or Synapse, go to the Manage hub, click on Managed private endpoints, and then click New. Keep in mind, this option is disabled and unavailable until after you have created the integration runtime with the managed virtual network.


Next, choose the resource in Azure that you want to connect to.

Azure Private Link Center and Approving Private Endpoints

Once the private endpoint has been created it will be in a “Pending” state and will need to be approved. You can approve a private endpoint from the specific resource, or you can go to the Azure Private Link Center.

In Azure, search for Private Link and then select Private Link from the list of services returned.


Once in the Private Link Center, go to Pending Connections; from here you can approve, reject, or remove any connections that may be pending. In my screenshot I don’t have any pending connections because I approved them in the video!


Wrapping it up

If you’re like me, networking is a tough topic. I come from a background of writing code, developing solutions, and performance tuning, and in on-prem development scenarios I let the specialists handle things like networking. With Azure, the developer can branch out and learn new things! I hope you enjoyed this blog / video series. Thanks for reading!

Dynamically changing title names in Power BI

Hey Everyone! I’m excited that I’m finally getting this blog done! Also, I’ve embedded the YouTube video below.

Also, in other news, after 8 years of blogging I finally decided to make some kind of logo. It’s a little unique, but I like it, especially how I was able to sneak the board game in there, because I love board games!


Figure 1 – New logo

Here is the video posted on YouTube if you want to watch the steps in action.

What is a Dynamic Title in Power BI?

The idea around a dynamic title is that the name of the title changes automatically to reflect any filters that may be applied. In the animated gif below, watch how the title name changes as I drill down into the chart visual.


Figure 2 – Dynamic title, animated gif

Creating Dynamic Titles in Power BI with SELECTEDVALUE() Function

Creating the dynamic title you saw in Figure 2 above requires a little DAX, but not much. First, create a new measure and give it a name; in the YouTube video I called this measure Dynamic Title. Then I used the following expression:
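The expression is essentially SELECTEDVALUE with a fallback value (a sketch; the table and column names are assumptions):

```dax
-- Sketch; table and column names are assumptions
Dynamic Title =
SELECTEDVALUE ( Geography[Country], "All Countries" )
```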


This is a simple expression that returns the name of the country if only one country exists in the filter context; if not, it returns the text “All Countries”. In this example I am navigating a hierarchy, so I want the title to change for each level of the hierarchy as new filters are applied. The following screenshot is the completed code from the video:
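As a rough illustration of that idea (this is not the exact code from the video; the hierarchy levels and column names are assumptions), you can check each level of the hierarchy from the bottom up:

```dax
-- Illustrative sketch only; hierarchy levels and column names are assumptions
Dynamic Title =
VAR SelectedCity    = SELECTEDVALUE ( Geography[City] )
VAR SelectedState   = SELECTEDVALUE ( Geography[State] )
VAR SelectedCountry = SELECTEDVALUE ( Geography[Country] )
RETURN
    COALESCE ( SelectedCity, SelectedState, SelectedCountry, "All Countries" )
```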


Conditional Formatting in Power BI on the Title of a visual

The final step is to use the measure on the visualization with the following steps (see the animated gif for reference):

  1. Select the visual and navigate to the formatting pane.
  2. Find the title category and expand it.
  3. Click on the fx icon next to the title.
  4. Select Field value for Format by.
  5. Select your measure from the drop-down list for Based on field.
  6. That’s it!

As always, hope this post helped!



Working with Parameters and Expressions in Azure Data Factory

Hey All! Just in time for the holidays I’m uploading another Azure Data Factory video to YouTube. In this video we specifically look at how to use Parameters in Azure Data Factory to make your datasets and pipelines dynamic and reusable! In addition to parameters and expressions we also take a look at the Lookup, For Each and Execute Pipeline activities. This video is very informative, touches on a lot of different pieces and I hope you enjoy it!

Parent / Child Design Pattern in ADF

The Parent / Child design pattern has been a popular design pattern for ETL processes for many, many years. It gives you compartmentalization (if that’s a word) of your code, making it more reusable and also giving you the ability to easily scale up and scale down by increasing or decreasing the parallelization of your workers.

In this scenario the Parent pipeline will determine what work needs to be done and then pass that work to the worker pipeline. In my video I show how we use the Lookup activity in Azure Data Factory to get the list of items to process and then the list can be parsed out.

Azure Data Factory Lookup and ForEach activities


This scenario retrieves the work from an Azure SQL Database, so I use the Lookup activity to retrieve it. However, keep in mind that many different activities could serve this purpose; for example, I could use the Get Metadata activity to get the list of files in a folder that need to be processed. See my video on the Get Metadata activity here:

https://www.youtube.com/watch?v=zm7ybXmUZV0

The ForEach activity is used to iterate through the list / array that was created by the Lookup activity. The ForEach activity will perform a set of activities, defined by you, for each item that is returned in the list or array provided.

In this example, we are simply executing the worker pipeline, passing in the schema and table name of the current table we want to process!

Execute Pipeline activity in ADF

The child pipeline, or worker pipeline, is where all the magic happens. In development, we make the datasets dynamic by parameterizing their connection so that the connection can be changed at run time based on the parameters passed in from the parent to the child. In the following screenshot you can observe that the dataset is connecting to an Azure SQL Database and that the schema and table name are determined by parameters at run time!


For specifics on the setup, orchestration, and execution of this design pattern, watch the video! As always, thanks for reading my blog and have a great week!

Introduction to Wrangling Data Flows in Azure Data Factory

Hello! It’s been a while since I’ve done a video on Azure Data Factory. To get back in the flow of blogging on ADF I will be starting with Data Flows, specifically Wrangling Data Flows.

The video can be seen here:

What are Wrangling Data Flows in Azure Data Factory?

Wrangling Data Flows are a method of easily cleaning and transforming data at scale. Huh?

Wrangling Data Flows use the M query language and the UI experience provided by the Power Query Editor in Power BI Desktop. This is a brilliant move by Microsoft to include this technology in Azure Data Factory. Just think of the hundreds of millions of people who are currently transforming and cleaning their data in Excel or Power BI Desktop. Now they can take their self-service ETL (extract, transform, and load) skills to the enterprise level with ADF.

What makes it scalable? Power Query Editor at Scale.

Wrangling data flows allow the developer to use the graphical user interface to do all the hard work with minimal to no code, but in the background all of your UI steps are converted to the M language. At runtime, Azure Data Factory takes that M code, converts it to Spark, and runs your data flow against big data clusters. This means that as your data volumes grow, you should experience consistent performance!

Are there any limitations with Wrangling Data Flows?

Yes… quite a few, actually. Wrangling Data Flows are still in preview at the time of this blog and the related YouTube video, and currently there are quite a few operations that just aren’t supported. The most obvious of those are promoting header rows and pivoting data. I hope that these features will be available once the product reaches general availability (GA).

https://docs.microsoft.com/en-us/azure/data-factory/wrangling-data-flow-functions#known-unsupported-functions

As always, thank you for reading my blog and watching my YouTube videos! Have a great day!!

Other Azure Data Factory resources!


ALL vs ALLSELECTED in DAX and Power BI Desktop

Hey all! In my latest YouTube video I discuss how ALLSELECTED can be leveraged in Data Analysis Expressions (DAX). This continues a series of YouTube videos in which I discuss various DAX functions and how to use them in Power BI Desktop.

Please feel free to check out my YouTube channel and subscribe at https://www.YouTube.com/c/MitchellPearson

If you want to watch the video of this function explained, please take a look below:

ALLSELECTED function in DAX and Power BI Desktop

First, let’s look at the definition of ALLSELECTED, provided inside Power BI Desktop when referencing the function:


ALLSELECTED: Returns all the rows in a table, or all the values in a column, ignoring any filters that may have been applied inside the query, but keeping filters that come from the outside.

What does this mean? Well, it means that it will essentially ignore filters coming from that visual but still respect filters coming from slicers and filters outside that visual (the outer query)! And yes, this is pretty awesome as you will soon see.

ALL vs. ALLSELECTED DAX functions

The biggest difference between ALL and ALLSELECTED is that the ALL function ignores all filters, regardless of where they are coming from. In contrast, the ALLSELECTED function only ignores filters that are coming from the inner query.

Returning Total Sales of all Selected Countries

The following expression in DAX would return the total sales for all countries in all scenarios. For example, the countries Australia, Canada, and France were selected in the slicer, yet the measure Total Sales All Countries still returns the total sales across all countries:


Figure 1.1
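A sketch of that measure (the table, column, and base measure names are assumptions):

```dax
-- Sketch; table, column, and measure names are assumptions
Total Sales All Countries =
CALCULATE ( [Total Sales], ALL ( Geography[Country] ) )
```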

In this scenario, you might want to return Total Sales for only the countries selected in the slicer (ALLSELECTED), $13,682,863.16. This can be quite tricky, especially if you didn’t know about the ALLSELECTED function. Let’s write the expression in Figure 1.1 a different way and look at the results.

Figure 1.2
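And the same measure rewritten with ALLSELECTED (again a sketch with assumed names):

```dax
-- Sketch; table, column, and measure names are assumptions
Total Sales Selected Countries =
CALCULATE ( [Total Sales], ALLSELECTED ( Geography[Country] ) )
```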

In this example you can clearly observe that ALLSELECTED provides a different way to analyze your data. I love this dynamic approach to analyzing data, provided by simply modifying the ALL function to use ALLSELECTED!

As always, I hope this blog has helped someone somewhere and thanks for reading my blog!!

Azure Data Factory–Rule Based Mapping and This($$) Function

Hello! This is the eighth video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

Schema flexibility and late schema binding really separate Azure Data Factory from its on-prem rival, SQL Server Integration Services (SSIS). This video focuses on leveraging the capability of flexible schemas and how rules can be defined to map changing column names to the sink.

Rule Based Mapping

Rule-based mapping in ADF allows you to define rules that map columns coming into a data flow to a specific column. For example, you can map a column that has ‘date’ anywhere in the name to a column named ‘Order_Date’. This ability to define rules allows for very flexible and reusable data flows. In the video below I walk through and explain how to set this up inside of a Select transform. Enjoy!

This ($$) Function in a Derived Column Transform and a Select Transform

The this ($$) function simply returns the name of the column or the value of the column, depending on where it is used. In this video I show two use cases: one in a Select transform and one in a Derived Column transform.

Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training

Azure Data Factory–Executing an ADF Pipeline from Azure Logic Apps

Hello! This is the seventh video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

This video in the series highlights Azure Data Factory integration with Azure Logic Apps!

Azure Logic Apps

Azure Logic Apps is a great way of extending the capabilities of different services in Azure. In this video I take a look at how we can use Azure Logic Apps to provide a wide array of event-based triggers for a Data Factory pipeline.

Azure Logic Apps – Create a pipeline run (Executing a Data Factory Pipeline)

We are going to execute a Data Factory pipeline run using the action “Create a pipeline run” in Azure Logic Apps. The biggest challenge here is understanding and learning how to pass parameters into your Data Factory pipeline! I show this in the video, but I will also provide a code snippet here for reference. This can be modified as necessary! Enjoy.
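Here is a minimal sketch of what such a parameters body can look like for the “Create a pipeline run” action, assuming the target pipeline defines parameters named SchemaName and TableName (the names and values are hypothetical):

```json
{
    "SchemaName": "YourSchemaName",
    "TableName": "YourTableName"
}
```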


Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training

Azure Data Factory–Web Activity / Azure Logic Apps

Hello! This is the fifth video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

This video in the series highlights Azure Data Factory integration with Azure Logic Apps!

Web Activity in ADF v2

The web activity within Azure Data Factory allows you to call a custom REST endpoint from an ADF control flow. In this video we make a POST API call to Azure Logic Apps.

Azure Logic Apps

Azure Logic Apps is a great tool for building automated workflows and integrates really well with Azure Data Factory!

Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training

Azure Data Factory–Copy and Delete Activities

Hello! This is the fifth video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

This video in the series leverages the combination of copy and delete activities to archive files once they have been processed.

Copy Activity in ADF v2

The copy activity within Azure Data Factory allows you to efficiently move data from a source to a destination. The copy activity is highly customizable, making reusability extremely easy through parameterization. In this video we will copy a file from one blob container to another.

Delete Activity in Azure Data Factory

The delete activity will allow you to delete files or folders either in an on-prem environment or in a cloud environment. In this demo we first move the file using the copy activity and then delete the file from the source with the delete activity!

Video Below:

 

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training