Complete Guide to Create Data Models & Table Relationships in Power BI

A data model is essentially a structured representation of your data, which can be used to facilitate analysis and reporting. In Power BI, you can use data models to create interactive reports & visualization that allow you to explore the data in a variety of ways.

Today we will cover the entire data modeling concept in Power BI, including Data Normalization, Fact/Dimension structure, Types of Data Model & Cardinality. We will also discuss topics such as Relationships and Filter directions, and how to use these to create more powerful and sophisticated data models. So let’s get started!

What is Data Normalization

Before we get into data models, it’s crucial to understand Data Normalization because it is a mandatory precursor for creating a data model.

So to explain it in simple terms, it is a process of splitting the database into multiple tables so that we can reduce redundancy and improve data integrity. Let me explain this with an example

Below table contains the data of Pizza sales in Joy Pizza’s from 1st Jan – 5th Jan 2023.

At first glance the data might seem perfect to you for use in your reports or visualization. But if you observe closely, it contains a lot of redundant data which needs to be normalized. And if you ignore this step then when data gets bigger, lets say 10 million records, then it will create a lot of inefficiency & lag. Also working on any kind of analysis on such a huge data set will be a nightmare.

So How to Normalize?

Here comes the concept of splitting the database into Fact & Dimensions tables. Fact tables typically contain measurable or quantitative fields and Dimension tables contain unique categorical columns which are attributes of the Quantitative fields.

To understand this better we will use the same Pizza sales data and will split it into Fact & Dimensions

Fact Table

In this I have taken all measurable values in one table which may contain duplicate entries of any particular column like Date but all row items serves a single purpose of providing sales value of a pizza in a given date

Dimension tables

Now I have splitted all the redundant categorical values in a separate table and which are now unique. Dimension tables should be unique in nature and should also contain a linking column which will be used to connect to the Fact table when we build the data model. In database and modeling terms, this concept is known as Primary & Foreign Key. We will explore these concepts much more in detail going forward.

Let’s start building our Data Model

Now let’s import our Fact & Dimension tables into power bi. (In case you need detailed content on Data Import then you can refer to my previous blog on the same subject)

  • Open Power BI Desktop
  • Go to Get Data >> Excel Workbook
  • Select the file from the browse menu & import Fact table & Dimension tables
  • After importing data, click on the Modeling tab of Power BI

You will see something like below

Sometimes Power BI auto detects relationships between tables & creates connections. So you may choose to keep them or remove them as per your requirements. But for now we will start from scratch so that I can explain each and every concept

Right now these are isolated tables which do not have any kind of connection between them, so even if you use these tables together in report view, you will get an error

Creating Relationships

So now to create relationships in your data model

  • Go back to the Model view
  • Click on the Foreign key i.e. the key in the Fact table which has multiple entries for date / Pizza code and drop it on the Primary key of the Dimension table

Now you have successfully created a 1-Many relationship between Dimension table & Fact table. This is called CARDINALITY, and it is advisable to always create a 1-Many relationship. If you try to create a Many-Many relationship i.e. both tables have multiple entries of the same value then this might create problems. So be very cautious if you decide to use Many-Many relationships, as this might have weird behavior on your model.

Let’s see how this has impacted our Report View grid which we created earlier

As you see now all values are fetching correctly and we have successfully implemented our data model into the reporting view.

Types of Model Schema

So typically while creating a data model, you will come across below 2 model schema’s

STAR SCHEMA

This type of schema is formed like a star and hence the name. There would be a central Fact table which is connected to multiple Dimension Tables. Similar to what we have created above

SNOWFLAKE SCHEMA

Then you also have snowflake which is an extension from a star schema, where each or few dimension tables have more sub attribute tables branching out of it

Both of the SCHEMA have their own purpose and its usage is completely dependent on your requirements.

So now let’s see an example of SNOWFLAKE SCHEMA in our data. So now in addition to the existing dataset, we have an additional subcategory table which has the ingredients of Veg & Non-Veg pizza

We cannot establish a direct relationship between this subcategory dimension table and the fact table, so we will have to rely on a SNOWFLAKE SCHEMA and join Dim_Pizzacode with this table like below:

  • Import the new data into Power BI
  • Go to the Model view and then join Dim_PizzaCode with Dim_SubCategory

Cross Filter Direction

Now the final topic for today is Cross Filter Direction. You must have noticed by now that while creating a relationship between 2 tables, Power BI indicates the relationship flow with a small arrow

This is a very important topic to understand, since this has a direct impact on the visualization that you create in your view. The flow of the filter is always from the Primary Key (dimension table) to Foreign Key (fact table). So how does this matter?

Let us understand this by loading a 2nd Fact table in our model, which is amount of Pizza’s which are Unsold at the end of the day

Now let’s join the table with Dimension tables similar to Fact_PizzaSales

Going back to the Report view, if I drag Date from Sales table instead of the Dimension table then it will give me below result, which is wrong

This is because the flow of the filter is from the Dimension to the Sales Fact table and not the other way round, so when we drag a column from the Sales table, Power BI does not know what to do with the column from the Unsold fact table.

Same goes if we drag the Date column from the Unsold Fact table

Both Way Filter Direction

Now doing is not recommended , since this might have unforeseen behavior in your data model. But just in case you still wanted a 2 way filter here, then below is what you should do

  • Go to Model view
  • Double click on the connection line to go the Edit Relationship prompt
  • Open the dropdown in Cross Filter Direction section and select “Both” & press Ok
  • But if you do this with the “Fact_PizzaUnsold” then the result might seem to be correct but if you look closely, it is missing data. I.e. there is no data for 2nd Jan. This is because PizzaUnsold did not have any value for 2nd Jan, so the final data is automatically filtered

So instead of Fact_PizzaUnsold, apply 2-Way filter on Fact_PizzaSales and you will see the correct results now

But as mentioned above, try to use 1 way filter direction whenever possible to avoid such data discrepancies and confusions.

That’s it for today. It took a lot of effort to prepare this content for you. So I would really appreciate it if you subscribe to my content. Also let me know in the comment section if you would like me to cover any other topic related to Data Model. Thank you & see you in the next one!

Leave a Reply

Your email address will not be published. Required fields are marked *