Fortune Telling Collection - Zodiac Guide - Star model and snowflake model of data warehouse

Star model and snowflake model of data warehouse

Recently, I was asked about the star model and snowflake model in an interview, and then I said that the constellation model is a snowflake model ... In order to consolidate this knowledge, I decided to write an article to record it.

Star model, snowflake model and constellation model are three important models in data warehouse dimension modeling. Next, let's talk about their characteristics and their relationship.

The star model consists of a fact table and several dimension tables. The fact table includes all primary keys (generally id) of the dimension table, and other contents that are not put into the dimension table; The dimension table stores the detailed information of the corresponding dimension.

Take a purchase table as an example. It mainly needs to record the following information:

Because users, goods and stores have their own detailed information, if they are all put in the purchase table, it will cause great redundancy and it will be difficult to maintain later. (Imagine if the user needs to add fields later) So at this time, you can choose the star model to put these details in the dimension table, while the purchase table as the fact table only keeps the user id, product id, store id and purchase time. (The purchase time cannot be divided into dimensions, so it is still in the fact table. )

The overall model structure is shown in the following figure:

You can see that there is a fact table and three dimension tables, users, goods and stores. When all dimension tables are connected with fact tables, the whole model is shaped like a star, so it is called a star model.

In the star model, the dimension table contains all the information of the dimension. Because there is no hierarchy, there may be redundancy in the dimension table.

In order to reduce the redundancy of dimension table, snowflake model can be used at this time. Based on the star model, Snowflake model further splits some fields in the dimension table, reducing redundancy and making it more hierarchical.

Take the previous purchase table as an example. Suppose there are several fields in the store table that store the location information of the store: {province, city, specific location}. At this time, we can see that these fields can actually belong to the attribute of "province", so we can split these fields to form a new dimension table "province". This dimension table is connected with the store dimension table instead of the fact table itself. From a certain point of view, the process of splitting dimension table by snowflake model is somewhat similar to that of splitting fact table by star model.

For the purchase table, the split snowflake model is as follows:

Here, you can see there is a new dimension table. When there are many dimension tables, we can see that the whole model diagram will expand like snowflakes, so this model is called snowflake model.

Constellation model is an extension of star model (it can be regarded as a star model with multiple versions of fact tables). One of its characteristics is that multiple fact tables share the dimension table in the model, which is suitable for more complicated occasions than star model and snowflake model.

One of the reasons why I confused the constellation model with the snowflake model before was that they were somewhat similar in shape. From a graphical point of view, each fact table can be regarded as a star. If there are multiple stars in a constellation, then there need to be multiple fact tables. The snowflake model can be understood as a shape (fact table) that expands outward from the center of the snowflake. If you remember them like this, you won't mix them up again.