Data warehouse modeling: I roughly understand the star schema (a fact table linked to many dimension tables), but I don't quite understand the snowflake schema
Star schema vs. snowflake schema
Multidimensional data modeling organizes data in an intuitive way and supports high-performance data access. Each multidimensional data model is expressed as one or more multidimensional schemas, and each schema consists of a fact table and a set of dimension tables. The most common multidimensional model is the star schema: the fact table sits in the center, and multiple dimension tables are arranged radially around it, each connected to the fact table. The snowflake schema was developed on the basis of the star schema. Let's compare their characteristics.
Star schema
The entity at the center of the star is the measure (fact) entity. It is the basic entity that users care about most and the center of query activity, and it provides the quantitative data for queries against the data warehouse. Each fact entity represents a series of related facts and serves a specific function. The entities at the points of the star are dimension entities. Their role is to constrain the user's query and filter the data so that the query against the fact entity returns fewer rows, which narrows the scope of access. Each dimension table has its own attributes, and the dimension tables are related to the fact table through keys.
Although the star schema is a relational model, it is not a normalized one. In a star schema the dimension tables are deliberately denormalized, which is the basic difference between the star schema and the relational schemas used in OLTP systems.
There are two main reasons for using a star schema. First, it improves query efficiency. In a data warehouse designed as a star schema the data has already been preprocessed and the main data sits in one huge fact table, so a query mostly scans the fact table and does not need to join several huge tables. Because the dimension tables are generally small, they can even be held in cache, which makes joining them to the fact table fast as well. Second, it is easy for users to understand. For users without a computing background the star schema is intuitive, and it is easy to put together a variety of queries by reading the schema.
Summary: not normalized; every dimension in the cube is joined directly to the fact table (through primary and foreign keys); there are no gradient dimensions; redundant data exists; query efficiency may be higher; design and maintenance are relatively simple because normalization does not need much consideration.
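As a rough illustration of the star layout described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are all assumptions made up for this example, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: small and denormalized (hypothetical example data).
    CREATE TABLE dim_product (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT   -- category repeated on every product row
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        year    INTEGER,
        quarter INTEGER
    );
    -- Fact table: one row per sale, a foreign key per dimension, plus a measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'),
        (2, 'Phone',  'Electronics'),
        (3, 'Desk',   'Furniture');
    INSERT INTO dim_date VALUES (20230101, 2023, 1), (20230401, 2023, 2);
    INSERT INTO fact_sales VALUES
        (1, 20230101, 1000.0),
        (2, 20230101,  500.0),
        (3, 20230401,  200.0),
        (1, 20230401, 1200.0);
""")

# Sales per category: the fact table joins directly to one dimension table.
star_result = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

print(star_result)
```

Every dimension is exactly one join away from the fact table, which is the property the summary above refers to.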
In practice, as fact tables and dimension tables grow and change, the star schema gives rise to many derived models, including the galaxy model, the constellation model, second-level dimension tables, and the snowflake model.
Snowflake schema
The snowflake schema further layers the dimension tables of the star schema: some dimension tables are expanded into additional, more detailed tables of their own. This lets it handle queries from users at different levels and also roll source data up through the links between levels, which reduces data storage and improves query capability. The dimension tables of a snowflake schema follow normal-form theory, so it is a design that sits between third normal form and the star schema. Typically, part of the data is organized in a normalized third-normal-form structure while the rest uses the fact table and dimension table structure of the star schema. In some cases a snowflake schema arises because, while organizing data as a star schema, tables are normalized in order to reduce the levels inside a dimension table and to handle many-to-many relationships.
The advantages of the snowflake schema are that it reduces storage space to some extent and that normalized structures are easier to update and maintain. It also has a number of drawbacks: it is complex and hard for users to understand, browsing the content is relatively difficult, and the additional joins reduce query performance. In a data warehouse the snowflake schema is generally not recommended, because query performance matters more in a data warehouse than in an OLTP system and the snowflake schema lowers it.
Summary: normalized; less data redundancy; some data can only be obtained through joins, which may be inefficient; normalization is complicated, which makes both the design and later maintenance more complex.
In practice the two models can be mixed: for example, the middle layer uses a snowflake structure to reduce data redundancy, and the data mart layer uses a star structure to make data extraction and analysis easier.
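The mixed arrangement just mentioned can be sketched as follows, again with Python's built-in `sqlite3`. All table names, including the `mart_dim_product` table that stands in for the data mart layer, are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Middle layer (snowflake): category is split out of the product dimension.
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
    INSERT INTO dim_product VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);
    INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 200.0);

    -- Data mart layer (star): flatten the snowflaked dimension into one table.
    CREATE TABLE mart_dim_product AS
        SELECT p.product_id, p.product_name, c.category_name
        FROM dim_product AS p
        JOIN dim_category AS c ON c.category_id = p.category_id;
""")

# Snowflake query: two joins are needed to reach the category name.
snowflake_totals = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

# Star query against the mart: a single join gives the same answer.
mart_totals = conn.execute("""
    SELECT m.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN mart_dim_product AS m ON m.product_id = f.product_id
    GROUP BY m.category_name
    ORDER BY m.category_name
""").fetchall()

print(snowflake_totals, mart_totals)
```

Both queries return the same totals. The snowflake layer stores each category name once, while the flattened mart table trades that for one fewer join at query time.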
Sometimes normalization and efficiency conflict. Generally we sacrifice space (and normalization) for good performance: storing as much dimension information as possible in one "big table" is the fastest to query. In practice a compromise is usually chosen according to the situation.
A star design can cause a lot of data redundancy, and the fact table may well become extremely bloated (millions of rows × hundreds of dimensions).
When dimension information is stored in the fact table itself, every update to a dimension member also requires updating the fact table.
With a snowflake design it is sometimes enough to update a single level of the snowflaked dimension, without touching the huge fact table.
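The difference in update cost can be made concrete with a small sketch in Python's `sqlite3`. The `wide_sales` "big table", the `dim_category` table, and the 1000 sample rows are assumptions invented for this example. `cursor.rowcount` reports how many rows each `UPDATE` modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "Big table" variant: the category name is copied onto every sales row.
    CREATE TABLE wide_sales (product_name TEXT, category_name TEXT, amount REAL);
    -- Snowflaked variant: the category name is stored in exactly one row.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
""")
conn.executemany(
    "INSERT INTO wide_sales VALUES (?, ?, ?)",
    [("Laptop", "Electronics", 1000.0 + i) for i in range(1000)],
)
conn.execute("INSERT INTO dim_category VALUES (10, 'Electronics')")

# Renaming the category in the big table rewrites every matching sales row.
wide_cur = conn.execute(
    "UPDATE wide_sales SET category_name = 'Consumer Electronics' "
    "WHERE category_name = 'Electronics'"
)
# Renaming it in the snowflaked dimension changes a single row.
snow_cur = conn.execute(
    "UPDATE dim_category SET category_name = 'Consumer Electronics' "
    "WHERE category_id = 10"
)

wide_rows_touched = wide_cur.rowcount
snow_rows_touched = snow_cur.rowcount
print(wide_rows_touched, snow_rows_touched)
```

With real fact tables in the millions of rows, the gap between the two counts grows accordingly.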
Each case has to be analyzed on its own terms. A time dimension with year and quarter, for example, does not need to be snowflaked. Products and product categories are a different matter: if the category information is itself something we need to analyze, then I would definitely set up a lookup table for categories, that is, use the snowflake schema.
The snowflake structure is a normalized structure that removes redundant data from the data warehouse. For example, a sales fact table is joined to a product dimension table, and the product dimension table is in turn joined to a product category dimension table. That is a snowflake structure. Although it removes redundancy, the snowflake structure needs extra joins to produce some statistics, so it is not necessarily as efficient as the star structure. Normalization is also a fairly complicated process, so the database design, the ETL, and later maintenance all become more complex.
The star schema is a non-normalized structure. Every dimension in the cube is joined directly to the fact table and there are no gradient dimensions, so the data contains some redundancy. Because of that redundancy, many statistical queries do not need to join additional tables, so efficiency is generally higher than with a snowflake schema. A star structure does not need to take many normalization factors into account, so its design and implementation are relatively simple.
Although the two structures differ, I personally think neither is inherently better or worse. What matters most is the requirements and business logic of the project.