Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article at OpenGenus, we will learn about the concept of a Data Mart. A data mart is a simplified or condensed version of a data warehouse. It has faster data retrieval and provides data required by a specific department or line of business within an organization.
Table of contents:
- What is a Data Mart?
- What does a Data Mart do?
- Data Marts vs Data Warehouses
- Data Mart Structures (Schemas)
- Types of Data Marts
- Benefits of Data Marts
- Disadvantages of Data Marts
- Data Marts Use Cases
What is a Data Mart?
A data mart is a subset of a data warehouse. It offers better querying speed for analysts as it naturally contains fewer data. It is an access pattern specific to data warehouse environments to retrieve client-facing data. Additionally, data mart sources can include internal operational systems, external data, and a central data warehouse.
What does a Data Mart do?
Unlike transactional databases, for example PostgreSQL and MySQL, that are designed to be updated, data marts and data warehouses are read-only.
Data marts are efficient and cost-effective as a centralized place to store organized data before reports, dashboards, and visualizations are created.
A data mart provides easy access to data oriented towards a single department or line of business, such as sales, finance, or marketing. Thus, a data mart is committed to a specific business function. In addition, multiple data marts could be used to derive specific information for a business unit.
Data Mart vs Data Warehouse
Data Mart | Data Warehouse |
---|---|
A single subject or functional organization area | Enterprise-wide repository of disparate data sources |
No preference between a normalized and denormalized structure |
Denormalized for quicker data querying and read performance |
Typically summarized data | Raw data, metadata, and summary data |
Integrates specific data relating to the functional area |
Integrates data from multiple sources |
Smaller data storage (< 100GB) | Larger data storage (> 100GB) |
Analyzes smaller datasets for a particular subject to support analytics and BI |
Analyzes large complex enterprise-wide datasets to support data mining, BI, AI and machine learning, etc |
Built focused on dimensional model using schemas like star and snowflake |
Feeds dimensional models |
Easier data governance since data is partitioned |
Requires strict governance rules and systems to access data |
Cost >= $10K | Cost >= 100K, much less if on cloud |
3~6 mo of setup time | >= 1 year for on-premises warehouses, much less time if on cloud |
For tactical decision-making | For strategic decision-making |
Types of Data Mart
There are 3 main types of data marts:
1. Dependent Data Marts
A dependent data mart is created by extracting a subset of data from a data warehouse. Next, the data is structured that is optimized for the needs of a specific department or business unit. It is a top-down structure that all the enterprise data is stored in a central location.
Any changes made to the data warehouse automatically reflects in its dependent data mart. Furthermore, dependent data marts are usually built using the same architecture as the data warehouse and may use the same data modeling techniques, such as star schema or snowflake schema.
2. Independent Data Marts
An independent data mart is built and updated separately from data warehouses. Data is extracted from internal or external sources and loaded to the independent data mart. It's usually constructed utilizing dimensional modeling techniques.
3. Hybrid Data Marts
A hybrid data mart is a combination of dependent and independent data marts. It uses data from data warehouses and external data sources. When data warehouses alone can't fulfill specific data needs, a hybrid data mart is applied.
Data Mart Schemas (Structures)
A data mart structure (schema) is a subject-oriented relational database that stores data in tables. It represents how data relates logically within a data warehouse.
Data marts are structured in a multi-dimensional schema that defines rules for how data warehouses manage the names, descriptions, associated data items, and aggregates within a data warehouse. They works as blueprints for data analysis. Star and Snowflake schemas are types of warehouse schema that describe different logical arrangements of data.
There are 3 popular design schemas for data marts:
1. Star Schema
A star schema has the name, "star," as its physical model resembles to the star shape. It has a fact table at its center and the dimension tables at its peripheral representing the starβs points.
A star schema has a centralized data repository stored in a fact table. The fact table contains aggregated data to be used for reporting purposes, and the dimension table describes the stored data.
The fact table is split into denormalized dimension tables. Each dimension table is joined to the fact table through a foreign key. In addition, a star schema can include one or more fact tables indexing any number of dimensional tables.
A star schema has the simplest structure that allows for fast query response. Also, it's scalable and flexible due to the fact that it can be extended by adding new dimension tables or measures to the fact table.
2. Snowflake Schema
A snowflake schema has the name, "snowflake," because its tables and their relationships resemble a diagram of a snowflake. A snowflake schema is a variant of the star schema, but allows for more intricate hierarchies.
The fact table is located at the center surrounded by the dimension tables. However, each dimension table is further broken down into multiple related tables, creating a hierarchical structure that resembles a snowflake.
The main difference between a star schema and a snowflake schema is that the dimension tables in a snowflake schema are normalized into multiple related tables to reduce redundancy. These tables are joined by a foreign key.
3. Data Vault
A data vault is an innovative approach that's suitable for large-scaled data warehouse platforms. In a data vault, the data is broken down into three types of tables:
1. Hubs
Hubs represent the business entities.
2. Links
Links represent the relationships between entities.
3. Satellites
Satellites contain additional information about the entities and their relationships.
Because of how a data vault is broken down, its design is more complex yet flexible that can accommodate large amounts of data and changes in the data over time, such as shifting business needs.
Furthermore, it addresses the drawbacks of the normalized (3rd normal form), and dimensional modeling techniques such as star and snowflake schemas mentioned above. In addition, it combines the centralized raw data repository of the Inmon approach with the incremental build advantages of Kimball (refer to "Inmon vs. Kimball" section if you want to read more about it.)
Benefits of Data Mart
A data mart dedicated to a department or specific line of business offers several benefits:
1. A single source of truth
Since the data is centralized, a department or organization make analysis based on the same data.
2. Improves end-user response time
Its data is subsets of data from the data warehouse. So it's faster accessible and combining it with data from other sources without the need of IT to obtain periodic extracts.
3. Faster insights leading to faster decision making
A data mart allows data analytics at the department level due to the fact that it extracts a subset of data from the data warehouse. Analysis are focused more easily in a specific area to make faster decisions comparing to enterprise-wide data in data warehouse.
4. More flexible to change
Due to its smaller size compared to a data warehouse, it's more flexible and agile when changes are needed.
5. Faster implementation and less costly
A data mart usually needs a subset of data comparing to data warehouse. Thus it's easier and faster to implement. It also benefits transient types of data analysis, for example, a one-month-long analysis of customer satisfaction rate based on a new policy change.
Disadvantages of Data Mart
-
Data redundancy across multiple data marts causes data inconsistencies and more storage space.
-
Limited scope of data that's hard to make cross-departmental or cross-functional decisions.
-
Integration challenges due to the fact that data marts might be built with different data models.
-
Limited scalability since data marts are not designed to handle large and complex data, for example when a company grows.
Data Mart Use Cases
-
When there is need to access specific data relevant to real-time
-
Small business or projects with smaller data sets benefits from the cost-effectiveness of data marts.
-
Faster implementation and easy data maintenance allow business units to own and maintain their own data
-
Accelerated data access provided by Data marts make it easy to track key performance indicators for Business Intelligence.