DW: Introduction to Data Warehouses
In this article, we’re diving into the world of data warehousing. We’ll break down the basics, explore how it all started, and uncover why it’s become such a big deal in the business world.
A data warehouse, often abbreviated as DW, is a system designed for storing, analyzing, and reporting data effectively.
The main goal of a data warehouse is to create an integrated data environment made specifically for analysis, aiding crucial decision-making in businesses.
1. Understanding How DW Began
A data warehouse doesn’t create data itself; it gathers data from various external systems. Similarly, it doesn’t consume data directly; instead, its outcomes are shared with different applications. This is why it’s called a “warehouse” and not a “factory.”
In simple terms: its purpose is analyzing data to support business decisions.
Within a business setting, information usually serves two main purposes: maintaining records and guiding analytical decisions.
For instance, let’s look at China Life Insurance Company, where I previously interned, to understand why data warehousing emerged. They manage several lines of business, like life insurance, property insurance, auto insurance, and pension insurance. Smooth operations across these areas need careful handling of vast amounts of customer information, policy records, financial transactions, and claims data.
Online transaction systems (OLTP) handle these operations well by processing transactions swiftly. These systems process user data quickly, providing prompt results.
Common databases like Oracle, MySQL, and SQL Server are typical OLTP tools.
As the group’s business grows, so does the volume of data. This growth poses challenges:
- Identifying issues in insurance categories.
- Crafting effective policies.
- Detecting potential fraud.
- Ensuring reports cover all business lines.
OLTP systems focus on transactions. All business operations involve reading and writing data. Reading operations usually exert more pressure. However, conducting various analyses directly within an OLTP environment raises concerns:
- Analyzing data means more reading, increasing pressure.
- OLTP systems store data briefly.
- Data is scattered across systems with inconsistent structures.
While analyzing smaller datasets within an OLTP environment might work, building an integrated data analytics platform becomes crucial for larger-scale analysis without affecting OLTP systems.
This platform aims to focus on analysis, supporting analytical tasks without impacting OLTP systems. This marked the beginning of data warehousing in enterprises.
2. Creating a DW
A data warehouse specializes in storing, analyzing, and reporting data, creating an integrated environment for analysis—part of a broader category called OLAP (Online Analytical Processing) systems.
For instance, China Life Insurance Company can create a data warehouse tailored for analytical decision-making.
3. Structure of the DW
ODS Layer (Operation Data Store): This layer stores raw, unprocessed data in the data warehouse system, aligning with the source system’s structure. It prepares data for the warehouse while recording historical changes.
DW Layer (Data Warehouse): Derived from the ODS layer, it manages data processing and integration. It establishes consistent dimensions, creates detailed fact tables for analysis, and consolidates common indicators.
ADS Layer: This layer provides tailored business data for end-users, including reports, charts, KPIs, and data analysis tools.
Benefits of Layering:
- Organized data structure.
- Tracing data sources.
- Reducing repetitive work.
- Simplifying tasks for better maintenance.
- Shielding original data from issues.
Thank you for reading!