Data Warehousing Objects
Fact tables and dimension
tables are the two types of objects commonly used in dimensional data warehouse
schemas.
Fact tables are the large
tables in your data warehouse schema that store business measurements. Fact
tables typically contain facts and foreign keys to the dimension tables. Fact
tables represent data, usually numeric and additive, that can be analyzed and
examined. Examples include Sales, Cost and Profit.
Dimension tables, also
known as lookup or reference tables, contain the relatively static data in the
data warehouse. Dimension tables store the information you normally use to
contain queries. Dimension tables are usually textual and descriptive and you
can use them as the row headers of the result set. Examples are Customers or
Products
Fact Table:
A fact table typically has
two types of columns: those that contain numeric facts (often called measurements),
and those that are foreign keys to dimension tables. A fact table contains
either detail-level facts or facts that have been aggregated. Fact tables that
contain aggregated facts are often called summary tables. A fact table usually
contains facts with the same level of aggregation.
Fact tables are summarized
into three types:
1 Additive
Semi-Additive
Non-Additive
Additive facts can be
aggregated by simple arithmetical addition. A common example of this is sales. Semi-additive
facts can be aggregated along some of the dimensions and not along others. An
example of this is inventory levels, where you cannot tell what a level means
simply by looking at it. Non-additive facts cannot be added at all. An example
of this is averages.
Requirements of Fact Table:
You must define a fact
table for each star schema. From a modeling standpoint, the primary key of the
fact table is usually a composite key that is made up of all of its foreign
keys.
Dimension Table:
A dimension is a
structure, often composed of one or more hierarchies, that categorizes data.
Dimensional attributes help to describe the dimensional value. They are
normally descriptive, textual values. Several distinct dimensions, combined
with facts, enable you to answer business questions. Commonly used dimensions
are customers, products, and time.
Dimension data is
typically collected at the lowest level of detail and then aggregated into
higher level totals that are more useful for analysis. These natural rollups or
aggregations within a dimension table are called hierarchies.
Commonly dimension tables
are categorized into three types:
Conformed
dimension
Degenerated
dimension
Junk
dimension
If we are saying in simple word, Conformed
dimension table is a table which is having at least one or multiple primary key
or unique key to join with one or multiple fact table. For example product
table is related with the sales fact table
A conformed dimension is a set of data
attributes that have been physically implemented in multiple database tables
using the same structure, attributes, domain values, definitions and concepts
in each implementation. A conformed dimension cuts across many facts.
Dimensions are conformed when they are either exactly the
same (including keys) or one is a perfect subset of the other. Most important,
the row headers produced in the answer sets from two different conformed
dimensions must be able to match perfectly.
A junk dimension is a convenient grouping of typically
low-cardinality flags and indicators. By creating an abstract dimension, these
flags and indicators are removed from the fact table while placing them into a
useful dimensional framework.
A Junk Dimension is a dimension table consisting of
attributes that do not belong in the fact table or in any of the existing
dimension tables. The nature of these attributes is usually text or various
flags, e.g. non-generic comments or just simple yes/no or true/false indicators.
These kinds of attributes are typically remaining when all the obvious
dimensions in the business process have been identified and thus the designer
is faced with the challenge of where to put these attributes that do not belong
in the other dimensions.
One
solution is to create a new dimension for each of the remaining attributes, but
due to their nature, it could be necessary to create a vast number of new
dimensions resulting in a fact table with a very large number of foreign keys.
The designer could also decide to leave the remaining attributes in the fact
table but this could make the row length of the table unnecessarily large if,
for example, the attributes is a long text string.
Refer Ralph Kimball’s explanation about Junk
Dimension
Hierarchies
Hierarchies are logical
structures that use ordered levels as a means of organizing data. A hierarchy
can be used to define data aggregation. For example, in a time dimension, a
hierarchy might aggregate data from the month level to the quarter level to the
year level. A hierarchy can also be used to define a navigational drill path
and to establish a family structure.
Within a hierarchy, each
level is logically connected to the levels above and below it. Data values at
lower levels aggregate into the data values at higher levels. A dimension can
be composed of more than one hierarchy. For example, in the product dimension,
there might be two hierarchies—one for product categories and one for product
suppliers.
Dimension hierarchies also
group levels from general to granular. Query tools use hierarchies to enable
you to drill down into your data to view different levels of granularity. This
is one of the key benefits of a data warehouse.
When designing
hierarchies, you must consider the relationships in business structures. For
example, a divisional multilevel sales organization. Hierarchies impose a
family structure on dimension values. For a particular level value, a value at
the next higher level is its parent, and values at the next lower level are its
children. These familial relationships enable analysts to access data quickly.
Levels
A level represents a
position in a hierarchy. For example, a time dimension might have a hierarchy that
represents data at the month, quarter, and year levels. Levels range from
general to specific, with the root level as the highest or most general level.
The levels in a dimension are organized into one or more hierarchies.
Level Relationships
Level relationships
specify top-to-bottom ordering of levels from most general (the root) to most
specific information. They define the parent-child relationship between the
levels in a hierarchy.
Hierarchies are also
essential components in enabling more complex rewrites. For example, the
database can aggregate an existing sales revenue on a quarterly base to a
yearly aggregation when the dimensional dependencies between quarter and year
are known.
Typical Dimension Hierarchy
Wherever we go, we have to know this concepts to work on data warehousing technologies.
Cheers!
Bose
Tuesday, April 24, 2012
//
Labels:
Data Warehousing
//
0
comments
//