Archive for the ‘Data Warehouse’ category

Reminder: Who manages RI in a data mart?

July 19th, 2010

Reviewed the data model and loading today with the DBA.  A few concepts explain why we are not enforcing RI through foreign key constraints (see page 536 of the toolkit book mentioned below for more explanation).

1) This is a read-only data mart, not an operational system, so there are only single-threaded ETL jobs and read-only queries.  The ETL process handles RI by loading the dimensions and facts as part of a single correlated load stream.  This means there are no outside applications adding and removing rows, so RI can be maintained by the ETL application logic.

2) We are not rejecting fact records based on their keys.  If a key doesn’t match an existing value in a dimension, then the value is added to the dimension (e.g. a new title is added).

3) We are not rejecting dimension records based on their keys.  If a key doesn’t match the lookup attribute table, then the Unknown key is used (i.e. -1).

Example: If a series of web sites have an invalid country code, we will still load the web site attributes into the sites dimension tables, but we will insert the key for an “unknown” country into the country foreign key.

4) We needed to first load a significant amount of data in order to validate that the data model was going to match the proposed model.  We can always add foreign key constraints later, but since items 2 and 3 handle valid and invalid keys programmatically, the constraints would not come into play.
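
Items 2 and 3 can be sketched in a few lines of Python against an in-memory SQLite database. This is only an illustration under stated assumptions, not the actual ETL code: the table names (dim_country, dim_site, dim_title, fact_views), the column names, and the helper functions are all hypothetical, and only the -1 "unknown" key comes from the notes above. No foreign key constraints are declared; the load logic alone keeps the keys consistent.

```python
import sqlite3

UNKNOWN_KEY = -1  # surrogate key reserved for the "unknown" dimension row


def build_schema(conn):
    """Create hypothetical dimension and fact tables with no FK constraints."""
    conn.executescript("""
        CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY,
                                  country_code TEXT UNIQUE);
        CREATE TABLE dim_site    (site_key INTEGER PRIMARY KEY AUTOINCREMENT,
                                  site_name TEXT UNIQUE,
                                  country_key INTEGER NOT NULL);
        CREATE TABLE dim_title   (title_key INTEGER PRIMARY KEY AUTOINCREMENT,
                                  title_name TEXT UNIQUE);
        CREATE TABLE fact_views  (title_key INTEGER NOT NULL,
                                  views INTEGER NOT NULL);
        INSERT INTO dim_country VALUES (-1, 'UNKNOWN'), (1, 'US'), (2, 'DE');
    """)


def lookup_country_key(conn, country_code):
    """Item 3: fall back to the unknown key when the lookup finds no match."""
    row = conn.execute(
        "SELECT country_key FROM dim_country WHERE country_code = ?",
        (country_code,),
    ).fetchone()
    return row[0] if row else UNKNOWN_KEY


def load_site(conn, site_name, country_code):
    """Load a site row even if its country code is invalid."""
    conn.execute(
        "INSERT INTO dim_site (site_name, country_key) VALUES (?, ?)",
        (site_name, lookup_country_key(conn, country_code)),
    )


def get_or_add_title_key(conn, title_name):
    """Item 2: add the dimension value when the fact's key has no match."""
    row = conn.execute(
        "SELECT title_key FROM dim_title WHERE title_name = ?", (title_name,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_title (title_name) VALUES (?)", (title_name,)
    )
    return cur.lastrowid


def load_fact(conn, title_name, views):
    """Load a fact row; the title dimension is extended if needed."""
    conn.execute(
        "INSERT INTO fact_views (title_key, views) VALUES (?, ?)",
        (get_or_add_title_key(conn, title_name), views),
    )


def count_orphans(conn):
    """Count child rows whose key has no parent row (should stay at zero)."""
    sites = conn.execute(
        "SELECT COUNT(*) FROM dim_site s LEFT JOIN dim_country c "
        "ON c.country_key = s.country_key WHERE c.country_key IS NULL"
    ).fetchone()[0]
    facts = conn.execute(
        "SELECT COUNT(*) FROM fact_views f LEFT JOIN dim_title t "
        "ON t.title_key = f.title_key WHERE t.title_key IS NULL"
    ).fetchone()[0]
    return sites + facts


def run_demo():
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    load_site(conn, "example.com", "US")
    load_site(conn, "example.org", "ZZ")  # invalid code, gets the unknown key
    load_fact(conn, "New Title", 10)      # title not yet in the dimension
    load_fact(conn, "New Title", 5)       # reuses the row added above
    return conn


if __name__ == "__main__":
    print("orphan rows:", count_orphans(run_demo()))
```

An orphan check like count_orphans is also a cheap way to confirm the data is clean before adding foreign key constraints at a later date.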

From Oracle: “Unlike many relational-database environments, data in a data warehouse is typically added and/or modified under very controlled circumstances during the ETT process. Multiple users typically do not update the data warehouse directly; this is considerably different from the usage of a typical operational system. Thus, the specific usage of constraints in a data warehouse may vary considerably from the usage of constraints in operational systems”

Recommended reading:

Data Modeling Essentials – Graeme Simsion – ISBN 1-850-32877-3

The Data Warehouse Lifecycle Toolkit – Ralph Kimball (and others) – ISBN 0-471-25547-5

BI Tool Selection

February 6th, 2010

When a company decides that a BI system will improve its ability to access and analyze data, a great door of opportunity is opened. Through that door will come many vendors and products that may or may not be a match for your company’s needs and abilities. From the beginning, the success of a system depends on establishing the needs or requirements and the proposed benefits to the company. Without these, every product can be made to look like it fits and is “just what you are looking for.”

One thing to keep in mind is that these requirements are going to change throughout the life cycle of the design process as requirement-gathering processes uncover additional application issues. Benefits can be monetary (such as cost savings through eliminating unsuccessful spending) as well as non-monetary (such as reducing the time it takes to analyze the last month’s sales trends).

How then does a company choose which product is right, one that will match the current needs and also support the requirements of future applications? There are some simple processes that can help in this decision. The first is to break the requirements into their separate entities. These entities will separate the kinds of questions that need to be asked of the products you are considering and identify your expectations of the application.

The primary classifications are simply data-related issues and application functionality. While other factors exist and are influential (such as corporate relationships and vendor characteristics), these should be viewed as external forces on the decision process. Other external forces, such as staffing requirements, should also be considered.

This topic will be continued…