BusinessObjects, Web Intelligence , Crystal Reports

Reminder: Who manages RI in a data mart?

July 19th, 2010 by Kevin McManus Leave a reply »

Reviewed the data model and loading today with the DBA.  To explain the reason for not enforcing RI through foreign key constraints there are few concepts.(see page 536 of the toolkit book mentioned below for more explanation)

1) This is a read only data mart not an operational system so there are single threaded ETL jobs and read only queries.  The ETL process handles the RI by loading the Dimensions and Facts as part of a single correlated load stream.  This means there are not outside applications adding and removing rows so RI can be maintained by the ETL application logic.

2) We are not rejecting fact records based on their keys.  If keys don’t match an appropriate value in a dimension, then the value is added to the dimension (e.g. new title is added)

3) We are not rejecting dimension records based on their keys.  If keys don’t match the lookup attribute table then the Unknown key is used (i.e. -1)

Example: So if a series of  web sites that have an invalid country code we are still going to load web site attributes into the sites dimension tables,  but we will insert the key for an “unknown” country into the country foreign key.

4) We needed to first load a significant amount of data in order to validate that the data model was going to match the proposed model.  We can always add foreign key constraints but since items 2 and 3 handle valid and invalid keys programmatically they constraints would not come into play.

From Oracle: “Unlike many relational-database environments, data in a data warehouse is typically added and/or modified under very controlled circumstances during the ETT process. Multiple users typically do not update the data warehouse directly; this is considerably different from the usage of a typical operational system. Thus, the specific usage of constraints in a data warehouse may vary considerably from the usage of constraints in operational systems”

Recommended reading:

Data Modeling Essentials -Graeme Simsion – ISBN 1-850-32877-3

The Data Warehouse Lifecycle Toolkit – Ralph Kimball (and others) – ISBN 0-471-25547-5

Advertisement

Comments are closed.