Archive for the ‘Data Warehouse’ category

Are people more important than technology?

November 13th, 2010

During a recent project review of an upcoming Enterprise BI / DW platform, one of the project tasks’s was to do an end to end tool selection. However, beyond technology the other aspects of a successful BI program necessary including people, process and data (ownership) were not properly being assessed.  The needs to properly assess and manage these areas would outlast any particular phase/sprint of development and would have an even greater impact than the technology.  With a brand new large BI program there will need to be new processes and new teams created in IT to make the current software in order to be effective in this program (note: projects end, programs evolve and change with the business, so BI/DW programs should be built and staffed that way from day 1)

What was missing in the evaluation is the desired impact in the IT culture.  All of the solutions involved in the evaluation would demand a high level of experience, technical knowledge and creativity during the design and development in order to make it a success.  While the technical evaluations were sound, I wanted to see that there had been consideration of other non-technical aspects of a Data warehouse necessary to make the program and systems a success.

Questions arose that needed to be answered

– Is the expectation that business users will be connecting with Vendor support directly?

– who is responsible for “quality” resource acquisition

– Who is responsible for help desk models to support Business Rules, Data Miniing , etc

– Who is responsible for ensuring that the quality of the data is kept intact over time from a technical architecture as the business change over time?

– Who is responsible for ensuring that the knowledge of business rules are kept as assets?  An enterprise data warehouse/ BI system is a huge investment and that investment should be protected as an asset of the company.

No matter what tool is chosen there are huge drains to a project, in time and quality, when users need to make business rule and solution design decisions on the fly.  This includes the creation of new business metrics, new reporting processes and mini-tool selections as new requests come up. (i.e. the users uses the excel export so they can recalculate a metric because the data warehouse team takes 3 weeks to get a change in place).

One of the best practices to balance the projects technical needs with the people and processes is to establish a Business Intelligence competency group at the outset of the project that sits outside of any software or services vendor and has staffed BI/DW architects to can “rapidly” respond to the changing needs of the business with the “correct” solution and the ability to pull in the correct IT resources to meet that need at the speed the business needs before they feel the need to go out on their own.

Reminder: Who manages RI in a data mart?

July 19th, 2010

Reviewed the data model and loading today with the DBA.  To explain the reason for not enforcing RI through foreign key constraints there are few concepts.(see page 536 of the toolkit book mentioned below for more explanation)

1) This is a read only data mart not an operational system so there are single threaded ETL jobs and read only queries.  The ETL process handles the RI by loading the Dimensions and Facts as part of a single correlated load stream.  This means there are not outside applications adding and removing rows so RI can be maintained by the ETL application logic.

2) We are not rejecting fact records based on their keys.  If keys don’t match an appropriate value in a dimension, then the value is added to the dimension (e.g. new title is added)

3) We are not rejecting dimension records based on their keys.  If keys don’t match the lookup attribute table then the Unknown key is used (i.e. -1)

Example: So if a series of  web sites that have an invalid country code we are still going to load web site attributes into the sites dimension tables,  but we will insert the key for an “unknown” country into the country foreign key.

4) We needed to first load a significant amount of data in order to validate that the data model was going to match the proposed model.  We can always add foreign key constraints but since items 2 and 3 handle valid and invalid keys programmatically they constraints would not come into play.

From Oracle: “Unlike many relational-database environments, data in a data warehouse is typically added and/or modified under very controlled circumstances during the ETT process. Multiple users typically do not update the data warehouse directly; this is considerably different from the usage of a typical operational system. Thus, the specific usage of constraints in a data warehouse may vary considerably from the usage of constraints in operational systems”

Recommended reading:

Data Modeling Essentials -Graeme Simsion – ISBN 1-850-32877-3

The Data Warehouse Lifecycle Toolkit – Ralph Kimball (and others) – ISBN 0-471-25547-5

BI Tool Selection

February 6th, 2010

When a company thinks that a BI system will benefit their ability to access and analyze data, a great door of opportunity is opened. Through that door will come many vendors and products that may or may not be a match for your company’s needs and abilities. From the beginning, the success of a system is establishing the needs or requirements and the proposed benefits to the company. Without these, every product can be made to look like it fits and is “just what you are looking for.”

One thing to keep in mind is that these requirements are going to change throughout the life cycle of the design process as requirement-gathering processes uncover additional application issues. Benefits can be directly related to monetary issues (such as cost savings though eliminating unsuccessful spending) as well as non- monetary benefits (such as reducing the time it takes to analyze the last month’s sales trends).

How then does a company choose which product is right-one that will match the current needs and also support the requirements of future applications? There are some simple processes that can help in this decision. The first is to break the requirements into their separate entities. These entities will separate the kinds of questions that need to be asked of the products you are considering and identify your expectations of the application.

The primary classifications are simply by data related issues and by application functionality. While other factors exist (such as corporate relationships as well as vendor characteristics) and are influential, these should be viewed as external forces on the decision process. Other external forces, such as staffing requirements, should also be considered.

This topic will be continued…..