The BI Data Process Overview

“How do you get data organized and stored so that Business Intelligence tools can access and distribute valuable information to the right people at the right time?”

I intend to summarize a technique I have used for several years now, and I will do a deep dive on each topic in future posts. I adopted this approach to data management over a decade ago, and it has worked out very well for many different projects across several industries. The method has proven, time and time again, to be flexible, scalable, and adaptable within the discipline of business intelligence. I have shared it with others in the BI community, and it has been well received. I am not its inventor, but I have transformed and honed the technique over years of practice. It has worked for projects ranging from under a gigabyte of data up to hundreds of terabytes, and it has been useful across various DBMS platforms and data transformation toolsets. It also goes by many names.

The following terms have been used to describe this approach:
1) Stage/ Integration/ Analytics
2) L1/L2/L3
3) Extract, Load, and Transform
4) Kimball/ Inmon

For the purposes of these posts I will use the notation outlined in point one above (Stage/ Integration/ Analytics). Let me briefly summarize each layer here and dive into details later. Stage is a simple landing zone for the data: it is loaded on a regular basis and serves as a temporary storage area. Integration is a normalized transformation of the data that has been through a data quality process and in which relationships between multiple source systems have been rationalized. Data in this layer is normalized to optimize relationships between disparate source systems and to store data efficiently. Finally, Analytics is the final transformation of the data into a star or snowflake schema that is easily consumed by a business intelligence platform. That is the first iteration of the data path to production. From this point forward, a BI platform can be configured to access the information in the Analytics area and present it to the user community.
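To make the three layers concrete, here is a minimal, hypothetical sketch in Python using an in-memory SQLite database. The table names (`stage_orders`, `int_customer`, `int_order`, `fact_sales`) and the sample data are my own illustrative assumptions, not part of any specific project: raw rows land in Stage as-is, Integration deduplicates, casts types, and normalizes, and Analytics denormalizes into a star-schema fact for the BI tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stage: land the raw extract exactly as received (everything as text,
# duplicates included); this area is truncated and reloaded each cycle.
cur.execute("CREATE TABLE stage_orders (order_id TEXT, customer TEXT, amount TEXT)")
raw_rows = [
    ("1001", "Acme", "250.00"),
    ("1001", "Acme", "250.00"),   # duplicate from the source feed
    ("1002", "Globex", "99.50"),
]
cur.executemany("INSERT INTO stage_orders VALUES (?, ?, ?)", raw_rows)

# Integration: data quality (dedupe, cast to proper types) and
# normalization, with keys rationalized across entities.
cur.execute("CREATE TABLE int_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute("CREATE TABLE int_order (order_id INTEGER PRIMARY KEY, "
            "customer_id INTEGER, amount REAL)")
cur.execute("INSERT INTO int_customer (name) SELECT DISTINCT customer FROM stage_orders")
cur.execute("""
    INSERT INTO int_order (order_id, customer_id, amount)
    SELECT DISTINCT CAST(s.order_id AS INTEGER), c.customer_id, CAST(s.amount AS REAL)
    FROM stage_orders s
    JOIN int_customer c ON c.name = s.customer
""")

# Analytics: flatten back out into a star-schema fact table that a
# BI platform can query directly.
cur.execute("""
    CREATE TABLE fact_sales AS
    SELECT o.order_id, c.name AS customer_name, o.amount
    FROM int_order o
    JOIN int_customer c ON c.customer_id = o.customer_id
""")
```

In a real project each layer would live in its own schema or database and the transformations would run in an ETL/ELT tool on a schedule, but the shape of the flow is the same.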

The rationale for this approach comes from the need to integrate and analyze data at the speed of business. Gone are the days of waiting 18 months, or even longer, while a data team builds out a comprehensive data warehouse. This approach provides a flexible, dynamic data environment that can scale over time and flex with changes as the business moves forward. As for deciding what data needs to go through this process, we use the Reporting Logical Modeling approach to gather and organize data subject areas and their component pieces. This approach collects and models, at a high level, the information needs of the final analytics solution. From there we can start on the data flow diagrams, data mapping documents, and transformation specs, all of which feed into the data storage and transformation code processes.

The part I appreciate most about this process is the flexibility of the architecture. We have delivered solutions within this model in as little as five weeks. Of course, data volume and the breadth of subject areas being consumed play a large part in the timeline, but compared to the alternatives, I believe this architecture provides the most cost-effective and valuable result for data consumption and transformation. I look forward to sharing additional details of this approach in future posts.

Until next time,
