Real time data loading and OLAP queries: Living together in next generation BI environments

Diego Pereira, Leonardo Guerreiro Azevedo, Asterio Tanaka, Fernanda Baião

Abstract


Real-time ETL (Extraction, Transformation and Loading) of enterprise data is one of the foremost features of next-generation Business Intelligence (BI 2.0). This article proposes a Data Warehouse (DW) architecture for loading operational data in real time with faster processing than current approaches. Distributed processing techniques, such as data fragmentation on top of a shared-nothing architecture, are used to create fragments that specialize in the most recent data and are optimized for real-time insertions. With this approach, the DW is updated in near real time from operational data sources, so DW queries run over real-time or near-real-time data. Moreover, real-time loading does not impact query response time. In addition, we extended the Star Schema Benchmark to cover real-time loading of operational data, and we used the extended benchmark to validate our approach and to demonstrate its efficiency compared with other approaches in the literature. The experiments were performed in the CG-OLAP research project environment.
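To make the fragmentation idea more concrete, here is a minimal, hypothetical Python sketch. It is not the authors' implementation. It assumes an in-memory fact table split into one small "current" fragment that receives all real-time inserts and several historical fragments that stand in for shared-nothing nodes. The class names, the `order_key`/`year`/`revenue` columns, and the modulo-based `migrate` step are illustrative assumptions. A query computes a partial aggregate on each fragment and merges the results, so freshly inserted rows are visible without the insert path ever touching the historical fragments.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """One fragment of the fact table, standing in for one shared-nothing node."""
    name: str
    rows: list = field(default_factory=list)

    def partial_sum(self, group_col: str, measure_col: str) -> dict:
        # Local aggregation runs independently on each fragment, as a node would.
        acc = defaultdict(int)
        for row in self.rows:
            acc[row[group_col]] += row[measure_col]
        return dict(acc)


class FragmentedFactTable:
    """Hypothetical layout: a small 'current' fragment absorbs real-time inserts,
    while historical fragments hold older rows."""

    def __init__(self, n_historical: int) -> None:
        self.current = Fragment("current")
        self.historical = [Fragment(f"hist{i}") for i in range(n_historical)]

    def insert(self, row: dict) -> None:
        # Real-time loads append only to the small current fragment,
        # so the large historical fragments are not touched by inserts.
        self.current.rows.append(row)

    def migrate(self) -> None:
        # Periodic batch step (assumed): move current rows into historical
        # fragments, distributing them by a hypothetical 'order_key' column.
        for row in self.current.rows:
            target = self.historical[row["order_key"] % len(self.historical)]
            target.rows.append(row)
        self.current.rows.clear()

    def sum_by(self, group_col: str, measure_col: str) -> dict:
        # A query fans out to every fragment, including the current one,
        # and a coordinator merges the partial aggregates.
        total = defaultdict(int)
        for frag in self.historical + [self.current]:
            for key, value in frag.partial_sum(group_col, measure_col).items():
                total[key] += value
        return dict(total)


table = FragmentedFactTable(n_historical=2)
table.insert({"order_key": 1, "year": 1997, "revenue": 100})
table.insert({"order_key": 2, "year": 1997, "revenue": 50})
table.migrate()
table.insert({"order_key": 3, "year": 1998, "revenue": 70})  # fresh, not yet migrated
print(table.sum_by("year", "revenue"))  # {1997: 150, 1998: 70}
```

In this sketch, confining the insert path to a small fragment is what keeps loading separate from the scan work over historical data. A real system would also need concurrency control, durable storage, and actual distribution across nodes, all of which are omitted here.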

Keywords


Real-Time Data Warehouse, Business Intelligence 2.0, Database Distribution, Database Fragmentation


An official publication of the Brazilian Computer Society Special Interest Group on Databases.