Abstract:
|
Nowadays, business analytical users need agile processes spanning from the selection
of relevant data from raw data sources to the generation of data structures
prepared to serve as input for OLAP, Data Mining and/or other analytical tools.
However, the wide range of analytical needs and the increasing need for adaptive
business strategies discourage the use of existing ’All-In-One’ suites (i.e.,
end-to-end solutions from a single vendor). Instead, a suite-independent agile
approach is advisable, both to increase the user’s independence from a specific vendor
and to extend the analytical capabilities available by combining several suites and tools according to
the user’s needs. In this thesis we present and develop ’SETA’, a suite-independent
agile analytical framework based on a novel approach that combines rich metadata
definition and automation components. As a proof of validity, we instantiate
the developed framework in a real-world project for the WHO Chagas Programme.
This thesis makes two main contributions. The first is an approach to store and
integrate a set of heterogeneous data sources into a flexible data store that sits at an
intermediate point between the classical Data Warehouse (DW) approaches and
the recent Data Lake strategies. We argue that classical DW systems are too
rigid to accommodate agile analytical pipelines, whereas Data Lakes and Big Data
technologies are not suitable for many of today’s organizations. Thus, a novel
approach combining both is presented. The second is a rich definitional
system to represent 1) the data components at Source, Global Schema and Domain
levels, 2) the data mappings between these levels and 3) the end users’ analytical
requirements. This definitional system provides a flexible view of the data schema
at different levels and enables the automated generation of the target data schemas and
of the ETL processes that feed them. |