Abstract:
We describe efficient algorithms for three fundamental problems in data mining:
association rules, episodes in sequences, and
generalized association rules over hierarchical taxonomies.
The association rule discovery problem consists of identifying the
frequent itemsets in a database and then deriving conditional
implication rules among them. For this task, we introduce a new
technique that substantially reduces the number of transactions
that must be processed. The resulting algorithm, called Ready-and-Go,
discovers frequent sets efficiently.
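To make the two stages of the problem concrete (find frequent itemsets, then derive rules from them), here is a minimal sketch of the classic level-wise, Apriori-style baseline. It is not Ready-and-Go, whose transaction-reduction technique is not described here. The function names, the absolute-count `min_support`, and the `min_conf` threshold are illustrative assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) baseline, NOT Ready-and-Go.
    Returns {itemset: support}, support = number of containing transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}  # 1-item candidates
    frequent, k = {}, 1
    while current:
        # One full pass over the transactions per level to count candidates.
        counts = {c: 0 for c in current}
        for t in transactions:
            for c in current:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-sets into (k+1)-candidates; prune any candidate
        # with an infrequent k-subset (downward closure of support).
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                current.add(u)
    return frequent

def association_rules(frequent, min_conf):
    """Rules X -> Z \\ X from each frequent Z, kept if support(Z)/support(X) >= min_conf."""
    rules = []
    for z, sup_z in frequent.items():
        for r in range(1, len(z)):
            for x in map(frozenset, combinations(z, r)):
                conf = sup_z / frequent[x]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((x, z - x, conf))
    return rules

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = frequent_itemsets(baskets, min_support=2)
strong = association_rules(freq, min_conf=0.9)
```

With a support threshold of 2, {milk, butter} (support 1) is discarded, so the triple is pruned without being counted; the only rule reaching 0.9 confidence is butter -> bread.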
Next, to discover patterns in sequences of events in ordered
collections of data, we apply an appropriate variant of the same
algorithm, and we also introduce a new framework that formalizes
the concept of interesting episodes. Finally, we adapt our algorithm
to the generalized frequent-set problem, in which the data is
organized in taxonomic hierarchies; here we also contribute a new
heuristic that improves performance under certain natural conditions.
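For the taxonomic generalization, the sketch below shows the straightforward baseline: extend each transaction with all ancestors of its items, then count support as usual. It is not the adapted algorithm or the heuristic mentioned above. The `parent` dictionary encoding of the taxonomy, the item names, and the restriction to itemsets of size at most two are illustrative assumptions made to keep the example short.

```python
from collections import Counter
from itertools import combinations

def ancestors(item, parent):
    """All ancestors of item; parent maps child -> parent (hypothetical encoding)."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result

def extend(transaction, parent):
    """Add every ancestor of every item, so buying 'skim milk' also counts as 'milk'."""
    ext = set(transaction)
    for item in transaction:
        ext.update(ancestors(item, parent))
    return frozenset(ext)

def generalized_frequent_pairs(transactions, parent, min_support):
    """Baseline: frequent singletons and pairs over ancestor-extended transactions.
    Pairs of an item with its own ancestor are skipped: their support always
    equals the support of the item itself, so they carry no information."""
    counts = Counter()
    for t in transactions:
        ext = extend(t, parent)
        for item in ext:
            counts[frozenset([item])] += 1
        for a, b in combinations(sorted(ext), 2):
            if a in ancestors(b, parent) or b in ancestors(a, parent):
                continue
            counts[frozenset([a, b])] += 1
    return {s: n for s, n in counts.items() if n >= min_support}

parent = {"skim milk": "milk", "whole milk": "milk", "rye": "bread", "wheat": "bread"}
baskets = [{"skim milk", "rye"}, {"whole milk", "wheat"}, {"skim milk"}]
result = generalized_frequent_pairs(baskets, parent, min_support=2)
```

No pair of leaf items reaches support 2 here, yet {milk, bread} does once items are generalized. This is the kind of pattern that motivates working over the hierarchy. Extending every transaction with all ancestors is also what makes the naive approach expensive.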