Towards a reverse engineering approach for guiding user in applying data mining

  1. Roberto Espinosa 1
  2. José-Norberto Mazón 2
  3. José Zubcoff 2
  1. 1 Universidad de Matanzas
    Matanzas, Cuba

    ROR https://ror.org/04hk86037

  2. 2 Universitat d'Alacant
    Alicante, Spain

    ROR https://ror.org/05t8bcz72

Book:
Libro Actas de las Jornadas de Ingeniería del Software y Bases de Datos (JISBD´11)

Publisher: Ángeles Saavedra Places ; Coral Calero Muñoz ; Universidad de La Coruña

ISBN: 978-84-9749-486-1

Year of publication: 2011

Pages: 23-30

Congress: JISBD´11 (16. 2011. A Coruña)

Type: Conference paper

Abstract

Data mining is at the core of the knowledge discovery process. However, an initial preprocessing step is crucial for ensuring reliable results within this process. Preprocessing data is a time-consuming and non-trivial task, since data quality issues must be considered. This is even worse when dealing with complex data, not only because of the different kinds of complex data types (XML, multimedia, and so on), but also because of the high dimensionality of complex data. Therefore, to overcome this situation, in this position paper we propose using mechanisms based on data reverse engineering to automatically measure some data quality criteria on the data sources. These measures will guide the user in selecting the most adequate data mining algorithm in the early stages of the knowledge discovery process. Finally, it is worth noting that this work is a first step towards considering, in a systematic and structured manner, data quality criteria for supporting data miners in applying those algorithms that obtain the most reliable knowledge from the available data sources.