DATAMITE presented its groundbreaking research at the International Conference on Cloud and Big Data Computing (ICCBDC) 2024, held at Oxford University, UK, from 15 to 17 August. The conference provides an international platform for engineers and scientists to discuss advances and applications of cloud and big data, with sessions on data modelling, machine learning, image processing and information security. DATAMITE’s paper introduced a new dimension of data quality called ‘Purity’, which assesses the relevance and importance of datasets in distributed networks, particularly in cloud computing environments. The presentation at ICCBDC underlined DATAMITE’s commitment to pioneering data quality solutions.

The paper, ‘Purity: a New Dimension for Measuring Data Centralization Quality’, was written by researchers from Tecnalia, a DATAMITE partner, and the Deusto Institute of Technology. It presents ‘Purity’ as a crucial metric for assessing the relevance and significance of datasets within decentralised networks, particularly in cloud computing environments. This metric enables organisations to predict data quality issues before datasets are merged, thus optimising cloud resources and facilitating data-driven strategies. The research underscores that early identification of potential quality concerns can streamline processes, enabling more efficient and accurate data handling.

The effectiveness of the purity dimension is validated through a mobility use case on Intelligent Transportation Systems (ITS) that analyses four datasets from varied domains. The study assessed each dataset’s importance within the network using three centrality indicators: degree, betweenness and closeness. Each indicator captures a different aspect of importance:

  • Degree Centrality: Measures the number of direct connections a dataset has, indicating connectivity-based importance.
  • Betweenness Centrality: Assesses how often a dataset acts as an intermediary in communication between other datasets, highlighting its role in information transfer.
  • Closeness Centrality: Evaluates the efficiency of communication by measuring how quickly a dataset can reach others.
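
To make the three indicators concrete, the following is a minimal Python sketch that computes them on a small, connected, undirected network. The four dataset names and the links between them are hypothetical and chosen only for illustration. They are not the datasets or the network used in the paper, and the normalisations shown are the common textbook ones, which may differ from the paper's.

```python
from collections import deque

# Hypothetical network: an edge means two datasets can be linked directly.
# Shape: weather - traffic - transit - parking (a simple chain).
graph = {
    "weather": {"traffic"},
    "traffic": {"weather", "transit"},
    "transit": {"traffic", "parking"},
    "parking": {"transit"},
}

def bfs(graph, source):
    """Return hop distances and shortest-path counts from source."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:  # first time this neighbour is reached
                dist[nb] = dist[node] + 1
                paths[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:  # node lies on a shortest path to nb
                paths[nb] += paths[node]
    return dist, paths

def degree_centrality(graph):
    """Direct connections, divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {v: len(graph[v]) / (n - 1) for v in graph}

def closeness_centrality(graph):
    """Reachable nodes divided by the summed hop distance to them."""
    result = {}
    for v in graph:
        dist, _ = bfs(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness_centrality(graph):
    """Share of shortest paths between other pairs that pass through a node."""
    nodes = sorted(graph)
    info = {v: bfs(graph, v) for v in nodes}
    score = {v: 0.0 for v in nodes}
    for i, s in enumerate(nodes):
        dist_s, paths_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue  # s and t are not connected
            dist_t, paths_t = info[t]
            for v in nodes:
                if v in (s, t) or v not in dist_s or v not in dist_t:
                    continue
                if dist_s[v] + dist_t[v] == dist_s[t]:
                    score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    n = len(nodes)
    norm = (n - 1) * (n - 2) / 2  # number of pairs that exclude the node
    return {v: score[v] / norm for v in nodes} if norm else score

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for name in sorted(graph):
    print(name, round(degree[name], 2), round(closeness[name], 2), round(betweenness[name], 2))
```

In this toy chain, the two middle datasets score highest on all three indicators because every path between the end datasets passes through them.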

The paper also introduced a comprehensive quality evaluation framework covering six dimensions: accuracy, completeness, timeliness, validity, uniqueness, and consistency. Through mathematical formulations, this framework supports predictive quality evaluations of merged datasets. Future research will refine the ‘Purity’ metric, enhancing its applicability across various domains and data environments.
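
As an illustration of what predicting quality before a merge can look like, the sketch below uses two of the six dimensions, completeness and uniqueness. The definitions are common textbook ones (share of non-missing cells, share of distinct records) and are an assumption here, not the paper's formulations. The sample records are invented. For two datasets with the same columns that are simply appended, the completeness of the result follows from the per-dataset scores and cell counts alone, so it can be estimated without performing the merge.

```python
def completeness(rows):
    """Share of non-missing cells; None marks a missing value."""
    cells = [value for row in rows for value in row.values()]
    return sum(value is not None for value in cells) / len(cells)

def uniqueness(rows):
    """Share of distinct records among all records."""
    distinct = {tuple(sorted(row.items())) for row in rows}
    return len(distinct) / len(rows)

def predicted_completeness(score_a, cells_a, score_b, cells_b):
    """Cell-weighted average of two completeness scores (append-style merge)."""
    return (score_a * cells_a + score_b * cells_b) / (cells_a + cells_b)

# Invented sample records, for illustration only.
dataset_a = [
    {"id": 1, "speed": 50},
    {"id": 2, "speed": None},
]
dataset_b = [
    {"id": 3, "speed": 30},
    {"id": 2, "speed": None},  # duplicates a record in dataset_a
    {"id": 4, "speed": 45},
]

cells_a = sum(len(row) for row in dataset_a)
cells_b = sum(len(row) for row in dataset_b)

# Prediction from summary statistics only, before merging.
predicted = predicted_completeness(
    completeness(dataset_a), cells_a, completeness(dataset_b), cells_b
)

# Check against the actual merged data.
merged = dataset_a + dataset_b
actual = completeness(merged)
merged_uniqueness = uniqueness(merged)

print(round(predicted, 3), round(actual, 3), round(merged_uniqueness, 3))
```

Uniqueness behaves differently: the duplicate record only becomes visible once the overlap between the two datasets is known, so summary scores alone are not enough to predict it.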

In recognition of their outstanding work and their presentation at ICCBDC, the researchers received the Excellent Oral Presentation Award at the conference.

To stay up to date with our project, follow us on LinkedIn, Twitter and Bluesky!