Driving Data Innovation: Progress and future plans for DATAMITE pilots
The DATAMITE project is steadily progressing through its six innovative pilots, each addressing unique data management challenges and demonstrating DATAMITE's potential for innovation in data sharing, governance and quality assessment. With milestones set through 2025, these pilots will strengthen the DATAMITE framework's ability to address diverse sectoral challenges and provide sustainable, scalable solutions for data sharing and management.
Here is an overview of the progress and upcoming milestones for each of these pilots:
Pilot 1: Data exchange between companies within the same corporate group
The infrastructure for data exchange is in place, with an on-premises DATAMITE ecosystem using PostgreSQL, MongoDB and Cassandra. Test scenarios have been established, and a beta test API developed with Django supports interactions for the OBREMO and FACSA use cases. Sensitive data is handled in line with DATAMITE's legal advice, and problematic or controversial data has been removed. Finally, cloud deployment testing is underway in collaboration with ITI.
Future steps
- Deployment of the DATAMITE framework, at least in the on-premises infrastructure.
- Communication and connection testing with the API, with further integration if there is interest.
- Collect and analyse initial KPIs.
- Produce multimedia materials to showcase test results.
- Draft the pilot report.
Pilot 2: Corporate Multi-Site Data Exchange
The DATAMITE framework has been deployed in a development infrastructure. Test scenarios and synthetic datasets were prepared in Month 23 (November 2024). The team is refining the data structures and running preliminary evaluations of the framework's performance.
Future steps
- Full deployment of the framework in the pilot development infrastructure.
- Finalise and execute use case validation.
- Set up production pilot infrastructure in Azure.
- Evaluate the DATAMITE framework on different operating systems, such as Red Hat Enterprise Linux flavours, to compare with traditional distributions such as Ubuntu or Debian.
- Submit a paper describing the pilot to the AIAI 2025 conference.
Pilot 3: Offering Data to Service Providers with DataSpaces
Component deployment and testing began in Month 17 (May 2024) and is scheduled to be completed by Month 25 (January 2025). Current priorities include deploying and testing the relevant DATAMITE components, finalising the test scenarios and contributing to the course design. In addition, the deployment will soon be moved from a Google Cloud VM to a permanent solution.
Future steps
- Carry out the first iteration of scenario testing between November 2024 and the end of January 2025, completing component deployment and testing and gathering feedback to refine the second iteration of scenario testing, starting in March 2025.
- Migration of the deployment to an on-premises infrastructure.
- Deployment of Data Sharing components and publication of anonymised sensitive and composite data products.
- Draft the initial pilot report.
Pilot 4: Leverage Electricity Distribution Open Data
The internal infrastructure, based on a Microsoft Azure environment, is ready for deployment, and deployment of the Data Quality, Data Governance and Data Support Tools components has already begun. Testing will be divided into three levels and focuses on improving data collection and sharing processes.
Future steps
- Complete and validate pilot test scenarios.
- Deploy and test components from the Data Support Tools, Data Quality and Data Governance modules by the end of January 2025; these components are the focus of the first two levels of the test scenarios.
- Draft the initial pilot report.
Pilot 5: Connecting eDWIN to Data Markets
The team has completed the integration with Pontus-X for the agri-food sector. Meteorological data has been integrated with AIM, and normalisation pipelines for the pest and production datasets of the agri-food domain are operational.
Future steps
- Complete the first automated data workflow.
- Integrate eDWIN with DATAMITE data quality and data governance tools.
- Conduct test scenarios and complete demonstrators by the end of February 2025.
Pilot 6: Connecting MISTRAL to the EU AI-ON-Demand Platform
Cloud resource allocation is underway, along with the deployment of the development environment for testing the DATAMITE framework. The team is also defining data quality checks and analysing data connectivity with external catalogues such as AI-on-Demand and the Mistral Open Data Catalogue.
Future steps
- Integration of the DATAMITE framework with the Mistral platform.
- Implementation of a Mistral connector to retrieve (meta)data in the DATAMITE framework.
- Dissemination of the data catalogue, sharing data both with a local open data catalogue and with the external AI-on-Demand platform.
- Set up user-defined custom rules.
- Enable data quality checks with the Data Quality Module.
If you want to always stay updated about our project, follow us on LinkedIn, Twitter and Bluesky!
DATAMITE plenary meeting in Aachen
From 26 to 28 November, the DATAMITE consortium met in Aachen for its second and final plenary meeting of 2024. As at the previous plenary meeting in Poznan, the agenda opened with a CodeCamp day for the developers involved in the project, followed by two days with the whole consortium in which each work package had its moment in the spotlight with presentations and/or workshops.
The first day was spent at the CodeCamp, as a warm-up for the plenary sessions of the following two days. This edition was less about coding and more about practical activities and shaping the future of the DATAMITE framework. All DATAMITE design and coding partners were divided into parallel sessions according to their needs, in order to get the most out of the CodeCamp.
On 27 and 28 November, the remaining partners who were not involved in the CodeCamp joined in. The DATAMITE consortium exchanged ideas, discussed the results of the review meeting and set the roadmap for the final year of the project. Although the agenda was still organised by work package, the Aachen meeting was, in contrast to previous plenary meetings, much more hands-on, encouraging all partners to take part in every work package through more interactive workshops and sessions.
Among the main topics discussed were the project's six pilots and the next steps to be taken before the end of 2025. Exciting things to come next year!
In addition to three intense days of work, the DATAMITE consortium enjoyed a guided tour of the Aachen Christmas Market and a visit to the Demonstration Factory, a central component of the Smart Logistics Cluster on the RWTH Aachen campus, thanks to our colleagues from FIR at RWTH Aachen University, who hosted this plenary meeting.
DATAMITE presents a paper at ICCBDC 2024
DATAMITE presented its research at the International Conference on Cloud Computing and Big Data (ICCBDC) 2024, held at Oxford University, UK, from 15 to 17 August. The conference provides an international platform for engineers and scientists to discuss advances and applications of cloud computing and big data, with sessions on data modelling, machine learning, image processing and information security. DATAMITE's paper introduced a new data quality dimension called 'Purity', which assesses the relevance and importance of datasets in distributed networks, particularly in cloud computing environments. The presentation at ICCBDC underlined DATAMITE's commitment to pioneering data quality solutions.
The paper, titled 'Purity: a New Dimension for Measuring Data Centralization Quality' and written by researchers from DATAMITE partner Tecnalia and the Deusto Institute of Technology, presents 'Purity' as a key metric for assessing the relevance and significance of datasets within decentralised networks, particularly in cloud computing environments. The metric enables organisations to anticipate data quality issues before datasets are merged, helping to optimise cloud resources and support data-driven strategies. The research underscores that identifying potential quality concerns early can streamline processes and make data handling more efficient and accurate.
The effectiveness of the Purity dimension is validated through a mobility use case on Intelligent Transportation Systems (ITS), analysing four datasets from different domains. The study assessed each dataset's importance within the network using three centrality indicators: degree, betweenness and closeness. Each indicator captures a different aspect of a dataset's importance:
- Degree Centrality: Measures the number of direct connections a dataset has, indicating connectivity-based importance.
- Betweenness Centrality: Assesses how often a dataset lies on the shortest paths between other datasets, highlighting its role as an intermediary in information transfer.
- Closeness Centrality: Evaluates communication efficiency by measuring how quickly, in terms of shortest-path distance, a dataset can reach all the others.
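To make the three indicators concrete, here is a minimal sketch in plain Python that computes them on a small, made-up network of four datasets. The graph, the dataset names, the meaning of an edge ("the two datasets can be linked") and the use of normalised, unweighted centralities are all assumptions for illustration; this is not the paper's code, and it does not compute the Purity metric itself.

```python
from collections import deque

# Hypothetical network: each node is a dataset, and an edge means the two
# datasets share attributes that allow them to be linked (illustrative only).
GRAPH = {
    "traffic": ["weather", "events", "parking"],
    "weather": ["traffic"],
    "events": ["traffic", "parking"],
    "parking": ["traffic", "events"],
}


def degree_centrality(g):
    """Direct connections of each node, normalised by the n - 1 possible ones."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}


def bfs_distances(g, source):
    """Hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(g):
    """Number of other reachable nodes divided by the sum of their distances."""
    result = {}
    for v in g:
        dist = bfs_distances(g, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(g):
    """Brandes' algorithm for an unweighted, undirected graph, normalised."""
    bc = {v: 0.0 for v in g}
    for s in g:
        stack = []
        preds = {v: [] for v in g}
        sigma = {v: 0 for v in g}  # number of shortest paths from s
        dist = {v: -1 for v in g}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in g}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(g)
    # Every unordered pair was counted twice (once from each end), so divide
    # by (n - 1)(n - 2) rather than (n - 1)(n - 2) / 2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}


for name, fn in [("degree", degree_centrality),
                 ("closeness", closeness_centrality),
                 ("betweenness", betweenness_centrality)]:
    print(name, {k: round(x, 2) for k, x in fn(GRAPH).items()})
```

On this toy graph, `traffic` scores highest on all three indicators (degree 1.0, closeness 1.0, betweenness about 0.67), because it is directly linked to every other dataset and every shortest path from `weather` to the rest of the network passes through it.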
The paper also introduced a comprehensive quality evaluation framework covering six dimensions: accuracy, completeness, timeliness, validity, uniqueness, and consistency. Through mathematical formulations, this framework supports predictive quality evaluations of merged datasets. Future research will refine the 'Purity' metric, enhancing its applicability across various domains and data environments.
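Two of these six dimensions, completeness and uniqueness, have simple common definitions that already show why quality can shift when datasets are merged. The sketch below uses generic ratio definitions on hypothetical records; it is not the paper's mathematical formulation, and the record layout, field names and the use of `None` for a missing value are assumptions made for illustration.

```python
def completeness(records, fields):
    """Share of (record, field) cells that hold a value; None counts as missing."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total


def uniqueness(records, key_fields):
    """Share of records whose key (tuple of key_fields) is distinct."""
    if not records:
        return 1.0
    keys = [tuple(r.get(f) for f in key_fields) for r in records]
    return len(set(keys)) / len(keys)


# Hypothetical traffic-sensor datasets from two sources (illustrative only).
source_a = [{"id": 1, "speed": 50}, {"id": 2, "speed": None}]
source_b = [{"id": 2, "speed": 40}, {"id": 3, "speed": 60}]
merged = source_a + source_b  # naive concatenation, no deduplication

for name, data in [("A", source_a), ("B", source_b), ("A+B", merged)]:
    print(name, completeness(data, ["id", "speed"]), uniqueness(data, ["id"]))
```

Running it prints completeness 0.75 and uniqueness 1.0 for A, 1.0 and 1.0 for B, and 0.875 and 0.75 for the merged data: the merge improves completeness but introduces a duplicate `id`, which is the kind of effect one would want to anticipate before merging.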
In recognition of their work and their presentation, the researchers received the Excellent Oral Presentation Award at ICCBDC 2024.
Upcoming events
The DATAMITE consortium partners are constantly working to showcase the progress of the project, presenting the latest publications and articles at various national and international events. Here is a list of events not to be missed between November and December 2024.
ADRF 2024
The AI, Data and Robotics Forum (ADRF) will take place on 4 and 5 November in Eindhoven, the Netherlands. Held under the theme ‘European Sovereignty in AI, Data and Robotics’, the ADR Forum is a premier annual event organised by the AI, Data and Robotics Association (Adra) in collaboration with the European Commission, bringing together leading experts, innovators, policymakers and enthusiasts from the AI, Data and Robotics community. DATAMITE is a proud sponsor of this year’s edition, and the project will be represented by the DATAMITE coordinator, Santiago Cáceres (ITI), and a project poster.
InfoCom World 2024
The 26th InfoCom World Conference will take place on 12 November in Athens, Greece, under the title ‘Digital Greece; Time for a leap!’. This annual meeting of the technology, information technology and telecommunications market will focus on the country's dynamic development through the adoption and exploitation of new technologies, the challenges and opportunities that arise, and the steps market stakeholders must take to make the digital leap. DATAMITE will be represented by one of its Greek partners, OTE, which will have a booth at the conference and will take part in the sessions held during the day.
Más allá del Horizonte
On 28 November, the 'Más allá del Horizonte' (Beyond the Horizon) Conference will take place in Oviedo, Spain. The conference is organised by the Spanish Centre for Technological Development and Innovation (CDTI), in collaboration with the Spanish Ministry of Science, Innovation and Universities, the European Commission and the City of Oviedo. Its main objective is to analyse how Horizon Europe and its major initiatives have worked so far, including Horizon Europe projects with Spanish participation. DATAMITE will be one of the projects featured in the poster exhibition, participating together with the project coordinator, ITI (Instituto Tecnológico de Informática de Valencia).
Discover the DATAMITE Open Source Repository
We are pleased to announce the DATAMITE open source repository developed by the DATAMITE partners and hosted at Eclipse Research Labs. As part of DATAMITE's commitment to open science, and advocating for interoperability and openness, the DATAMITE Framework is provided as open source software and is available in its public repository.
The DATAMITE open source repository, hosted at Eclipse Research Labs on GitLab, is the clearest expression of the project's commitment to open science in all its aspects, and complements other open repositories such as Zenodo, which hosts all the publications written in the context of DATAMITE. DATAMITE therefore has both an Open Source Code Repository and an Open Access Research Repository.
The DATAMITE open source repository has eight sub-repositories to make navigation as easy and understandable as possible. These repositories are:
- Data-governance.
- Data-quality.
- Data-security.
- Data-Sharing.
- Data-support-tools.
- Docs.
- Frontend.
- IP-analysis.
You can access our main repository here or from the icon in the top right corner of our website.
DATAMITE plenary meeting in Poznan
From 14 to 16 May 2024, the DATAMITE consortium met in Poznan for its first plenary meeting of 2024. The three-day meeting opened with a CodeCamp day for the developers involved in the project, followed by two days of work package presentations with the entire consortium.
The first day was dedicated to a CodeCamp with the members in charge of developing the project's code. The team was divided into three groups so that they could discuss the status of the project in person and work together on the obstacles they were encountering. These three groups were ‘Data governance and Data discovery’, ‘Data Quality’ and ‘Gaia-X Onboarding’.
On the second and third days of the plenary meeting, the remaining partners who were not involved in the CodeCamp joined in. Both days, 15 and 16 May, were dedicated to presentations of each work package and to several internal workshops. These activities aimed to keep the whole consortium informed about the status of each DATAMITE element, while also fostering the exchange of ideas among partners to gather valuable input for upcoming tasks.
Among the most important topics discussed were the project's upcoming deliverables and the mid-term project review meeting scheduled for September.
In addition to the three intense days of work, the DATAMITE consortium enjoyed a guided tour of the beautiful city of Poznan followed by a local dinner and a visit to the Poznan Supercomputing and Networking Center (PSNC) facilities, thanks to the colleagues of PSNC, who hosted this plenary meeting.
The fifth DATAMITE newsletter is out!
The fifth edition of the DATAMITE newsletter is now available. In this new edition you will find:
- Use Cases: Offering Data to Service Providers with DataSpaces, by HEDNO and CERTH.
- The latest interviews of our campaign: People behind DATAMITE.
- External and internal events we have attended as a consortium.
- Don't miss the video we made for the International Day of Women and Girls in Science: "We are proud to be women scientists".
- Call for Papers and Abstracts!
Read it in our Zenodo repository, and in our LinkedIn Newsletter section.
DATAMITE internal workshop - Exploitation Webinar
One of the great advantages of being such a large consortium, with research and development partners, academic partners, large industrial partners, SMEs and a standards body, is the opportunity to learn from each other. To foster this continuous exchange of knowledge and help the consortium as a whole acquire new skills to achieve the best project results in all fields, Antonis Sapountzis, leader of Work Package 6, hosted an internal webinar on exploitation.
Exploitation, included in Work Package 6: Outreach, Exploitation and Collaboration, is the key to achieving wider impacts of DATAMITE in the longer term (5 years or more) on the scientific community, the economy and society.
As an ambitious open source project, DATAMITE has a clear path for exploiting its results, designed by AUSTRALO, the DATAMITE partner in charge of the Dissemination, Exploitation and Communication plans. The project's exploitation plan is based on three main exploitation models:
- The commercial exploitation model, which involves the paid, open-access provision of project results to end-users.
- The research exploitation model, which involves the use of acquired research knowledge in future research activities.
- The technology exploitation model, which involves the use of acquired technological knowledge for the development of innovative products and their supply.
Internal workshops such as this one help the consortium acquire the skills needed to ensure success not only in the short term, but also in the medium and long term, when DATAMITE aims to be a major player in the European data market revolution.
Call for papers: eSAAM 2024 on Data Spaces
The 4th Eclipse Security, Artificial Intelligence, Architecture, and Modelling (eSAAM) Conference on Data Spaces will take place on 22 October in Mainz, Germany. The event is organised by Instituto Tecnológico de Informática (ITI), the coordinator of DATAMITE, together with the Centre for Research & Technology Hellas (CERTH), the Eclipse Foundation and the University of Macedonia. The conference will bring together industry experts and researchers working on innovative software and systems solutions for data spaces.
The eSAAM 2024 on Data Spaces conference is an excellent platform for researchers to showcase their work to stakeholders from industry, standards bodies and open source initiatives, driving the evolution of data spaces. To this end, a call for papers has been launched with a submission deadline of 12 May.
This conference encourages submissions that report on constructive, design-oriented research on innovative artefacts, such as software, models, and methods related to the conference theme. The conference is focused on, but not limited to, the following topics of interest:
- Data Space Security and Privacy.
- Data Space Architecture.
- AI and Machine Learning in Data Spaces.
- Modelling for Data Space Systems.
More information about the conference and the submission guidelines is available here.
Women in Science Day: we are proud to be women scientists
Every 11 February we celebrate the International Day of Women and Girls in Science. This day is a celebration but, above all, a reminder of the need to achieve equality in STEM so that no talent is lost. “A significant gender gap has persisted throughout the years at all levels of science, technology, engineering and mathematics (STEM) disciplines all over the world. Even though women have made tremendous progress towards increasing their participation in higher education, they are still under-represented in these fields”, states the UN.
The DATAMITE consortium is acutely aware of the reality of inequality in the STEM sector and fully supports measures to promote and establish real equality, both within the project consortium and across the scientific community. At DATAMITE we know we are something of an exception, as we have a large number of women in our ranks. And yet, we are still far from the equality we aim for.
Naturally, the focus today is on the women who are part of the project and who have joined forces to shout out loud "we are proud to be women scientists".
Watch their video here.