The DATAMITE project is steadily progressing through its six innovative pilots, each addressing unique data management challenges and demonstrating DATAMITE’s potential for innovation in data sharing, governance and quality assessment. With milestones set through 2025, these pilots will strengthen the DATAMITE framework’s ability to address diverse sectoral challenges and provide sustainable, scalable solutions for data sharing and management.
Here is an overview of the progress and upcoming milestones for each of these pilots:
Pilot 1: Data exchange between companies within the same corporate group
The infrastructure for data exchange is in place, with an on-premises DATAMITE ecosystem using PostgreSQL, MongoDB and Cassandra. Test scenarios have been established, and a beta test API, developed using Django, supports interactions for the OBREMO and FACSA use cases. Sensitive data is handled in line with DATAMITE’s legal advice, and problematic or controversial data has been removed. Finally, cloud deployment testing is underway in collaboration with ITI.
Future steps
- Deploy the DATAMITE framework, at least on the on-premises infrastructure.
- Test communication and connectivity with the API, and carry out further integration if there is interest.
- Collect and analyse initial KPIs.
- Produce multimedia materials to showcase test results.
- Draft the pilot report.
Pilot 2: Corporate Multi-Site Data Exchange
The DATAMITE framework has been deployed in a development infrastructure. Test scenarios and synthetic datasets were prepared in Month 23 (November 2024). The team is now refining the data structures and running preliminary evaluations of the framework’s performance.
Future steps
- Full deployment of the framework in the pilot development infrastructure.
- Finalise and execute use case validation.
- Set up production pilot infrastructure in Azure.
- Evaluate the DATAMITE framework on different operating systems, such as Red Hat Enterprise Linux variants, and compare the results with more traditional distributions such as Ubuntu or Debian.
- Submit a publication describing the pilot to the AIAI 2025 conference.
Pilot 3: Offering Data to Service Providers with DataSpaces
Component deployment and testing began in Month 17 (May 2024) and are scheduled to be completed by Month 25 (January 2025). Current priorities include deploying and testing the relevant DATAMITE components, finalising the test scenarios and contributing to the course design. In addition, the deployment will soon be moved from a Google Cloud VM to a permanent solution.
Future steps
- Carry out the first iteration of scenario testing between November 2024 and the end of January 2025, completing component deployment and testing and gathering feedback to refine the second iteration of scenario testing, starting in March 2025.
- Migrate the deployment to an on-premises infrastructure.
- Deploy the Data Sharing components and publish anonymised sensitive and composite data products.
- Draft the initial pilot report.
Pilot 4: Leverage Electricity Distribution Open Data
The internal infrastructure, based on a Microsoft Azure environment, is ready for deployment, and deployment of the Data Quality, Data Governance and Data Support Tools has already begun. Testing will be divided into three levels and focuses on improving data collection and sharing processes.
Future steps
- Complete and validate pilot test scenarios.
- Deploy and test components from the Data Support Tools, Data Quality and Data Governance modules through the end of January 2025; these components will be the focus of the first two levels of the test scenarios.
- Draft the initial pilot report.
Pilot 5: Connecting eDWIN to Data Markets
The team has completed the integration with Pontus-X for the agri-food sector. Meteorological data has been integrated with AIM, and normalisation pipelines for pest and production datasets (agri-food domain) are operational.
Future steps
- Complete the first automated data workflow.
- Integrate eDWIN with DATAMITE data quality and data governance tools.
- Conduct test scenarios and complete demonstrators by the end of February 2025.
Pilot 6: Connecting MISTRAL to the EU AI-ON-Demand Platform
Resource allocation on the cloud is underway, along with the deployment of the development environment to test the DATAMITE framework. The team is also currently defining data quality checks and analysing data connectivity for external catalogues such as AI-on-Demand and the Mistral Open Data Catalogue.
Future steps
- Integrate the DATAMITE framework with the MISTRAL platform.
- Implement a MISTRAL connector to retrieve (meta)data into the DATAMITE framework.
- Disseminate the data catalogue through data sharing, both to a local open data catalogue and to the external AI-on-Demand platform.
- Set up user-defined custom rules.
- Enable data quality checks with the Data Quality Module.
To stay up to date with our project, follow us on LinkedIn, Twitter and Bluesky!