By Jordi Arjona Aroca, the DATAMITE Coordinator

Think of a puzzle. If you have little girls, like I do, you have a 100-piece Elsa and Anna puzzle in front of you, but that is a different story. Each one of those 100 pieces is a data fragment. By themselves, they probably mean nothing and serve no purpose. It is only when we put them together that we create an image, something valuable. Making the puzzle is usually straightforward. It may take more or less time, but you have a perfect set of pieces that match and complement each other and will let you extract some added value, a conclusion, or a Frozen picture. Problems come, however, if your daughters mixed three puzzles, took 20 pieces at random to see whether they float in the tub, and a couple more to see how they taste. Now you have more data than you really need (say 300 pieces), yet your data is incomplete (remember those 22 missing pieces), and you had better forget about ever seeing that picture complete again.

These are examples of the challenges organizations face with their data. Organizations tend to store data that they will never use, maybe because it is not well described, or because it is not findable or not usable. In addition, this data usually has problems: it may be incomplete, it may contain outliers, or the same information may be named in different ways. Basically, you do not have a 100-piece puzzle. In general, organizations have large quantities of data, but it is not good data. This has a critical consequence: it is extremely hard to monetize.

Organizations need tools that allow them to enrich their data with metadata easily and uniformly, so that non-technical personnel can find it, use it and consume it. They also need tools to ensure that data is good and, when it is not, to identify its issues and solve them. It is equally important to define security policies that ensure that only those who are allowed can access the data, and so avoid security breaches. Once you can ensure these aspects, you will be able to rely on your data and think of monetizing it: exploiting it internally and creating reliable models or projections that help boost your revenues. You may even think of sharing or trading it with third parties, but here, again, you must be able to define and enforce terms of use, which is not trivial at all. All of this assumes, of course, that you have technical and business personnel who know how to handle your data.

Currently, we have tools that may help with these purposes, but they do not cover the whole span of needs organizations have. Furthermore, most of them require paying for a license that is hardly affordable, especially for SMEs. Open-source tools can help with some of the needs, but we still lack a framework that can help the organizations making up the European productive fabric.

DATAMITE is here to help. During our project we will create an open-source modular framework covering, mainly, aspects related to data governance, quality, security, and sharing. Alongside this, we will foster the creation of an open-source community around the project and produce, within it, technical and business training materials with the mission of upskilling EU data professionals and helping to increase the maturity level of companies. To do so, we have put together a great and well-balanced consortium, led by ITI and made up of 26 partners from 12 countries. To validate our approach, we will deploy 6 pilots in different environments (e.g., industry, energy, agrifood, meteorology) and with different goals, such as sharing data in Data Spaces, with the EU AIoD or in EOSC, improving how data is consumed within large companies, or improving how it can be offered to EU researchers.

Do you want a bunch of pieces? If you prefer a puzzle, stay tuned!

If you want to stay up to date with our project, subscribe to our newsletter and follow us on Twitter and LinkedIn!