
Trusted Federated Infrastructure: Data, infrastructure, and governance

Authors: Leonardo Camiciotti, Emile Christopher Chalouhi, Fabrizio Garrone, Matthijs Punter, Francesco Bonfiglio


In 2023, digitalization is pervasive across our economy and society. Data is collected and processed in interconnected storage and computing infrastructures, shared, and used for AI applications at ever-increasing speed, and we are potentially just at the beginning of a new growth curve. But can we trust the means and resources through which this data is collected and shared? Are algorithms transparent enough? Is secure and compliant processing of data guaranteed? And finally, do we have evidence of where the bits reside and transparency into the network and computing fabric that supports the data processing? We argue that a trusted and federated infrastructure is the necessary foundation to address these challenges and enable sustainable growth of data sharing.

So, what are the predictions? The EU expects exponential growth in data volume towards 2025: a 430% increase in data volume (from 33 zettabytes in 2018 to 175 zettabytes), a data economy worth €829 billion (up from €301 billion, or 2.4% of EU27 GDP, in 2018), and 10.9 million "data professionals" (up from 5.7 million within the EU27 in 2018). And this is just sheer volume: the growth of data goes hand in hand with the ever-increasing digitalization of every aspect of our society.

This brings us to a fundamental question: is our data-sharing infrastructure capable of handling this growth? And more importantly – as it will cover many sensitive and confidential aspects of our businesses and lives – can we sufficiently trust our data-sharing infrastructure?

The current answer is clearly: no. Current data-sharing infrastructures are built around a limited number of hyperscalers, in most cases under foreign control. And although such services are highly competitive and aspects of trust and security can be addressed contractually, there are no technological means to enforce trust and security, given the inaccessibility of proprietary infrastructures. It therefore remains important to limit our dependency on oligopolies and avoid future lock-ins.

We simply cannot afford a situation of having a single market for data, whereby there is no available offering of data infrastructures with roots in the European business and societal ecosystem. Indeed, in addition to losing control over critical and strategic competencies and assets, a highly concentrated market structure would put innovation and sustainability at risk over the long term.

In particular, alternative models and complements to fully proprietary, closed infrastructures and services are needed in order to achieve complete trust, pervasive coverage from the core to the edge, and ease of switching between operators.

This concerns not only digital sovereignty but also the mitigation of the above-mentioned market concentration, so as to guarantee robustness, innovation, and competition in the mid and long term. If we want to successfully scale our data-sharing infrastructure, we need to enable the full stack: data, rules, and infrastructure.

Enabling data sharing at scale will also require additional functionalities, such as digital clearing houses, secure gateways for data exchange and trust frameworks, which provide the fabric of future cloud and data economy. Such functionalities will need to be provided as highly standardized appliances to data spaces. Several market players, such as Cofinity-X and Sovity have already entered this new field of business. In addition, it is expected that computing capabilities will become much more distributed: smart devices, edge computers and other localized and integrated smart solutions need to work seamlessly with the centralized cloud computing infrastructures.

This is exactly the sweet spot addressed by initiatives such as Gaia-X. Gaia-X is built on the vision of a federated, open, and transparent digital ecosystem. Gaia-X will not achieve this by providing the underlying services of the digital ecosystem itself. Instead, it will act as the consensus room where the agreed rules are discussed, negotiated, and designed by the diverse "digital stakeholders"; it will produce the blueprint of the necessary trust services at each level of the digital ecosystem; and it may foster the realization of Minimum Viable Products to kick off and promote industrial adoption and deployment.

On the infrastructure level, it will enable identification services and labels to cloud and computing services, whereas on the data level, it will identify parties, their data offerings and services. Trust services will work in a vertically integrated way too, enabling organizations to verify that they are using a data service of a particular organization, running on a trusted digital infrastructure. Trust services can be offered by a variety of market players, whereby Gaia-X enables their interoperability and mutual interconnectivity. This enables organizations to provide (and use) services across domains.

New EU legislation on data (Data Act, Data Governance Act, AI Act, etc.) is expected to be an important driver for such federated trust architectures. The new legislation will require organizations to be very explicit about their data and its processing. Trust services will be required for users to be able to verify the claims of providers (e.g. on identity, data locality or usage controls) and therefore – establish trust.
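To make this concrete, here is a minimal sketch (Python, standard library only) of what verifying a provider's claims could look like. Real trust frameworks such as Gaia-X's rely on public-key verifiable credentials rather than the shared-secret HMAC used here, and all provider names and claim fields below are invented for illustration.

```python
import hashlib
import hmac
import json

def sign_claims(claims: dict, secret: bytes) -> str:
    """Produce a signature over a provider's self-described claims."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_claims(claims: dict, signature: str, secret: bytes) -> bool:
    """A consumer (or trust anchor) checks the claims were not tampered with."""
    return hmac.compare_digest(sign_claims(claims, secret), signature)

# Hypothetical provider self-description: data locality and usage controls.
claims = {
    "provider": "example-cloud",
    "data_locality": "EU",
    "usage_control": "policy-attached",
}
secret = b"shared-trust-anchor-key"  # stand-in for a real PKI trust anchor
sig = sign_claims(claims, secret)

assert verify_claims(claims, sig, secret)
# A tampered claim (e.g. a changed data locality) no longer verifies:
assert not verify_claims({**claims, "data_locality": "non-EU"}, sig, secret)
```

The point of the sketch is only the shape of the interaction: providers publish machine-verifiable claims, and consumers can check them cryptographically instead of relying on contracts alone.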

A new generation of data infrastructures

It goes without saying that mastering the glossary of terms of the new data economy and digital era is almost an impossible challenge. Even the new regulations issued by the European Commission introduce a series of new concepts and principles that, without a common understanding of their meaning, could result in subjective or generic interpretations.

Nowadays, the concept of a data space, as a common initiative to exchange data amongst partners in a specific domain (such as automotive, manufacturing, mobility, or energy), is starting to be commonly acknowledged and debated in industry and on the political scene. It is understood, thanks also to the advocacy effort of Gaia-X, that the data economy is tightly coupled with and dependent on the creation of common data spaces. It is also understood that the obstacle to the creation of common data spaces has never been the availability of data-exchange technologies but their trustworthiness.

So, if everybody recognizes trust as a central element in creating a data economy, do we have a common definition of trust? And if the data economy is driven by trusted data exchange, does the infrastructure behind it have a marginal role in enabling it?

The answer to both is a clear no.

There is no standard definition of what a trustworthy service should be in current EU regulation, and in its absence, many aspects of IT services are commonly mistaken for the trust, or sovereignty, element. The most common examples are cybersecurity posture, service localization, network segregation, private cloud instances, and public cloud regions; all of these are features or variants of a cloud service, but none of them alone provides a sufficient element of trust to store, process, or exchange your data.

In Gaia-X, the concept of sovereignty is translated into three characteristics that a service must expose: transparency, controllability, and interoperability. The concept of trust is translated into compliance with a set of rules, verified through a technology framework that provides evidence of the level of transparency, controllability, and interoperability of a service.

In this way, one service may run in a private cloud, located on European territory, operated by local personnel, but not be interoperable with any service outside. Another service may run in a foreign country, in a segregated network, operated by remote European personnel, and be interoperable with other services running in a distributed virtual cluster of containers. Which of the two is more trustworthy or sovereign? The answer is, of course: it depends on the user's needs. But we certainly need a trust framework to see through this and make an informed decision.
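As an illustration of why "it depends on the user's needs", the two hypothetical services above can be compared against explicit requirements. The attribute names are invented for this sketch; a real trust framework would evaluate verified self-descriptions, not hand-written dictionaries.

```python
# Hypothetical attribute profiles for the two services described above.
service_a = {"location": "EU", "operators": "local", "interoperable": False}
service_b = {"location": "non-EU", "operators": "EU-remote", "interoperable": True}

def matches(service: dict, requirements: dict) -> bool:
    """A service is acceptable only if it satisfies every stated requirement."""
    return all(service.get(key) == value for key, value in requirements.items())

# A user who must keep data in the EU but needs no interoperability:
assert matches(service_a, {"location": "EU"})
# A user whose workloads must interoperate across providers:
assert matches(service_b, {"interoperable": True})
assert not matches(service_a, {"interoperable": True})
```

Neither service is "more sovereign" in the abstract; the answer falls out only once the user's requirements are made explicit and checkable.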

When it comes to infrastructures, a growing common belief is that data spaces are infrastructure-agnostic, so any existing cloud infrastructure can be adopted. Many expect all the new features required for trust and data exchange to shift from the infrastructure layer to the data layer. In reality, this paradigm is incomplete, because the fundamental elements of federation, control, and interoperability (such as identity and access management, network management, and storage and compute resource management) are handled at the infrastructure level. It is also misguided, because the lower the level at which trust mechanisms are implemented (towards the hardware, for example with a System-on-Chip), the higher their intrinsic inviolability and trustworthiness.

So, we need to think in terms of building a new generation of data infrastructures, where this term refers to a combination of existing cloud technology, trust mechanisms, and rules, that combined provide for trustworthy data storage, data-computing, data transfer, and data exchange.

The phenomenon of data gravity makes it evident that a single provider, in a single location, without interoperability with other providers, cannot support the model of federated data spaces in the real world.

Therefore, the need is higher than ever for a new model of federated cloud service providers that join forces: compensating for each other's limitations in resource scalability, federating with each other to allow trusted data exchange, and providing interoperability of workloads between private, public, and edge nodes.

This new generation of data infrastructure should offer containerization (CaaS) and use technologies that create interconnected and "liquid" networks and infrastructures, encouraging enterprises to undertake a substantial shift of existing workloads to the cloud. That shift is still stuck at 26% according to current statistics, slowed down by two main factors: the fear of lock-in, and the high cost of migrating application portfolios that are largely still legacy, often monolithic, not supporting multi-tenancy, and not cloud-ready.

But is all this possible?

Implementation of such trust services is taking place within both data spaces and interconnection and cloud infrastructures.

In the context of the Gaia-X Lighthouse Project Structura-X, technology providers are experimenting with new technologies for distributed and federated interconnection and cloud architectures. Indeed, the need for a federated infrastructure requires the creation of performant and easy-to-use multi-stakeholder network, storage, and computing fabrics.

Aligned with Gaia-X's intention to reuse and leverage what is already available, there is low-hanging fruit with regard to interconnection options at the European scale and beyond. A potential interconnection fabric is easily accessible and could be switched on when requested: the IXPs (Internet Exchange Points) and other "neutral colocation points" distributed throughout Europe could be federated and connected on demand, in compliance with the emerging Gaia-X architecture and policies. This would provide the opportunity to create dedicated or Internet-driven connection paths: the network fabric requested by all kinds of applications, including the most privacy-, performance-, and security-sensitive ones, and needed as a viable alternative or complement to the current solutions offered by the dominant (non-European) players.

At the computing level, the need for the possibility to move and distribute workloads between different infrastructures and to control the related costs requires frameworks that enable the creation of “liquid” layers out of computing resources belonging to different operators.

An example of this is a new layer for "liquid" (cloud) resource federation, enabling organizations to dynamically connect to different heterogeneous providers of storage and computing resources. This creates a new kind of "resource roaming", which, obviously, can only function if the underlying resources can be trusted and the resulting services can be certified accordingly. Other frameworks and components, such as Rancher, Istio, Komodor, and Rafay, address these challenges from different perspectives, offering different options to design and implement solutions matching the needs of different use cases. Several players have already joined the "federated infrastructure" initiative, ranging from big technology players to local internet service providers. Indeed, this architectural framework would unleash at least three business cases that could boost both the European cloud market and the data spaces economy:

  • RESOURCE EXPANSION: the ability to engage and use third-party resources allows handling computing bursts (e.g. the need for GPUs for AI training over a period of time) without requiring investments that are hardly sustainable for single players.
  • ONE-TO-MANY: the "cloud roaming" option extends market reach for service distribution at the edge, over a larger geographical footprint. A single provider could accommodate the needs of many distributed users and increase its market share.
  • MANY-TO-ONE: the possibility to seamlessly glue together different resources and services would allow real service composition, with the possibility to control the distribution of workloads, realizing a "single provider" experience over a multi-stakeholder fabric. In other words, it would be possible to replicate the one-stop-shop experience of the hyperscalers through a federation of different providers. Geographical reach and service portfolio breadth would be greatly expanded thanks to a collaborative model, which would guarantee lock-in avoidance and sustainable innovation over time.
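A rough sketch of the RESOURCE EXPANSION case: a hypothetical greedy placement of a GPU burst across federated providers, cheapest spare capacity first. Provider names, capacities, and prices are invented, and a real federation layer would also weigh trust labels, locality, and network cost, not price alone.

```python
# Hypothetical federated providers with spare GPUs and price per GPU-hour.
providers = [
    {"name": "local-dc",  "free_gpus": 2,  "price": 2.0},
    {"name": "partner-a", "free_gpus": 16, "price": 2.5},
    {"name": "partner-b", "free_gpus": 8,  "price": 1.8},
]

def place_burst(providers: list, gpus_needed: int) -> list:
    """Greedily spill a GPU burst onto federated partners, cheapest first."""
    plan, remaining = [], gpus_needed
    for p in sorted(providers, key=lambda p: p["price"]):
        if remaining <= 0:
            break
        take = min(p["free_gpus"], remaining)
        if take:
            plan.append((p["name"], take))
            remaining -= take
    if remaining > 0:
        raise RuntimeError("federation cannot satisfy the burst")
    return plan

# A 12-GPU training burst spills over three providers instead of requiring
# any single player to over-provision for the peak.
plan = place_burst(providers, 12)
```

The "one-stop-shop" experience of the MANY-TO-ONE case is essentially this same placement logic hidden behind a single provider-facing interface.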

These architectural and business models would offer a compelling and performant alternative to the dominant solutions, one that should not be seen as a simple competitor but as an added-value option to complete and extend the hyperscaler fabric, introducing the possibility to enforce real sovereignty.

Furthermore, data space lighthouse initiatives in different countries and industries have started to adopt the trust infrastructure to identify organizations, their datasets, and data services in their respective communities. Gaia-X-enabled trust services not only enable trust in their own communities but also facilitate trust across domains. And in addition, it is possible to provide trust between the data and infrastructure levels too.

In the next few years, we need to grow this ecosystem step-by-step.

First, as said, by developing new cloud and infrastructure services. Such services will be way more distributed and decentralized than the current offering of hyperscalers. We hope it will be possible to build a European offering, while at the same time continuing to work together on a global scale with big players in Asia and the US. Public investment through the IPCEI programme is a key driver for European developments and it is expected that new start-ups (focused on dedicated markets) will play an important role too.

Secondly, we expect growth with the advent of technology and trust providers in various data ecosystems. In the Netherlands, iSHARE has indicated its intent to provide Gaia-X-compliant trust services. In Germany, several leading technology providers have started Cofinity-X to provide connectors and trust services to the automotive data space, to name a few examples. Through the Simpl initiative, the European Union will fund open-source reference implementations with a high technology readiness level. Gaia-X will continue to foster such developments and ensure interoperability between data space service providers.

Finally – and ultimately – growth is driven by data ecosystems themselves. Governments and industries have committed significant investments in data spaces in various industrial sectors: from health to mobility and from manufacturing to smart cities, data space initiatives are popping up everywhere. It is likely that they will provide a significant market pull for the offerings of the underlying service providers, especially as they transition from their current initiation stage to a growth and scale-up stage in the years to come. This growth will come with a clear need for trust, and this is exactly what we aim to provide.


**Source: Study by Pr. Frédéric Jenny –