GAIA-X Catalogue search engine – under the hood

In today’s world, and even more so with the recent quarantine measures, we rely on cloud services to run our infrastructure and store our data.

The GAIA-X initiative was born from the need to raise awareness of data sovereignty and to create a federated, trustworthy cloud ecosystem for Europe.

In this article, we will discuss how the GAIA-X demonstrator group – composed of 3DS OUTSCALE, Docaposte, German Edge Cloud, Orange Business Services, OVHcloud, Scaleway and T-Systems – created a prototype catalogue search engine.

One goal of the catalogue search engine was to let users search for and select the services that match their needs. The objective is to be inclusive and to expose enough information for users to make transparent, informed choices. The description of each service includes at least all the relevant “Policy Rules” that the GAIA-X governance defines as mandatory, plus a set of technical descriptions. The “Policy Rules” cover a variety of domains (data protection, security, reversibility, etc.) and are set out in two documents: one for Infrastructure Policy Rules and another for Data and Software Policy Rules.

Of course, all of this information should be available in a machine-readable format to enable automated service selection.
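To show what a machine-readable format makes possible, here is a minimal Python sketch of automated selection over structured service descriptions. The `ServiceDescription` fields, the `select_services` helper and the sample entries are hypothetical and chosen only for illustration. They are not the GAIA-X schema.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Hypothetical machine-readable service description (not the GAIA-X schema)."""
    name: str
    provider: str
    service_type: str
    locations: set = field(default_factory=set)   # where the service is hosted
    compliance: set = field(default_factory=set)  # certifications, codes of conduct


def select_services(catalogue, service_type, required_compliance, allowed_locations):
    """Keep services of the requested type that meet every compliance requirement
    and are hosted in at least one of the allowed locations."""
    return [
        s for s in catalogue
        if s.service_type == service_type
        and required_compliance <= s.compliance    # subset test: all requirements met
        and bool(s.locations & allowed_locations)  # non-empty intersection
    ]


# Hypothetical sample entries.
catalogue = [
    ServiceDescription("db-a", "Provider A", "database", {"Germany"}, {"PCI-DSS", "CISPE"}),
    ServiceDescription("db-b", "Provider B", "database", {"France"}, {"PCI-DSS"}),
]
matches = select_services(catalogue, "database", {"PCI-DSS", "CISPE"}, {"Germany"})
print([s.name for s in matches])  # ['db-a']
```

Set operations keep the matching rules declarative. A subset test handles requirements that must all hold, and an intersection handles "hosted in any of these locations".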

The first step was to list various infrastructure scenarios based on our respective customers’ requests and to extract semantics from their requirements.

For example, if a customer requests “a database compliant with PCI-DSS for payment services, hosted in Germany following the CISPE Data Protection Code of Conduct”, the entities of interest are “database”, “PCI-DSS”, “Germany” and “CISPE Data Protection”.
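For illustration only, the extraction step can be approximated by matching the request against a vocabulary of known terms. The sketch below substitutes simple case-insensitive keyword matching for whatever method was actually used. The `VOCABULARY` entries, the entity-type names and `extract_entities` are hypothetical. A real system would derive its vocabulary from the ontology and taxonomy.

```python
import re

# Hypothetical vocabulary: known term -> entity type (illustrative, not a GAIA-X list).
VOCABULARY = {
    "database": "ServiceType",
    "PCI-DSS": "Certification",
    "Germany": "Location",
    "CISPE Data Protection": "CodeOfConduct",
}


def extract_entities(request: str) -> list:
    """Return (term, entity type) pairs in the order they appear in the request."""
    found = []
    for term, entity_type in VOCABULARY.items():
        # Whole-word, case-insensitive match of the escaped term.
        match = re.search(r"\b" + re.escape(term) + r"\b", request, re.IGNORECASE)
        if match:
            found.append((match.start(), term, entity_type))
    return [(term, entity_type) for _, term, entity_type in sorted(found)]


request = ("a database compliant with PCI-DSS for payment services, hosted in Germany "
           "following the CISPE Data Protection Code of Conduct")
print(extract_entities(request))
```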

But how do the entities relate to each other and how are they defined? And can we classify them?

One intuitive way to represent the semantics was to use graphs like the ones below:

With those scenarios in mind, we identified recurring patterns and decided to use a simple graph database ontology for the first release (below):

This ontology is not intended to cover all future GAIA-X needs. We have more advanced ontologies, which we will cover in future articles.
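The ontology diagram is not reproduced here, so the following sketch only illustrates the general idea of a property graph: labelled nodes connected by typed, directed relationships. The `PropertyGraph` class and the labels `Standard` and `Location` are hypothetical. The relationship names `COMPLIES_WITH` and `LOCATED_IN` come from the query examples in this article.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal in-memory graph: labelled nodes and typed, directed edges."""

    def __init__(self):
        self.labels = {}               # node name -> label, e.g. "Service"
        self.edges = defaultdict(set)  # (source, relation) -> set of target nodes

    def add_node(self, name, label):
        self.labels[name] = label

    def add_edge(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def targets(self, source, relation):
        # .get() avoids inserting empty entries into the defaultdict
        return self.edges.get((source, relation), set())

    def nodes_with_label(self, label):
        return [name for name, node_label in self.labels.items() if node_label == label]


# Hypothetical encoding of "a database compliant with PCI-DSS, hosted in Germany".
g = PropertyGraph()
g.add_node("db-a", "Service")
g.add_node("PCI-DSS", "Standard")
g.add_node("Germany", "Location")
g.add_edge("db-a", "COMPLIES_WITH", "PCI-DSS")
g.add_edge("db-a", "LOCATED_IN", "Germany")

compliant = [s for s in g.nodes_with_label("Service")
             if "PCI-DSS" in g.targets(s, "COMPLIES_WITH")]
print(compliant)  # ['db-a']
```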

We also defined a taxonomy to classify services and enable attributes on nodes.
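Here is one possible reading of a taxonomy that classifies services and adds attributes to nodes. In this sketch, a hypothetical two-level taxonomy maps a service's `type` attribute to a category, which is then stored as another node attribute. The categories and the `classify`/`annotate` helpers are assumptions for illustration, not the GAIA-X taxonomy.

```python
from typing import Optional

# Hypothetical two-level taxonomy: category -> service types it contains.
TAXONOMY = {
    "Storage": {"object storage", "block storage"},
    "Data": {"database"},
}


def classify(service_type: str) -> Optional[str]:
    """Return the taxonomy category of a service type, or None if it is unknown."""
    for category, types in TAXONOMY.items():
        if service_type.lower() in types:
            return category
    return None


def annotate(node: dict) -> dict:
    """Return a copy of a Service node with a 'category' attribute derived from 'type'."""
    return {**node, "category": classify(node["type"])}


node = annotate({"name": "storage-a", "type": "Object Storage"})
print(node)  # {'name': 'storage-a', 'type': 'Object Storage', 'category': 'Storage'}
```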

The next step was to deploy development and staging environments. We decided not to have an environment named “production”, to avoid confusion and to highlight that this demonstrator shows how the GAIA-X catalogue could work, not necessarily how it will be implemented later.

We used GitLab with a managed Kubernetes cluster integration and standard CI/CD pipelines to deploy the microservices, and the Neo4j graph database to store the data.

With an ontology and a taxonomy in place, the next step was to populate the database.

To speed up development, we focused on object storage services for this release. Each cloud provider working on the demonstrator populated the database with a description of its object storage service.
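As an illustration of how a provider's description might be loaded, the sketch below pairs a parameterised Cypher statement with a helper that builds its parameters. The `PROVIDES` relationship, the `Location` label, the property keys and `load_service_params` are hypothetical. Only `LOCATED_IN` comes from the query examples in this article. Using `MERGE` rather than `CREATE` keeps the load idempotent, so a provider can resubmit its description without creating duplicate nodes or relationships.

```python
# Hypothetical schema: (:Provider)-[:PROVIDES]->(:Service)-[:LOCATED_IN]->(:Location).
# MERGE makes the load idempotent; service names are assumed unique across providers.
LOAD_SERVICE = """
MERGE (p:Provider {name: $provider})
MERGE (s:Service {name: $service})
SET s.type = $type
MERGE (p)-[:PROVIDES]->(s)
WITH s
UNWIND $locations AS location
MERGE (l:Location {name: location})
MERGE (s)-[:LOCATED_IN]->(l)
"""


def load_service_params(description: dict) -> dict:
    """Map one provider-submitted description (hypothetical keys) to query parameters."""
    return {
        "provider": description["provider"],
        "service": description["name"],
        "type": description.get("type", "object storage"),
        "locations": sorted(description.get("locations", [])),
    }


params = load_service_params(
    {"provider": "Provider A", "name": "storage-a", "locations": ["Germany", "France"]}
)
print(params)
```

With the official `neo4j` Python driver, the statement and its parameters could then be executed inside a session, for example `session.run(LOAD_SERVICE, params)`.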

Finally, as User-Friendliness is one of the GAIA-X criteria, we created an intuitive way to query the database through a classic search engine user interface.

For advanced querying, we decided not to expose Neo4j’s Cypher query language, but instead created an even simpler grammar based on the relationships between nodes and their attributes. This allowed us to implement a free-form input field with auto-completion. We used PEG.js, a parser generator for Parsing Expression Grammars (PEGs), with the following grammar:

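logical_or  ::= logical_and "OR" logical_or / logical_and
logical_and ::= expression "AND" logical_and / expression
expression  ::= "NOT" expression / primary
primary     ::= "(" logical_or ")"
              / rules
rules       ::= node relation node
              / node relation "ANY(" node ("," node)* ")"
              / node relation "ALL(" node ("," node)* ")"
              / node "." attribute "=" value
              / node "." attribute "!=" value
              / node "." attribute "IN ANY(" value ("," value)* ")"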
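The PEG.js grammar source itself is not included here. To make the grammar's behaviour concrete, the following Python sketch hand-writes a recursive-descent parser for it. It assumes the following:

- `OR` binds more loosely than `AND`, and `NOT` binds most tightly.
- Lists inside `ANY(...)`/`ALL(...)` are comma-separated, as in the example queries.
- Attribute comparisons take the form `node.attribute = value`.
- Quoted strings name specific nodes or values, while bare identifiers are labels or relationship names.

The tuple-based AST (`"rel"`, `"cmp"`, `"and"`, `"or"`, `"not"`) and all class and method names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"'([^']*)'|(!=|[().,=])|([A-Za-z_]\w*)|(\S)")  # string|symbol|word|stray
KEYWORDS = {"AND", "OR", "NOT", "ANY", "ALL", "IN"}


def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):  # whitespace between tokens is skipped
        string, symbol, word, stray = m.groups()
        if stray is not None:
            raise ValueError(f"unexpected character {stray!r} at {m.start()}")
        if string is not None:
            tokens.append(("STR", string))
        elif symbol is not None:
            tokens.append(("SYM", symbol))
        else:
            tokens.append(("KW" if word in KEYWORDS else "ID", word))
    return tokens + [("EOF", None)]  # end sentinel simplifies lookahead


class Parser:
    """Recursive descent: one method per grammar rule, one token of lookahead."""

    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def peek(self, kind, value=None):
        k, v = self.tokens[self.i]
        return k == kind and (value is None or v == value)

    def take(self, kind, value=None):
        if not self.peek(kind, value):
            raise ValueError(f"expected {value or kind}, found {self.tokens[self.i]}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def parse(self):
        tree = self.logical_or()
        self.take("EOF")  # reject trailing input
        return tree

    def binary(self, keyword, operand, rest):  # operand keyword rest / operand
        left = operand()
        if not self.peek("KW", keyword):
            return left
        self.take("KW", keyword)
        return (keyword.lower(), left, rest())

    def logical_or(self):   # logical_and "OR" logical_or / logical_and
        return self.binary("OR", self.logical_and, self.logical_or)

    def logical_and(self):  # expression "AND" logical_and / expression
        return self.binary("AND", self.expression, self.logical_and)

    def expression(self):   # "NOT" expression / primary
        if not self.peek("KW", "NOT"):
            return self.primary()
        self.take("KW", "NOT")
        return ("not", self.expression())

    def primary(self):      # "(" logical_or ")" / rules
        if not self.peek("SYM", "("):
            return self.rules()
        self.take("SYM", "(")
        inner = self.logical_or()
        self.take("SYM", ")")
        return inner

    def node(self):         # a 'quoted' name or a bare label such as Service
        return self.take("STR") if self.peek("STR") else self.take("ID")

    def item_list(self):    # item ("," item)* ")" once the opening "(" is consumed
        items = [self.node()]
        while self.peek("SYM", ","):
            self.take("SYM", ",")
            items.append(self.node())
        self.take("SYM", ")")
        return items

    def rules(self):
        subject = self.node()
        if self.peek("SYM", "."):  # node "." attribute ("=" / "!=" / "IN ANY(") ...
            self.take("SYM", ".")
            attribute = self.take("ID")
            if self.peek("KW", "IN"):
                for kind, value in (("KW", "IN"), ("KW", "ANY"), ("SYM", "(")):
                    self.take(kind, value)
                return ("cmp", subject, attribute, "IN ANY", self.item_list())
            op = "!=" if self.peek("SYM", "!=") else "="
            self.take("SYM", op)
            return ("cmp", subject, attribute, op, [self.take("STR")])
        relation = self.take("ID")  # node relation (node / "ANY(" ... / "ALL(" ...)
        for quantifier in ("ANY", "ALL"):
            if self.peek("KW", quantifier):
                self.take("KW", quantifier)
                self.take("SYM", "(")
                return ("rel", subject, relation, quantifier, self.item_list())
        return ("rel", subject, relation, "ONE", [self.node()])
```

For example, `Parser("NOT Service COMPLIES_WITH 'GDPR'").parse()` returns `('not', ('rel', 'Service', 'COMPLIES_WITH', 'ONE', ['GDPR']))`. The alternatives in `rules` share a common `node` prefix, and the sketch factors that prefix out instead of relying on PEG backtracking, so a single token of lookahead is enough. The right-recursive `AND`/`OR` rules produce right-nested trees.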

This grammar enables the user to express more advanced queries such as:

Service IMPLEMENTS ANY('S3', 'SWIFT') AND (Service COMPLIES_WITH 'GDPR' OR Provider LOCATED_IN 'European Economic Area')
Service.type = 'object storage' AND Service LOCATED_IN ALL('France', 'Germany')
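The article does not detail how parsed queries are executed against Neo4j. The sketch below substitutes a small in-memory evaluator built from predicate combinators in order to pin down the semantics assumed here: `ANY` means at least one listed target is related, and `ALL` means every listed target is related. The record layout, the helper names (`relation`, `attribute`, `all_of`, `any_of`, `negate`) and the catalogue entries are all hypothetical.

```python
def relation(subject, name, quantifier, targets):
    """Predicate for `subject name target`, `... ANY(...)` or `... ALL(...)`."""
    wanted = set(targets)

    def check(record):
        actual = record[subject]["relations"].get(name, set())
        if quantifier == "ALL":
            return wanted <= actual    # every listed target is related
        return bool(wanted & actual)   # "ANY" or a single target: at least one
    return check


def attribute(subject, name, op, values):
    """Predicate for `subject.name = v`, `!= v` or `IN ANY(v1, v2, ...)`."""
    def check(record):
        actual = record[subject]["attributes"].get(name)
        if op == "=":
            return actual == values[0]
        if op == "!=":
            return actual != values[0]
        if op == "IN ANY":
            return actual in values
        raise ValueError(f"unknown operator {op!r}")
    return check


def all_of(*preds):
    return lambda record: all(p(record) for p in preds)


def any_of(*preds):
    return lambda record: any(p(record) for p in preds)


def negate(pred):
    return lambda record: not pred(record)


# Hypothetical catalogue entries: each record holds a Service node and its Provider node.
catalogue = {
    "storage-a": {
        "Service": {"attributes": {"type": "object storage"},
                    "relations": {"IMPLEMENTS": {"S3"}, "LOCATED_IN": {"France", "Germany"}}},
        "Provider": {"attributes": {}, "relations": {"LOCATED_IN": {"European Economic Area"}}},
    },
    "storage-b": {
        "Service": {"attributes": {"type": "object storage"},
                    "relations": {"IMPLEMENTS": {"SWIFT"}, "COMPLIES_WITH": {"GDPR"},
                                  "LOCATED_IN": {"France"}}},
        "Provider": {"attributes": {}, "relations": {}},
    },
}

# Service IMPLEMENTS ANY('S3', 'SWIFT') AND (Service COMPLIES_WITH 'GDPR' OR Provider LOCATED_IN 'European Economic Area')
query_1 = all_of(
    relation("Service", "IMPLEMENTS", "ANY", ["S3", "SWIFT"]),
    any_of(relation("Service", "COMPLIES_WITH", "ONE", ["GDPR"]),
           relation("Provider", "LOCATED_IN", "ONE", ["European Economic Area"])),
)
# Service.type = 'object storage' AND Service LOCATED_IN ALL('France', 'Germany')
query_2 = all_of(
    attribute("Service", "type", "=", ["object storage"]),
    relation("Service", "LOCATED_IN", "ALL", ["France", "Germany"]),
)

for query in (query_1, query_2):
    print([name for name, record in catalogue.items() if query(record)])
# ['storage-a', 'storage-b']
# ['storage-a']
```

With these sample entries, both services satisfy the first query. Only `storage-a` is located in both France and Germany, so it is the only match for the second query.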

Lastly, the source code is released under the BSD 3-Clause license, and we expose OpenAPI, JSON Schema and JSON-LD specifications to facilitate interoperability and reusability.
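To illustrate the JSON-LD side, the sketch below serialises a service description with an `@context` that maps plain JSON keys to vocabulary IRIs. The `https://example.org/...` IRIs, the key names and the `to_jsonld` helper are hypothetical placeholders, not the published GAIA-X vocabulary.

```python
import json

# Hypothetical JSON-LD context: maps plain JSON keys to IRIs in an example vocabulary.
CONTEXT = {
    "gx": "https://example.org/gaia-x/vocab#",
    "name": "gx:name",
    "serviceType": "gx:serviceType",
    "locatedIn": "gx:locatedIn",
    "compliesWith": "gx:compliesWith",
}


def to_jsonld(service_id, name, service_type, locations, compliance):
    """Serialise one service description as a JSON-LD document."""
    return {
        "@context": CONTEXT,
        "@id": service_id,       # the node's IRI
        "@type": "gx:Service",   # compact IRI, expanded via the "gx" prefix
        "name": name,
        "serviceType": service_type,
        "locatedIn": sorted(locations),   # sorted only for stable output
        "compliesWith": sorted(compliance),
    }


doc = to_jsonld("https://example.org/services/storage-a", "storage-a",
                "object storage", {"Germany", "France"}, {"GDPR"})
print(json.dumps(doc, indent=2))
```

A consumer that ignores `@context` still sees ordinary JSON. A JSON-LD processor, by contrast, can expand every key to a full IRI, which helps when merging descriptions from different providers.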

To conclude, we are proud that some of this work has already landed in the GAIA-X Technical Architecture Paper, and that we successfully worked together on a shared vision as a European ecosystem.

Cloud solution architect - Data, AI and Grid computing