Middleware Development: Building a Unified Data Platform


Authors: Maria Tassi, Nikos Gkevrekis

3rd July 2025


The CORE Innovation Centre technical team has released the alpha version of a middleware platform that provides a unified solution for managing heterogeneous data flows across different data sources.

It is a flexible and scalable platform that offers a robust foundation for seamless data ingestion, storage, processing, and secure access across diverse systems and demonstration sites. The development of this middleware platform is in line with CORE’s digital transformation mission, helping organisations accelerate their transition through cutting-edge research and technology development that addresses real-world barriers hindering progress for many manufacturers, regardless of their specific industry.

Architecture Overview


The middleware is structured around a layered architecture (see the figure below), which consists of four primary layers – Ingestion, Storage, Processing, and Consumption – all supported by Orchestration and Monitoring layers. These interconnected components ensure that the platform can handle a wide spectrum of data types while maintaining operational coherence and traceability.

Figure: Middleware architecture


Key features of the architecture

Multi-Source Data Ingestion: Designed to integrate heterogeneous data streams, the ingestion layer supports:

  • MQTT for real-time data

  • REST APIs for batched real-time and historical data

  • File uploads (e.g. images, GIS datasets) through a fileserver

It also supports ETL (Extract, Transform, Load) processes and performs data validation on entry to maintain data quality and consistency.
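As an illustration of what real-time ingestion with on-entry validation can look like in practice, the sketch below subscribes to an MQTT topic and checks each message against a minimal schema before accepting it. The broker address, topic pattern, and required fields are assumptions for the example rather than the platform's actual configuration.

```python
# Minimal sketch of real-time ingestion with on-entry validation (paho-mqtt 2.x).
# Broker host, topic pattern, and required fields are assumptions for illustration.
import json
import paho.mqtt.client as mqtt

REQUIRED_FIELDS = {"device_id", "timestamp", "value"}  # hypothetical schema

def on_connect(client, userdata, flags, reason_code, properties):
    client.subscribe("sensors/+/readings")  # hypothetical topic pattern

def on_message(client, userdata, msg):
    try:
        payload = json.loads(msg.payload)
    except json.JSONDecodeError:
        print(f"Rejected non-JSON message on {msg.topic}")
        return
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        print(f"Rejected message missing fields: {missing}")
        return
    # At this point the validated record would be handed to the ETL/storage layers.
    print(f"Accepted reading from {payload['device_id']}")

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.org", 1883)  # hypothetical broker
client.loop_forever()
```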

Versatile Storage: The storage layer is optimized for various data types:

  • Large files

  • Structured data

  • Time-series data

Features like pagination and sorting enhance performance, especially for large-scale datasets.
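To make the pagination and sorting features concrete, here is a minimal sketch of a paginated, time-ordered query over a time-series table; the table name, column names, and connection string are placeholders and may not match the platform's actual schema.

```python
# Sketch of paginated, sorted retrieval from a time-series table.
# Table, columns, and DSN are placeholders; the real schema may differ.
import psycopg2

def fetch_page(conn, device_id, page, page_size=100):
    """Return one page of readings for a device, newest first."""
    offset = page * page_size
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT time, value
            FROM sensor_readings            -- hypothetical time-series table
            WHERE device_id = %s
            ORDER BY time DESC              -- sorting keeps results deterministic
            LIMIT %s OFFSET %s              -- pagination bounds the response size
            """,
            (device_id, page_size, offset),
        )
        return cur.fetchall()

conn = psycopg2.connect("dbname=middleware user=reader")  # hypothetical DSN
first_page = fetch_page(conn, device_id="dev-001", page=0)
```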

Secure and Dynamic Data Access: The consumption layer exposes data via RESTful APIs, featuring:

  • Token-based authentication

  • Role-based authorisation

This way, users can query real-time and historical records, as well as batched data ingested from real-time sources, specify time ranges, and retrieve files in their original or compressed formats. The system also supports dynamic endpoints tailored to specific organizations or devices.
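Below is a hedged sketch of what a client-side query against such an API could look like; the base URL, endpoint paths, and parameter names are illustrative assumptions rather than the platform's documented interface.

```python
# Illustrative client calls to a token-protected REST API.
# Base URL, paths, and parameter names are assumptions for this sketch.
import requests

BASE_URL = "https://middleware.example.org/api/v1"  # hypothetical
TOKEN = "eyJ..."                                    # obtained from the auth service

headers = {"Authorization": f"Bearer {TOKEN}"}

# Query historical records for a specific time range.
resp = requests.get(
    f"{BASE_URL}/organisations/acme/devices/dev-001/readings",
    headers=headers,
    params={"from": "2025-06-01T00:00:00Z", "to": "2025-06-30T23:59:59Z"},
    timeout=30,
)
resp.raise_for_status()
readings = resp.json()

# Retrieve a stored file in compressed form.
file_resp = requests.get(
    f"{BASE_URL}/files/site-map.tif",
    headers=headers,
    params={"format": "zip"},   # hypothetical compression switch
    timeout=60,
)
file_resp.raise_for_status()
with open("site-map.zip", "wb") as fh:
    fh.write(file_resp.content)
```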

Interoperability and Integration: The platform is built to work across multiple data sources and demonstration sites.

Scalability and Extensibility: As an alpha release, the architecture anticipates future enhancements including real-time processing, advanced analytics modules, and tighter integration with external systems, supporting the evolving needs of diverse pilot sites.


Ingestion Layer

The development process began with the ingestion layer, which serves as the gateway for all incoming data. Designed with flexibility in mind, this layer can receive data from real-time sources such as MQTT, scheduled or historical data via APIs, and large files such as images and geospatial datasets through a fileserver. The fileserver was developed to support large document handling, enabling users to upload, download, and manage files in their original formats, in order to accommodate diverse data requirements, from real-time data to large-scale datasets. In addition to managing data intake, the ingestion layer plays a key role in validating incoming information and preparing it for further use. It supports ETL operations, which ensure that data is harmonised, transformed when necessary, and made ready for further analysis or storage.
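As a sketch of how a client might hand a large file to the fileserver described above, the snippet below performs a multipart HTTP upload; the endpoint, authentication token, and metadata field are hypothetical.

```python
# Hypothetical multipart upload to the fileserver; the endpoint and
# field names are assumptions, not the platform's published API.
import requests

UPLOAD_URL = "https://middleware.example.org/api/v1/files"  # hypothetical
TOKEN = "eyJ..."

with open("orthophoto.tif", "rb") as fh:
    resp = requests.post(
        UPLOAD_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"file": ("orthophoto.tif", fh, "image/tiff")},
        data={"site": "demo-site-1"},   # illustrative metadata field
        timeout=300,
    )
resp.raise_for_status()
print("Stored as:", resp.json().get("file_id"))
```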


Storage Layer

In parallel with the ingestion layer, significant progress was made on the storage layer. This layer is designed to efficiently store the wide variety of data collected by the system. It integrates multiple storage technologies, such as PostgreSQL for general structured data, S3 buckets for handling files from the fileserver, and TimescaleDB for managing time-series data, ensuring optimal performance and scalability.
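One way to picture how heterogeneous records can be routed to the appropriate backend is sketched below; the record types and routing rules are simplified assumptions about the storage layer, not its actual implementation.

```python
# Simplified routing sketch: pick a storage backend by record type.
# The record kinds and routing rules are assumptions for illustration.
from dataclasses import dataclass
from typing import Any

@dataclass
class Record:
    kind: str      # "timeseries", "structured", or "file"
    payload: Any

def route(record: Record) -> str:
    """Decide which storage technology a record should land in."""
    if record.kind == "timeseries":
        return "timescaledb"   # time-series data
    if record.kind == "file":
        return "s3"            # large files from the fileserver
    return "postgresql"        # general structured data

assert route(Record("timeseries", {"t": 0, "v": 1.2})) == "timescaledb"
assert route(Record("file", b"...")) == "s3"
```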


Consumption Layer

Development has also begun on the consumption layer, which is responsible for enabling secure access to the stored data. This layer currently provides REST APIs that are protected by token-based authentication and role-based authorization, ensuring that only authorized users can access sensitive information. Users can query batched real-time data, request historical data over specific time ranges, retrieve files either in their original form or in compressed formats, and define pagination and sorting options that enhance the speed and efficiency of data retrieval. Additionally, the consumption layer supports dynamic endpoint creation based on organizational structures or specific device IDs, allowing it to adapt easily to the varying needs of different demo sites and stakeholders.
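A minimal server-side sketch of a token-protected, role-aware, dynamically parameterised endpoint could look as follows; FastAPI, the role names, and the path layout are assumptions made for illustration and do not necessarily reflect the platform's actual stack.

```python
# Minimal sketch of a token-protected, role-aware endpoint.
# FastAPI, the role names, and the path layout are assumptions only.
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()

def current_role(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> str:
    # A real implementation would verify the token's signature and claims;
    # here the token is only checked for presence.
    if not creds.credentials:
        raise HTTPException(status_code=401, detail="Missing token")
    return "analyst"   # placeholder: the role would come from the token's claims

@app.get("/organisations/{org}/devices/{device_id}/readings")
def read_device(org: str, device_id: str, role: str = Depends(current_role)):
    # Role-based authorisation: only certain roles may read device data.
    if role not in {"admin", "analyst"}:
        raise HTTPException(status_code=403, detail="Insufficient role")
    # The stored readings would be fetched from the storage layer here.
    return {"organisation": org, "device": device_id, "readings": []}
```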

Responding to real-life challenges


The system's complexity presented various challenges during development. Managing a variety of input formats, including real-time IoT data, historical API feeds, and large unstructured files, required the development of a flexible and adaptable ingestion system capable of processing heterogeneous types of data. Ensuring data quality across many formats and sources necessitated the development of robust ETL methods as well as versatile and dynamic schema validations.
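To illustrate what versatile, per-source schema validation might look like, the sketch below looks up a JSON schema registered for each source and validates incoming records against it; the schemas and source names are invented for the example.

```python
# Per-source schema validation sketch using jsonschema.
# The schemas and source names are invented for illustration.
from jsonschema import ValidationError, validate

SCHEMAS = {
    "iot-sensor": {
        "type": "object",
        "required": ["device_id", "timestamp", "value"],
        "properties": {"value": {"type": "number"}},
    },
    "weather-api": {
        "type": "object",
        "required": ["station", "observed_at"],
    },
}

def validate_record(source: str, record: dict) -> bool:
    """Return True if the record matches the schema registered for its source."""
    schema = SCHEMAS.get(source)
    if schema is None:
        return False   # unknown sources are rejected rather than guessed at
    try:
        validate(instance=record, schema=schema)
        return True
    except ValidationError:
        return False

print(validate_record("iot-sensor", {"device_id": "d1", "timestamp": "2025-07-03T12:00:00Z", "value": 3.5}))
```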

Another challenge was security, as designing a secure system with token-based authentication and role-based authorisation presented difficulties in multi-site, multi-user scenarios. To balance flexibility with system performance, especially for large-scale time-series data and file management, storage solutions had to be carefully selected and configured.

Furthermore, developing and maintaining dynamic endpoints able to consume data as it is being ingested required careful and sophisticated database schema design and management. Last but not least, deploying, managing, and scaling numerous different data ingestion and consumption services required the development and use of custom orchestration and monitoring tools.

Conclusions


The release of the alpha version of the middleware platform marks a significant step toward a flexible and robust solution for managing heterogeneous data. With its layered architecture, it supports seamless data ingestion from real-time data flows, APIs, and large files, while ensuring efficient storage, validation, and secure access. Features such as ETL processing, dynamic endpoints, and multilevel authentication enable adaptability, interoperability, and data integrity across diverse sources.

The middleware platform has substantial market potential, because it enables interoperable data exchange across several sources, boosting collaboration in fields such as manufacturing, climate resilience, and industrial processes. This interoperability accelerates digital transformation by combining real-time, historical, and large-format data into a secure, scalable infrastructure that improves decision-making and operational efficiency.

Designed to manage complex, multi-site systems, it provides dynamic endpoints and role-based access while establishing the groundwork for future features such as real-time analytics and AI integration.

Elements of a secure and interoperable middleware approach have been explored and developed within two of our Horizon Europe projects: CARDIMED, which focuses on boosting Mediterranean climate resilience, and MASTERMINE, which focuses on building a digitalized copy of real-world mines through an Industrial Metaverse approach.

The newly released middleware platform is aligned with CORE's mission of accelerating digital transformation through cutting-edge research and technology development, especially in data interoperability, artificial intelligence, and industrial digitization, and demonstrates our dedication to developing smart, adaptive, and future-ready solutions that address real-world challenges across industries.

 

The alpha version lays a strong foundation for future enhancements, including advanced analytics, real-time processing, and broader system integration, positioning the middleware as a key enabler in modern data ecosystems.

 
 