Towards a sustainable astronomical data infrastructure. Optimising linking data from the Rucio datalake to the users areas within the SKA Regional Centres Network

Type
01A - Journal article
Parent work
Open Research Europe
Volume
6
Publisher / Publishing institution
F1000 Research
Abstract
The distributed architecture of the SKA Regional Centre Network (SRCNet) aims to provide scientific communities worldwide with efficient computational and storage resources to exploit the massive data volumes produced by the SKA Observatory (SKAO). Given the volume of SKAO data, the traditional data management paradigm, in which data is transferred to computational resources, is no longer feasible. Instead, computational workflows must increasingly be relocated closer to data storage locations, emphasising efficient data access strategies and avoiding unnecessary duplication. In this context, we present PrepareData, a modular and extensible data delivery service developed within SRCNet prototyping activities. The service addresses the critical challenge of redundant data transfers and duplication at both node and user levels by enabling seamless delivery of requested datasets from local Rucio Storage Elements (RSEs) directly into users’ working environments. PrepareData operates as a local service within each SRCNet node and is integrated into a broader ecosystem of federated services. Specifically, we designed and evaluated two distinct yet complementary implementations that avoid unnecessary data duplication and provide a dynamic data bridge between the RSEs and the user storage areas: (1) a filesystem-based solution leveraging CephFS, which uses shared filesystem mount points and bind mounts to ensure consistent and immediate availability of the data across computational nodes, and (2) a Kubernetes model using Persistent Volumes and Persistent Volume Claims, which dynamically injects data into users’ areas. We detail the architectural design and development, the technical implementation, and the integration of both solutions with science-enabling tools such as JupyterHub, CARTA or virtually any other application, and finally we provide a performance evaluation.
This contribution provides a scalable and sustainable blueprint for data delivery in federated scientific infrastructures, supporting the broader goals of green computing and efficient resource utilisation.
ISSN
2732-5121
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Review
peer-reviewed
Open access category
Diamond
License
https://creativecommons.org/licenses/by/4.0/
Citation
Parra-Royón, M., Garrido-Sánchez, J., Sánchez-Expósito, S., Darriba-Pol, L., Sánchez-Castañeda, J., Mendoza, M. Á., Coles, J., McConkey, S., Joshi, R., Barnsley, R., Salgado, J., & Verdes-Montenegro, L. (2026). Towards a sustainable astronomical data infrastructure. Optimising linking data from the Rucio datalake to the users areas within the SKA Regional Centres Network. Open Research Europe, 6. https://doi.org/10.12688/openreseurope.22118.2