Towards a sustainable astronomical data infrastructure. Optimising linking data from the Rucio datalake to the users areas within the SKA Regional Centres Network

Type
01A - Article in a scientific journal
Parent work
Open Research Europe
Volume
6
Publisher / Publishing institution
F1000 Research
Abstract
The distributed architecture of the SKA Regional Centre Network (SRCNet) aims to provide scientific communities worldwide with efficient computational and storage resources to exploit the massive data volumes produced by the SKA Observatory (SKAO). Given the amount of SKAO data, traditional data management paradigms, in which data is transferred to computational resources, are no longer feasible. Instead, computational workflows must increasingly be relocated closer to data storage locations, emphasizing efficient data access strategies and avoiding unnecessary duplication or redundancy. In this context, we present PrepareData, a modular and extensible data delivery service developed within SRCNet prototyping activities. The service addresses the critical challenge of redundant data transfers and duplication at both node and user levels by enabling seamless delivery of requested datasets from local Rucio Storage Elements (RSEs) directly into users’ working environments. PrepareData operates as a local service within each SRCNet node and is integrated into a broader ecosystem of federated services. Specifically, we designed and evaluated two distinct yet complementary implementations that avoid unnecessary data duplication and provide a dynamic data bridge between the RSEs and the user storage areas: (1) a filesystem-based solution leveraging CephFS, which uses shared filesystem mount points and bind mounts to ensure consistent and immediate availability of the data across computational nodes, and (2) a Kubernetes model using Persistent Volumes and Persistent Volume Claims, which dynamically injects data into users’ areas. We detail the architectural design and development, the technical implementation, and the integration of both solutions with science-enabling tools such as JupyterHub, CARTA or virtually any other application, and we provide a performance evaluation.
This contribution provides a scalable and sustainable blueprint for data delivery in federated scientific infrastructures, supporting the broader goals of green computing and efficient resource utilisation.
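The Kubernetes variant described in the abstract can be illustrated with a minimal sketch. It is not the PrepareData implementation: the function name `build_dataset_volume`, the naming scheme, the use of a `hostPath` volume source and the example path are all assumptions made for illustration. The sketch only builds the two manifests (a Persistent Volume pointing at data already present on the local storage, and a Persistent Volume Claim bound to it by name), so no data is copied.

```python
import json


def build_dataset_volume(user, dataset, rse_path, size="1Gi"):
    """Build a (PersistentVolume, PersistentVolumeClaim) manifest pair.

    Hypothetical helper: exposes an existing directory read-only instead of
    copying it. Assumes `rse_path` is mounted at the same location on every
    node and that `user` and `dataset` are valid Kubernetes name fragments.
    """
    name = f"{user}-{dataset}".lower()
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": f"pv-{name}"},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadOnlyMany"],
            # Retain: releasing the claim must never delete the source data.
            "persistentVolumeReclaimPolicy": "Retain",
            # Empty storage class disables dynamic provisioning.
            "storageClassName": "",
            "hostPath": {"path": rse_path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"pvc-{name}", "namespace": user},
        "spec": {
            "accessModes": ["ReadOnlyMany"],
            "storageClassName": "",
            # Bind explicitly to the volume created above.
            "volumeName": f"pv-{name}",
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


if __name__ == "__main__":
    # Illustrative values only; the path is not taken from the publication.
    pv, pvc = build_dataset_volume("alice", "obs001", "/mnt/rse/obs001")
    print(json.dumps([pv, pvc], indent=2))
```

A user session (for example a JupyterHub pod) would then mount the claim as a volume; the read-only access mode and the `Retain` policy are choices made here to reflect the stated goal of avoiding duplication without risking the source data.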
ISSN
2732-5121
Language
English
Created during FHNW affiliation
Yes
Publication status
Published
Peer review
Peer review of the entire publication
Open access status
Diamond
License
https://creativecommons.org/licenses/by/4.0/
Citation
Parra-Royón, M., Garrido-Sánchez, J., Sánchez-Expósito, S., Darriba-Pol, L., Sánchez-Castañeda, J., Mendoza, M. Á., Coles, J., McConkey, S., Joshi, R., Barnsley, R., Salgado, J., & Verdes-Montenegro, L. (2026). Towards a sustainable astronomical data infrastructure. Optimising linking data from the Rucio datalake to the users areas within the SKA Regional Centres Network. Open Research Europe, 6. https://doi.org/10.12688/openreseurope.22118.2