Evaluation of synthetic data generators on complex tabular data

dc.contributor.author: Thees, Oscar
dc.contributor.author: Novak, Jiri
dc.contributor.author: Templ, Matthias
dc.contributor.editor: Domingo-Ferrer, Josep
dc.contributor.editor: Önen, Melek
dc.date.accessioned: 2024-12-12T07:42:51Z
dc.date.issued: 2024
dc.description.abstract: Synthetic data generators are widely utilized to produce synthetic data, serving as a complement or replacement for real data. However, the utility of data is often limited by its complexity. The aim of this paper is to show their performance using a complex data set that includes cluster structures and complex relationships. We compare different synthesizers such as synthpop, Synthetic Data Vault, simPop, Mostly AI, Gretel, Realtabformer, and arf, taking into account their different methodologies with (mostly) default settings, on two properties: syntactical accuracy and statistical accuracy. As a complex and popular data set, we used the European Statistics on Income and Living Conditions data set. Almost all synthesizers resulted in low data utility and low syntactical accuracy. The results indicated that for such complex data, simPop, a computational and methodological framework for simulating complex data based on conditional modeling, emerged as the most effective approach for static tabular data and is superior compared to other conditional or joint modelling approaches.
dc.event: International Conference, PSD 2024
dc.event.end: 2024-09-27
dc.event.start: 2024-09-25
dc.identifier.doi: 10.1007/978-3-031-69651-0_13
dc.identifier.isbn: 978-3-031-69650-3
dc.identifier.isbn: 978-3-031-69651-0
dc.identifier.uri: https://irf.fhnw.ch/handle/11654/48405
dc.language.iso: en
dc.publisher: Springer
dc.relation.ispartof: Privacy in statistical databases. International Conference, PSD 2024, Antibes Juan-les-Pins, France, September 25–27, 2024, Proceedings
dc.relation.ispartofseries: Lecture Notes in Computer Science
dc.spatial: Cham
dc.subject.ddc: 330 - Economics
dc.subject.ddc: 004 - Computer science, internet
dc.subject.ddc: 510 - Mathematics
dc.title: Evaluation of synthetic data generators on complex tabular data
dc.type: 04B - Conference paper
dspace.entity.type: Publication
fhnw.InventedHere: Yes
fhnw.ReviewType: Anonymous ex ante peer review of a complete publication
fhnw.affiliation.hochschule: Hochschule für Wirtschaft FHNW (de_CH)
fhnw.affiliation.institut: Institut für Unternehmensführung (de_CH)
fhnw.openAccessCategory: Closed
fhnw.pagination: 194-209
fhnw.publicationState: Published
fhnw.seriesNumber: 14915
relation.isAuthorOfPublication: 3104c833-a4cf-4b32-8e0d-7dfe3357cb85
relation.isAuthorOfPublication: 8cf8820c-c0f2-419e-95c3-d3042afce66d
relation.isAuthorOfPublication: 8b0a85e1-60d7-48f9-8551-419197a127e7
relation.isAuthorOfPublication.latestForDiscovery: 3104c833-a4cf-4b32-8e0d-7dfe3357cb85