Evaluation of synthetic data generators on complex tabular data
dc.contributor.author | Thees, Oscar | |
dc.contributor.author | Novak, Jiri | |
dc.contributor.author | Templ, Matthias | |
dc.contributor.editor | Domingo-Ferrer, Josep | |
dc.contributor.editor | Önen, Melek | |
dc.date.accessioned | 2024-12-12T07:42:51Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Synthetic data generators are widely utilized to produce synthetic data, serving as a complement or replacement for real data. However, the utility of data is often limited by its complexity. The aim of this paper is to assess the performance of these generators on a complex data set that includes cluster structures and complex relationships. We compare different synthesizers such as synthpop, Synthetic Data Vault, simPop, Mostly AI, Gretel, Realtabformer, and arf, taking into account their different methodologies with (mostly) default settings, on two properties: syntactical accuracy and statistical accuracy. As a complex and popular data set, we used the European Statistics on Income and Living Conditions data set. Almost all synthesizers resulted in low data utility and low syntactical accuracy. The results indicated that for such complex data, simPop, a computational and methodological framework for simulating complex data based on conditional modeling, emerged as the most effective approach for static tabular data and is superior to other conditional or joint modeling approaches. | |
dc.event | International Conference, PSD 2024 | |
dc.event.end | 2024-09-27 | |
dc.event.start | 2024-09-25 | |
dc.identifier.doi | 10.1007/978-3-031-69651-0_13 | |
dc.identifier.isbn | 978-3-031-69650-3 | |
dc.identifier.isbn | 978-3-031-69651-0 | |
dc.identifier.uri | https://irf.fhnw.ch/handle/11654/48405 | |
dc.language.iso | en | |
dc.publisher | Springer | |
dc.relation.ispartof | Privacy in statistical databases. International Conference, PSD 2024, Antibes Juan-les-Pins, France, September 25–27, 2024, Proceedings | |
dc.relation.ispartofseries | Lecture Notes in Computer Science | |
dc.spatial | Cham | |
dc.subject.ddc | 330 - Economics | |
dc.subject.ddc | 004 - Computer science, internet | |
dc.subject.ddc | 510 - Mathematics | |
dc.title | Evaluation of synthetic data generators on complex tabular data | |
dc.type | 04B - Contribution to conference proceedings | |
dspace.entity.type | Publication | |
fhnw.InventedHere | Yes | |
fhnw.ReviewType | Anonymous ex ante peer review of a complete publication | |
fhnw.affiliation.hochschule | Hochschule für Wirtschaft FHNW | de_CH |
fhnw.affiliation.institut | Institut für Unternehmensführung | de_CH |
fhnw.openAccessCategory | Closed | |
fhnw.pagination | 194-209 | |
fhnw.publicationState | Published | |
fhnw.seriesNumber | 14915 | |
relation.isAuthorOfPublication | 3104c833-a4cf-4b32-8e0d-7dfe3357cb85 | |
relation.isAuthorOfPublication | 8cf8820c-c0f2-419e-95c3-d3042afce66d | |
relation.isAuthorOfPublication | 8b0a85e1-60d7-48f9-8551-419197a127e7 | |
relation.isAuthorOfPublication.latestForDiscovery | 3104c833-a4cf-4b32-8e0d-7dfe3357cb85 |