Templ, Matthias
Last name
Templ
First name
Matthias
Name
Matthias Templ
11 results
Search results
Showing 1 - 10 of 11
Publication
Evaluation of synthetic data generators on complex tabular data (Springer, 2024)
Thees, Oscar; Novak, Jiri; Templ, Matthias; Domingo-Ferrer, Josep; Önen, Melek
Synthetic data generators are widely utilized to produce synthetic data, serving as a complement or replacement for real data. However, the utility of data is often limited by its complexity. The aim of this paper is to show their performance using a complex data set that includes cluster structures and complex relationships. We compare different synthesizers such as synthpop, Synthetic Data Vault, simPop, Mostly AI, Gretel, Realtabformer, and arf, taking into account their different methodologies with (mostly) default settings, on two properties: syntactical accuracy and statistical accuracy. As a complex and popular data set, we used the European Statistics on Income and Living Conditions data set. Almost all synthesizers resulted in low data utility and low syntactical accuracy. The results indicated that for such complex data, simPop, a computational and methodological framework for simulating complex data based on conditional modeling, emerged as the most effective approach for static tabular data and is superior compared to other conditional or joint modelling approaches.
04B - Conference proceedings contribution

Publication
Prof. Rudolf Dutter (1946-2023): Ein Nachruf (Austrian Statistical Society, 07/2023)
Filzmoser, Peter; Templ, Matthias
The former TU Wien professor Rudolf Dutter died on 5 May 2023 from the consequences of his long-standing diabetes. From 1997 to 2003, Prof. Dutter was editor of the Österreichische Zeitschrift für Statistik (Austrian Journal of Statistics), a role he carried out with great dedication on behalf of the Austrian Statistical Society. One of his activities was setting up and running a website for the journal that provided open access to its articles. A short obituary in this journal, also as information for the members of the Society, therefore seems more than fitting.
01A - Article in scientific journal

Publication
Enhancing precision in large-scale data analysis: an innovative robust imputation algorithm for managing outliers and missing values (MDPI, 2023)
Templ, Matthias
Navigating the intricate world of data analytics, one method has emerged as a key tool in confronting missing data: multiple imputation. Its strength is further fortified by its powerful variant, robust imputation, which enhances the precision and reliability of its results. In the challenging landscape of data analysis, non-robust methods can be swayed by a few extreme outliers, leading to skewed imputations and biased estimates. This can apply to both representative outliers – those true yet unusual values of your population – and non-representative outliers, which are mere measurement errors. Detecting these outliers in large or high-dimensional data sets often becomes as complex as unraveling a Gordian knot. The solution? Turn to robust imputation methods. Robust (imputation) methods effectively manage outliers and exhibit remarkable resistance to their influence, providing a more reliable approach to dealing with missing data. Moreover, these robust methods offer flexibility, accommodating even if the imputation model used is not a perfect fit. They are akin to a well-designed buffer system, absorbing slight deviations without compromising overall stability. In the latest advancement of statistical methodology, a new robust imputation algorithm has been introduced. This innovative solution addresses three significant challenges with robustness. It utilizes robust bootstrapping to manage model uncertainty during the imputation of a random sample; it incorporates robust fitting to reinforce accuracy; and it takes into account imputation uncertainty in a resilient manner.
Furthermore, any complex regression or classification model for any variable with missing data can be run through the algorithm. With this new algorithm, we move one step closer to optimizing the accuracy and reliability of handling missing data. Using a realistic data set and a simulation study including a sensitivity analysis, the new algorithm imputeRobust shows excellent performance compared with other common methods. Effectiveness was demonstrated by measures of precision for the prediction error, the coverage rates, and the mean square errors of the estimators, as well as by visual comparisons.
01A - Article in scientific journal

Publication
Privacy of study participants in open-access health and demographic surveillance system data. Requirements analysis for data anonymization (JMIR Publications, 2022)
Templ, Matthias; Kanjala, Chifundo; Siems, Inken
Background: Data anonymization and sharing have become popular topics for individuals, organizations, and countries worldwide. Open-access sharing of anonymized data containing sensitive information about individuals makes the most sense whenever the utility of the data can be preserved and the risk of disclosure can be kept below acceptable levels. In this case, researchers can use the data without access restrictions and limitations.
Objective: This study aimed to highlight the requirements and possible solutions for sharing health surveillance event history data. The challenges lie in the anonymization of multiple event dates and time-varying variables.
Methods: A sequential approach that adds noise to event dates is proposed. This approach maintains the event order and preserves the average time between events. In addition, a nosy neighbor distance-based matching approach to estimate the risk is proposed. Regarding the key variables that change over time, such as educational level or occupation, we make 2 proposals: one based on limiting the intermediate statuses of the individual and the other to achieve k-anonymity in subsets of the data. The proposed approaches were applied to the Karonga health and demographic surveillance system (HDSS) core residency data set, which contains longitudinal data from 1995 to the end of 2016 and includes 280,381 events with time-varying socioeconomic variables and demographic information.
Results: An anonymized version of the event history data, including longitudinal information on individuals over time, with high data utility, was created.
Conclusions: The proposed anonymization of event history data comprising static and time-varying variables applied to HDSS data led to acceptable disclosure risk and preserved utility, making the data sharable as public use data. It was found that high utility was achieved, even with the highest level of noise added to the core event dates. The details are important to ensure consistency or credibility. Importantly, the sequential noise addition approach presented in this study not only maintains the event order recorded in the original data but also maintains the time between events. We proposed an approach that preserves the data utility well but limits the number of response categories for the time-varying variables. Furthermore, using distance-based neighborhood matching, we simulated an attack under a nosy neighbor situation and by using a worst-case scenario where attackers have full information on the original data. We showed that the disclosure risk is very low, even when assuming that the attacker’s database and information are optimal. The HDSS and medical science research communities in low- and middle-income country settings will be the primary beneficiaries of the results and methods presented in this paper; however, the results will be useful for anyone working on anonymizing longitudinal event history data with time-varying variables for the purposes of sharing.
01A - Article in scientific journal

Publication
Statistical analysis of chemical element compositions in food science: problems and possibilities (MDPI, 2022)
Templ, Matthias; Templ, Barbara
In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic as it may lead to arbitrary results when non-compositional statistical analysis is applied to compositional datasets. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to show the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data, and the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that the outcome of the log-ratio analysis provided better separation between the pure and adulterated data and allowed for easier interpretability of the results and a higher accuracy of classification.
Similarly, it showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.
01A - Article in scientific journal

Publication
A new version of the Langelier-Ludwig square diagram under a compositional perspective (Elsevier, 2022)
Templ, Matthias; Gozzi, Caterina; Buccianti, Antonella
01A - Article in scientific journal

Publication
A systematic overview on methods to protect sensitive data provided for various analyses (Springer, 2022)
Templ, Matthias; Sariyar, Murat
01A - Article in scientific journal

Publication
Artificial neural networks to impute rounded zeros in compositional data (Springer, 2021)
Templ, Matthias; Filzmoser, Peter; Hron, Karel; Martín-Fernández, Josep Antoni; Palarea-Albaladejo, Javier
04A - Contribution to edited volume

Publication
Coincidence of temperature extremes and phenological events of grapevines (Institut des Sciences de la Vigne et du Vin (I S V V), 2021)
Templ, Barbara; Templ, Matthias; Barbieri, Roberto; Meier, Michael; Zufferey, Vivian
A growing number of studies have highlighted the consequences of climate change on agriculture, including the impacts of climate extremes such as drought, heat waves and frost. The aim of this study was to assess the influence of temperature extremes on various phenological events of grapevine varieties in Southwest Switzerland (Leytron, Canton of Valais). We aimed to capture the occurrence of extreme events in specific years in various grapevine varieties and at different phenological phases to rank the varieties based on their sensitivity to temperature extremes and thus quantify their robustness. Phenological observations (1978–2018) of six Vitis vinifera varieties (Arvine, Chardonnay, Chasselas, Gamay, Pinot noir, and Syrah) were subjected to event coincidence analysis. Extreme events were defined as values in the uppermost or lowermost percentiles of the timing of the phenophases and daily temperatures within a 30-day window before the phenophase event occurred. Significantly more extreme temperature and phenological events occurred in Leytron between 2003 and 2017 than in the earlier years, with the years 2007, 2011, 2014 and 2017 being remarkable in terms of the number of extreme coincidence events. Moreover, bud development and flowering experienced significantly more extreme coincidence events than other phenophases; however, the occurrence rate of extreme coincidence events was independent of the phenophase. Based on the total number of extreme events, the varieties did not differ in their responses to temperature extremes. Therefore, event coincidence analysis is an appropriate tool to quantify the occurrence of extreme events. The occurrence of extreme temperature events clearly affected the advancement of the timings of phenological events in various grapevines. However, there were no varietal differences in terms of response to extreme temperatures; thus, additional research is warranted to outline the best adaptation measures.
01A - Article in scientific journal

Publication
Comparison of zero replacement strategies for compositional data with large numbers of zeros (Elsevier, 2021)
Lubbe, Sugnet; Filzmoser, Peter; Templ, Matthias
01A - Article in scientific journal