Templ, Matthias

Last name
Templ
First name
Matthias
Name
Matthias Templ

Search results

Now showing 1 - 10 of 39
  • Publication
    Evaluation of synthetic data generators on complex tabular data
    (Springer, 2024) Thees, Oscar; Novak, Jiri; Templ, Matthias; Domingo-Ferrer, Josep; Önen, Melek
    Synthetic data generators are widely utilized to produce synthetic data, serving as a complement or replacement for real data. However, the utility of data is often limited by its complexity. The aim of this paper is to show their performance using a complex data set that includes cluster structures and complex relationships. We compare different synthesizers such as synthpop, Synthetic Data Vault, simPop, Mostly AI, Gretel, Realtabformer, and arf, taking into account their different methodologies with (mostly) default settings, on two properties: syntactical accuracy and statistical accuracy. As a complex and popular data set, we used the European Statistics on Income and Living Conditions data set. Almost all synthesizers resulted in low data utility and low syntactical accuracy. The results indicated that for such complex data, simPop, a computational and methodological framework for simulating complex data based on conditional modeling, emerged as the most effective approach for static tabular data and is superior compared to other conditional or joint modelling approaches.
    04B - Contribution to conference proceedings
  • Publication
    Simulation of calibrated complex synthetic population data with XGBoost
    (MDPI, 2024) Gussenbauer, Johannes; Templ, Matthias; Fritzmann, Siro; Kowarik, Alexander
    Synthetic data generation methods are used to transform the original data into privacy-compliant synthetic copies (twin data). With our proposed approach, synthetic data can be simulated in the same size as the input data or in any size, and in the case of finite populations, even the entire population can be simulated. The proposed XGBoost-based method is compared with known model-based approaches to generate synthetic data using a complex survey data set. The XGBoost method shows strong performance, especially with synthetic categorical variables, and outperforms other tested methods. Furthermore, the structure and relationship between variables are well preserved. The tuning of the parameters is performed automatically by a modified k-fold cross-validation. If exact population margins are known, e.g., cross-tabulated population counts on age class, gender and region, the synthetic data must be calibrated to those known population margins. For this purpose, we have implemented a simulated annealing algorithm that is able to use multiple population margins simultaneously to post-calibrate a synthetic population. The algorithm is, thus, able to calibrate simulated population data containing cluster and individual information, e.g., about persons in households, at both person and household level. Furthermore, the algorithm is efficiently implemented so that the adjustment of populations with many millions or more persons is possible.
    01A - Article in scientific journal
  • Publication
    Robust CoDA balances and the role of the variance in complex riverine geochemical systems
    (Elsevier, 2024) Gozzi, Caterina; Templ, Matthias; Buccianti, Antonella
    This study introduces a robust method for analyzing the geochemical behavior of chemical species in river catchment water. It focuses on isometric log-ratio coordinates obtained from a sequential partition method that successively maximizes the explained variance in the data set. Robust orthonormal coordinates are created based on hierarchical clustering and robust estimation of the variation matrix. Applying this to the water chemistry of Italy's Arno and Tiber basins, the research reveals the associations of variables in data structure and processes across varying geological and climatic conditions. The method uncovers key contrasting geochemical processes and suggests that the behavior of simple balances characterized by lower variances (i.e., and ) is mainly influenced by random fluctuations, with no differences between classical and robust methods. However, when balances describe more complex geochemical processes resulting in frequency distributions affected by bimodality or outliers, significant differences between the two approaches emerge, compromising the data interpretation. The proposed methodology offers more insights into the investigation of catchment geochemistry's resilience to hydroclimatic changes, marking a significant step in understanding large-scale environmental dynamics.
    01A - Article in scientific journal
  • Publication
    Robust multiple imputation with GAM
    (Springer, 2024) Templ, Matthias
    Multiple imputation of missing values is a key step in data analytics and a standard process in data science. Nonlinear imputation methods come into play whenever the relationship between a response and predictors cannot be linearized by transformations of variables, adding interactions, or using, e.g., quadratic terms. Generalized additive models (GAM) and their extension GAMLSS, in which each parameter of the distribution, such as mean, variance, skewness, and kurtosis, can be represented as a function of predictors, are widely used nonlinear methods. However, non-robust methods such as standard GAMs and GAMLSS can be swayed by outliers, leading to outlier-driven imputations. This applies to both representative outliers (true yet unusual values of the population) and non-representative outliers, which are mere measurement errors. Robust (imputation) methods effectively manage outliers and exhibit resistance to their influence, providing a more reliable approach to dealing with missing data. The proposed new imputation algorithm tackles three major challenges related to robustness. (1) A robust bootstrap method is employed to handle model uncertainty during the imputation of a random sample. (2) The approach incorporates robust fitting techniques to enhance accuracy. (3) It effectively considers imputation uncertainty in a resilient manner. Furthermore, any complex model for any variable with missingness can be considered and run through the algorithm. For the real-world data sets used and the simulation study conducted, the novel algorithm imputeRobust, which includes robust methods for imputation with GAMs, demonstrates superior performance compared to existing imputation methods using GAMLSS. Limitations pertain to the imputation of categorical variables using robust techniques.
    01A - Article in scientific journal
  • Publication
    Modeling bee hive dynamics. Assessing colony health using hive weight and environmental parameters
    (Elsevier, 2024) Degenfellner, Jürgen; Templ, Matthias
    Our state-of-the-art methods study hive weight and predict bee health and colony condition using advanced machine learning tools trained with unlabeled data. By integrating methodologies such as signal extraction, similar trend monitoring, principal component analysis, and MM-Regression, our goal is to translate hive weight fluctuations into predictive insights for future hive monitoring systems. In particular, signal extraction methods are used to obtain an interpretable signal and to detect level shifts, exploratory analysis is used to visually detect dissimilar weight trajectories from nearby hives, and historical data are used to robustly predict hive weights and to trigger an alarm when predictions and actual observations differ significantly. Our study shows how these methods can be successfully used to analyze and predict hive weights, and it underscores the need for future research to accumulate labeled data and to adopt a holistic perspective, incorporating a wider spectrum of influences on hive weight simultaneously.
    01A - Article in scientific journal
  • Publication
    The impact of misclassifications and outliers on imputation methods
    (Taylor & Francis, 2024) Templ, Matthias; Ulmer, Markus
    Many imputation methods have been developed over the years and tested mostly under ideal settings. Surprisingly, there is no detailed research on how imputation methods perform when the idealized assumptions about the distribution of data and/or model assumptions are partly not fulfilled. This research looks into the susceptibility of imputation techniques, particularly in relation to outliers, misclassifications, and incorrect model specifications. This is crucial knowledge about how well the methods perform in practice because, in reality, conditions are usually not ideal, and model assumptions may not hold. The data may not fit the defined models well. Outliers distort the estimates, and misclassifications reduce the quality of most imputation methods. Several different evaluation measures are discussed, ranging from comparing imputed values with true values or comparing certain statistics to the performance of classifiers and the variance of estimated parameters. Some well-known imputation methods are compared based on real data and simulations. It turns out that robust conditional imputation methods outperform other methods for real data and simulation settings.
    01A - Article in scientific journal
  • Publication
    Prof. Rudolf Dutter (1946-2023): Ein Nachruf
    (Austrian Statistical Society, 07/2023) Filzmoser, Peter; Templ, Matthias
    The former TU Wien professor Rudolf Dutter died on 5 May 2023 from the consequences of his long-standing diabetes. Prof. Dutter was editor of the Österreichische Zeitschrift für Statistik (Austrian Journal of Statistics) from 1997 to 2003, and he carried out this role with great dedication on behalf of the Austrian Statistical Society. One of his activities was setting up and running a website for the journal that provided open access to the articles. A short obituary in this journal, also as information for the members of the society, therefore seems more than fitting.
    01A - Article in scientific journal
  • Publication
    Visualization and imputation of missing values. With applications in R
    (Springer, 2023) Templ, Matthias
    This book explores visualization and imputation techniques for missing values and presents practical applications using the statistical software R. It explains the concepts of common imputation methods with a focus on visualization, description of data problems and practical solutions using R, including modern methods of robust imputation, imputation based on deep learning and imputation for complex data. By describing the advantages, disadvantages and pitfalls of each method, the book presents a clear picture of which imputation methods are applicable given a specific data set at hand. The material covered includes the pre-analysis of data, visualization of missing values in incomplete data, single and multiple imputation, deductive imputation and outlier replacement, model-based methods including methods based on robust estimates, non-linear methods such as tree-based and deep learning methods, imputation of compositional data, imputation quality evaluation from visual diagnostics to precision measures, coverage rates and prediction performance and a description of different model- and design-based simulation designs for the evaluation. The book also features a topic-focused introduction to R and R code is provided in each chapter to explain the practical application of the described methodology. Addressed to researchers, practitioners and students who work with incomplete data, the book offers an introduction to the subject as well as a discussion of recent developments in the field. It is suitable for beginners to the topic and advanced readers alike.
    02 - Monograph
  • Publication
    Enhancing precision in large-scale data analysis: an innovative robust imputation algorithm for managing outliers and missing values
    (MDPI, 2023) Templ, Matthias
    Navigating the intricate world of data analytics, one method has emerged as a key tool in confronting missing data: multiple imputation. Its strength is further fortified by its powerful variant, robust imputation, which enhances the precision and reliability of its results. In the challenging landscape of data analysis, non-robust methods can be swayed by a few extreme outliers, leading to skewed imputations and biased estimates. This can apply to both representative outliers – those true yet unusual values of your population – and non-representative outliers, which are mere measurement errors. Detecting these outliers in large or high-dimensional data sets often becomes as complex as unraveling a Gordian knot. The solution? Turn to robust imputation methods. Robust (imputation) methods effectively manage outliers and exhibit remarkable resistance to their influence, providing a more reliable approach to dealing with missing data. Moreover, these robust methods offer flexibility, accommodating even if the imputation model used is not a perfect fit. They are akin to a well-designed buffer system, absorbing slight deviations without compromising overall stability. In the latest advancement of statistical methodology, a new robust imputation algorithm has been introduced. This innovative solution addresses three significant challenges with robustness. It utilizes robust bootstrapping to manage model uncertainty during the imputation of a random sample; it incorporates robust fitting to reinforce accuracy; and it takes into account imputation uncertainty in a resilient manner. Furthermore, any complex regression or classification model for any variable with missing data can be run through the algorithm. With this new algorithm, we move one step closer to optimizing the accuracy and reliability of handling missing data. 
    Using a realistic data set and a simulation study including a sensitivity analysis, the new algorithm imputeRobust shows excellent performance compared with other common methods. Effectiveness was demonstrated by measures of precision for the prediction error, the coverage rates, and the mean square errors of the estimators, as well as by visual comparisons.
    01A - Article in scientific journal
  • Publication
    Statistical analysis of chemical element compositions in food science: problems and possibilities
    (MDPI, 2022) Templ, Matthias; Templ, Barbara
    In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic as it may lead to arbitrary results when non-compositional statistical analysis is applied to compositional datasets. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to show the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data, and by the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that the outcome of the log-ratio analysis provided better separation between the pure and adulterated data and allowed for easier interpretability of the results and a higher accuracy of classification. Similarly, it showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.
    01A - Article in scientific journal