6 results
Publication: Statistical analysis of chemical element compositions in food science: problems and possibilities (MDPI, 2022)
Templ, Matthias; Templ, Barbara
In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic because applying non-compositional statistical analysis to compositional datasets may lead to arbitrary results. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to identify the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data and by the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that log-ratio analysis provided better separation between the pure and adulterated data, easier interpretation of the results, and higher classification accuracy. Similarly, they showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.
01A - Article in scientific journal

Publication: Prof. Rudolf Dutter (1946-2023): Ein Nachruf [An obituary] (Austrian Statistical Society, 07/2023)
Filzmoser, Peter; Templ, Matthias
Former TU Wien professor Rudolf Dutter died on 5 May 2023 from the consequences of his long-standing diabetes. Prof. Dutter was editor of the Österreichische Zeitschrift für Statistik (Austrian Journal of Statistics) from 1997 to 2003, a role he carried out with great dedication on behalf of the Austrian Statistical Society. Among his activities was setting up and running a website for the journal that provided open access to its articles. A short obituary in this journal, which also serves to inform the members of the Society, therefore seems more than fitting.
01A - Article in scientific journal

Publication: A systematic overview on methods to protect sensitive data provided for various analyses (Springer, 2022)
Templ, Matthias; Sariyar, Murat
01A - Article in scientific journal

Publication: Enhancing precision in large-scale data analysis: an innovative robust imputation algorithm for managing outliers and missing values (MDPI, 2023)
Templ, Matthias
Navigating the intricate world of data analytics, one method has emerged as a key tool for confronting missing data: multiple imputation. Its strength is further fortified by its powerful variant, robust imputation, which enhances the precision and reliability of its results. In data analysis, non-robust methods can be swayed by a few extreme outliers, leading to skewed imputations and biased estimates. This applies both to representative outliers (true but unusual values in the population) and to non-representative outliers, which are mere measurement errors. Detecting these outliers in large or high-dimensional data sets is often as complex as unraveling a Gordian knot. The solution? Turn to robust imputation methods.
Robust (imputation) methods effectively manage outliers and are highly resistant to their influence, providing a more reliable approach to dealing with missing data. Moreover, these robust methods are flexible and remain reliable even if the imputation model is not a perfect fit. They are akin to a well-designed buffer system, absorbing slight deviations without compromising overall stability. In the latest advancement of statistical methodology, a new robust imputation algorithm has been introduced. It addresses three significant challenges with robustness: it uses robust bootstrapping to manage model uncertainty during the imputation of a random sample; it incorporates robust fitting to reinforce accuracy; and it accounts for imputation uncertainty in a resilient manner. Furthermore, any complex regression or classification model for any variable with missing data can be run through the algorithm. With this new algorithm, we move one step closer to optimizing the accuracy and reliability of handling missing data. In a realistic data set and a simulation study including a sensitivity analysis, the new algorithm imputeRobust shows excellent performance compared with other common methods. Effectiveness was demonstrated by measures of precision for the prediction error, the coverage rates, and the mean square errors of the estimators, as well as by visual comparisons.
01A - Article in scientific journal

Publication: Comparison of zero replacement strategies for compositional data with large numbers of zeros (Elsevier, 2021)
Lubbe, Sugnet; Filzmoser, Peter; Templ, Matthias
01A - Article in scientific journal

Publication: Privacy of study participants in open-access health and demographic surveillance system data. Requirements analysis for data anonymization (JMIR Publications, 2022)
Templ, Matthias; Kanjala, Chifundo; Siems, Inken
Background: Data anonymization and sharing have become popular topics for individuals, organizations, and countries worldwide. Open-access sharing of anonymized data containing sensitive information about individuals makes the most sense whenever the utility of the data can be preserved and the risk of disclosure can be kept below acceptable levels. In this case, researchers can use the data without access restrictions and limitations.
Objective: This study aimed to highlight the requirements and possible solutions for sharing health surveillance event history data. The challenges lie in the anonymization of multiple event dates and time-varying variables.
Methods: A sequential approach that adds noise to event dates is proposed. This approach maintains the event order and preserves the average time between events. In addition, a nosy neighbor distance-based matching approach to estimate the risk is proposed. Regarding the key variables that change over time, such as educational level or occupation, we make two proposals: one based on limiting the intermediate statuses of the individual and the other on achieving k-anonymity in subsets of the data. The proposed approaches were applied to the Karonga health and demographic surveillance system (HDSS) core residency data set, which contains longitudinal data from 1995 to the end of 2016 and includes 280,381 events with time-varying socioeconomic variables and demographic information.
Results: An anonymized version of the event history data, including longitudinal information on individuals over time, with high data utility, was created.
Conclusions: The proposed anonymization of event history data comprising static and time-varying variables, applied to HDSS data, led to acceptable disclosure risk and preserved utility, and the resulting data can be shared as public use data.
It was found that high utility was achieved, even with the highest level of noise added to the core event dates. The details are important to ensure consistency and credibility. Importantly, the sequential noise addition approach presented in this study not only maintains the event order recorded in the original data but also maintains the time between events. We proposed an approach that preserves the data utility well but limits the number of response categories for the time-varying variables. Furthermore, using distance-based neighborhood matching, we simulated an attack in a nosy neighbor situation, using a worst-case scenario in which attackers have full information on the original data. We showed that the disclosure risk is very low, even when assuming that the attacker's database and information are optimal. The HDSS and medical science research communities in low- and middle-income country settings will be the primary beneficiaries of the results and methods presented in this paper; however, the results will be useful for anyone working on anonymizing longitudinal event history data with time-varying variables for the purposes of sharing.
01A - Article in scientific journal
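The Methods paragraph of the last entry describes a sequential noise addition to event dates that keeps the event order and preserves the average time between events. The Python sketch below shows one way to obtain those two properties. It is a hypothetical illustration, not the authors' published procedure. Everything in it is an assumption made for the example: the function name `perturb_event_days`, dates given as integer day numbers per individual, a Gaussian shift of the first event (`shift_sd`), uniform multiplicative noise on each gap (`gap_noise`), and a rescaling step that keeps the total span fixed.

```python
import random


def perturb_event_days(days, shift_sd=7.0, gap_noise=0.3, seed=None):
    """Hypothetical sketch: add noise to one individual's event dates.

    days      -- event dates as integer day numbers (assumption)
    shift_sd  -- std. dev. (in days) of the Gaussian shift of the first event
    gap_noise -- each gap is multiplied by a factor drawn from
                 [1 - gap_noise, 1 + gap_noise]; must lie in [0, 1)
    Returns the noisy dates (floats) in chronological order.
    """
    if not 0.0 <= gap_noise < 1.0:
        raise ValueError("gap_noise must be in [0, 1) to keep gaps positive")
    if not days:
        return []
    rng = random.Random(seed)
    days = sorted(days)

    # 1. Shift the first event by Gaussian noise, rounded to whole days.
    start = days[0] + round(rng.gauss(0.0, shift_sd))

    # 2. Multiply each gap between consecutive events by a positive factor,
    #    so no two events can swap order (zero gaps stay zero).
    gaps = [b - a for a, b in zip(days, days[1:])]
    total = sum(gaps)
    if total == 0:
        # All events fall on the same day: only the common shift applies.
        return [float(start)] * len(days)
    noisy_gaps = [g * rng.uniform(1.0 - gap_noise, 1.0 + gap_noise) for g in gaps]

    # 3. Rescale the noisy gaps so they sum to the original span; the mean
    #    gap per individual is then unchanged (up to floating-point error).
    scale = total / sum(noisy_gaps)

    # 4. Rebuild the dates sequentially, starting from the shifted first event.
    out = [float(start)]
    for g in noisy_gaps:
        out.append(out[-1] + g * scale)
    return out


# Usage with made-up day numbers for one individual.
original = [0, 30, 45, 400, 401]
perturbed = perturb_event_days(original, seed=1)
print([round(d, 1) for d in perturbed])
```

Because every gap is multiplied by a strictly positive factor, consecutive events cannot swap places. Because the noisy gaps are rescaled to the original total span, each individual's mean gap is preserved. This sketch does not establish how this scheme compares with the risk and utility results reported in the paper.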