Enhancing precision in large-scale data analysis: an innovative robust imputation algorithm for managing outliers and missing values

Templ, Matthias

dc.contributor.author	Templ, Matthias
dc.date.accessioned	2024-05-28T08:52:46Z
dc.date.available	2024-05-28T08:52:46Z
dc.date.issued	2023
dc.description.abstract	In the intricate world of data analytics, one method has emerged as a key tool for confronting missing data: multiple imputation. Its strength is further reinforced by a powerful variant, robust imputation, which enhances the precision and reliability of the results. In the challenging landscape of data analysis, non-robust methods can be swayed by a few extreme outliers, leading to skewed imputations and biased estimates. This applies both to representative outliers (true but unusual values in the population) and to non-representative outliers, which are simply measurement errors. Detecting these outliers in large or high-dimensional data sets is often as complex as unraveling a Gordian knot. The solution? Turn to robust imputation methods. Robust (imputation) methods handle outliers effectively and resist their influence, providing a more reliable approach to dealing with missing data. Moreover, these methods remain reliable even when the imputation model is not a perfect fit. They act like a well-designed buffer system, absorbing slight deviations without compromising overall stability. A new robust imputation algorithm is introduced that addresses three significant challenges robustly: it uses robust bootstrapping to account for model uncertainty when imputing a random sample; it uses robust fitting to improve accuracy; and it accounts for imputation uncertainty in a resilient manner. Furthermore, any complex regression or classification model for any variable with missing data can be run through the algorithm. With this new algorithm, we move one step closer to optimizing the accuracy and reliability of handling missing data.
In a realistic data set and in a simulation study including a sensitivity analysis, the new algorithm imputeRobust shows excellent performance compared with other common methods. Its effectiveness was demonstrated by precision measures based on the prediction error, by the coverage rates and the mean squared errors of the estimators, and by visual comparisons.
dc.identifier.doi	10.3390/math11122729
dc.identifier.uri	https://irf.fhnw.ch/handle/11654/43478
dc.identifier.uri	https://doi.org/10.26041/fhnw-7443
dc.issue	12
dc.language.iso	en
dc.publisher	MDPI
dc.relation.ispartof	Mathematics
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.spatial	Basel
dc.subject.ddc	330 - Economics
dc.title	Enhancing precision in large-scale data analysis: an innovative robust imputation algorithm for managing outliers and missing values
dc.type	01A - Article in a scientific journal
dc.volume	11
dspace.entity.type	Publication
fhnw.InventedHere	Yes
fhnw.ReviewType	Anonymous ex ante peer review of a complete publication
fhnw.affiliation.hochschule	Hochschule für Wirtschaft	de_CH
fhnw.affiliation.institut	Institut für Unternehmensführung	de_CH
fhnw.openAccessCategory	Gold
fhnw.publicationState	Published
relation.isAuthorOfPublication	8b0a85e1-60d7-48f9-8551-419197a127e7
relation.isAuthorOfPublication.latestForDiscovery	8b0a85e1-60d7-48f9-8551-419197a127e7
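
The abstract names three robust ingredients: a bootstrap for model uncertainty, a robust fit, and a robust treatment of imputation uncertainty. The Python sketch below shows one way these could fit together for a single continuous variable `y` with one predictor `x`. It is a minimal sketch under stated assumptions, not the imputeRobust implementation: the names `mad_scale`, `huber_fit`, and `robust_impute` are hypothetical; the robust fit is assumed to be a Huber M-estimator computed by iteratively reweighted least squares with a MAD scale; the bootstrap is a plain resample of the observed cases (which may differ from the robust bootstrap used in the paper); and the imputation noise is assumed normal on the robust residual scale.

```python
import random
import statistics


def mad_scale(values):
    """Median absolute deviation, rescaled (x 1.4826) to match the SD under normality."""
    med = statistics.median(values)
    return 1.4826 * statistics.median(abs(v - med) for v in values)


def huber_fit(x, y, c=1.345, iterations=50):
    """Fit y = a + b*x by Huber M-estimation via iteratively reweighted least squares.

    Hypothetical helper. Returns (a, b, s), where s is the MAD scale of the
    residuals of the returned fit.
    """
    weights = [1.0] * len(x)  # first pass is ordinary least squares
    for _ in range(iterations):
        sw = sum(weights)
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
        if sxx == 0:
            raise ValueError("predictor has no spread in this sample")
        b = sxy / sxx
        a = my - b * mx
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = mad_scale(resid) or 1e-12  # guard against a zero scale
        # Huber weights: full weight within c*s, weight c*s/|r| beyond it
        weights = [1.0 if abs(r) <= c * s else c * s / abs(r) for r in resid]
    return a, b, s


def robust_impute(x, y, m=5, seed=0):
    """Return m completed copies of y; None marks a missing value of y (hypothetical helper)."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    completed = []
    for _ in range(m):
        # (1) Model uncertainty: refit on a bootstrap resample of the observed cases.
        #     Plain resampling is a simplification (assumption, see lead-in).
        boot = [rng.choice(observed) for _ in observed]
        bx = [p[0] for p in boot]
        by = [p[1] for p in boot]
        # (2) Robust fit: outliers in y get bounded influence on (a, b).
        a, b, s = huber_fit(bx, by)
        # (3) Imputation uncertainty: add normal noise on the robust residual
        #     scale s, so outliers do not inflate the spread of imputed values.
        completed.append([
            yi if yi is not None else a + b * xi + rng.gauss(0.0, s)
            for xi, yi in zip(x, y)
        ])
    return completed


if __name__ == "__main__":
    gen = random.Random(42)
    x = [i / 10 for i in range(40)]
    y = [2 + 3 * xi + gen.gauss(0.0, 0.1) for xi in x]
    for i in (5, 15, 25, 35):
        y[i] += 50.0  # gross outliers in the response
    for i in (3, 12, 22, 33):
        y[i] = None  # values to impute
    for copy in robust_impute(x, y, m=3):
        print([round(copy[i], 2) for i in (3, 12, 22, 33)])
```

Because the contaminated points receive small Huber weights, the imputed values in this toy example should stay close to the line 2 + 3x instead of being pulled toward the outliers. In a full multiple-imputation analysis, each completed copy would be analyzed separately and the results pooled.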

Files

Original bundle

Name: Templ_2023_Enhancing_precision_in_large-scale_data_analysis.pdf
Size: 852.19 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.36 KB
Format: Item-specific license agreed to upon submission

Collection

Institut für Unternehmensführung