Evaluation of robust outlier detection methods for zero-inflated complex data

dc.contributor.author: Templ, Matthias
dc.contributor.author: Gussenbauer, Johannes
dc.contributor.author: Filzmoser, Peter
dc.date.accessioned: 2025-01-15T12:21:27Z
dc.date.issued: 2019
dc.description.abstract: Outlier detection can be seen as a pre-processing step for locating data points in a sample that do not conform to the majority of observations. The literature offers various outlier detection techniques for different types of data. However, many data sets are inflated by true zeros and, in addition, some components/variables might be of compositional nature. Important examples of such data sets are the Structural Earnings Survey, the Structural Business Statistics, the European Statistics on Income and Living Conditions, tax data or, as in this contribution, household expenditure data, which are used, for example, to estimate the Purchasing Power Parity of a country. In this work, robust univariate and multivariate outlier detection methods are compared in a complex simulation study that considers various challenges present in data sets, namely structural (true) zeros, missing values, and compositional variables. These circumstances make it difficult or impossible to flag true outliers and influential observations with well-known outlier detection methods. Our aim is to assess how effectively outlier detection methods identify outliers when applied to challenging data sets such as the household expenditure data surveyed all over the world. Moreover, different methods are evaluated through a close-to-reality simulation study. Differences in performance of univariate and multivariate robust techniques for outlier detection and their shortcomings are reported. We found that robust multivariate methods outperform robust univariate methods. The best performing methods, both in finding the outliers and in keeping the false discovery rate low, were the generalized S estimators (GSE), the BACON-EEM algorithm and a compositional method (CoDa-Cov). In addition, these methods also performed best when the outliers are imputed based on the corresponding outlier detection method and indicators are estimated from the data sets.
dc.identifier.doi: 10.1080/02664763.2019.1671961
dc.identifier.issn: 1360-0532
dc.identifier.issn: 0266-4763
dc.identifier.uri: https://irf.fhnw.ch/handle/11654/48342
dc.identifier.uri: https://doi.org/10.26041/fhnw-11057
dc.issue: 7
dc.language.iso: en
dc.publisher: Taylor & Francis
dc.relation.ispartof: Journal of Applied Statistics
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.spatial: London
dc.subject.ddc: 330 - Economics
dc.subject.ddc: 510 - Mathematics
dc.title: Evaluation of robust outlier detection methods for zero-inflated complex data
dc.type: 01A - Article in scientific journal
dc.volume: 47
dspace.entity.type: Publication
fhnw.InventedHere: No
fhnw.ReviewType: Anonymous ex ante peer review of a complete publication
fhnw.affiliation.hochschule: Hochschule für Wirtschaft FHNW (de_CH)
fhnw.affiliation.institut: Institut für Unternehmensführung (de_CH)
fhnw.openAccessCategory: Gold
fhnw.pagination: 1144-1167
fhnw.publicationState: Published
relation.isAuthorOfPublication: 8b0a85e1-60d7-48f9-8551-419197a127e7
relation.isAuthorOfPublication.latestForDiscovery: 8b0a85e1-60d7-48f9-8551-419197a127e7
Files

Original bundle
Name: Evaluation of robust outlier detection methods for zero-inflated complex data.pdf
Size: 2.5 MB
Format: Adobe Portable Document Format
