Evaluation of robust outlier detection methods for zero-inflated complex data

Templ, Matthias; Gussenbauer, Johannes; Filzmoser, Peter

Files

Evaluation of robust outlier detection methods for zero-inflated complex data.pdf(2.5 MB)

Authors

Templ, Matthias

Gussenbauer, Johannes

Filzmoser, Peter

Publication date

2019

Collection

Institut für Unternehmensführung

Type

01A - Article in a scientific journal

Parent work

Journal of Applied Statistics

DOI of the original publication

https://doi.org/10.1080/02664763.2019.1671961

URI

https://irf.fhnw.ch/handle/11654/48342
https://doi.org/10.26041/fhnw-11057

Volume

47

Issue / Number

7

Pages / Duration

1144-1167

Publisher / Publishing institution

Taylor & Francis

Place of publication / Event location

London

Abstract

Outlier detection can be seen as a pre-processing step for locating data points in a sample that do not conform to the majority of observations. The literature offers various outlier detection techniques and methods for different types of data. However, many data sets are inflated by true zeros and, in addition, some components/variables may be compositional in nature. Important examples of such data sets are the Structural Earnings Survey, the Structural Business Statistics, the European Statistics on Income and Living Conditions, tax data or, as in this contribution, household expenditure data, which are used, for example, to estimate the purchasing power parity of a country. In this work, robust univariate and multivariate outlier detection methods are compared in a complex simulation study that considers various challenges present in such data sets, namely structural (true) zeros, missing values, and compositional variables. These circumstances make it difficult or impossible to flag true outliers and influential observations with well-known outlier detection methods. Our aim is to assess how effectively outlier detection methods identify outliers when applied to challenging data sets such as the household expenditure data surveyed all over the world. Moreover, different methods are evaluated through a close-to-reality simulation study. Differences in the performance of univariate and multivariate robust techniques for outlier detection, along with their shortcomings, are reported. We found that robust multivariate methods outperform robust univariate methods. The methods that performed best in finding the outliers while keeping the false discovery rate low were the generalized S estimators (GSE), the BACON-EEM algorithm and a compositional method (CoDa-Cov). In addition, these methods also performed best when the outliers are imputed based on the corresponding outlier detection method and indicators are estimated from the data sets.

Subject area (DDC)

330 - Economics
510 - Mathematics

ISSN

1360-0532
0266-4763

Language

English

Created during FHNW affiliation

No

Publication status

Published

Review

Peer review of the complete publication

Open access status

Gold

Citation

Templ, M., Gussenbauer, J., & Filzmoser, P. (2019). Evaluation of robust outlier detection methods for zero-inflated complex data. Journal of Applied Statistics, 47(7), 1144–1167. https://doi.org/10.1080/02664763.2019.1671961
