Robust Multivariate Methods for Income Data.
03 - Edited volume
With the EU Statistics on Income and Living Conditions (EU-SILC), the European Union established a coordinated survey and adopted a set of indicators (the Laeken indicators) to monitor poverty and social cohesion. In particular, the monetary Laeken indicators are based on the equivalized disposable income per person, an aggregation and redistribution of person- and household-specific income components (e.g., income from employment and capital; unemployment, old-age, survivors', and disability benefits). To understand these highly complex data, the components that are measured exclusively at the household level are distributed among the household members, while the individual components are investigated before they are aggregated and redistributed to all household members. The personal income components show the following characteristics: the marginal distribution of each component is heavily skewed and has a substantial point mass at zero; the joint distribution of the components is far from elliptically contoured, even after appropriate transformation; an overwhelming majority of observations lie on subspaces, i.e., exhibit structural zeros in certain dimensions (e.g., individuals of working age with a positive employee cash income receive neither old-age nor unemployment benefits, and vice versa); within subspaces, the observations are clustered with respect to non-monetary, socio-economic characteristics; many components have missing values; and, in addition to outliers in many individual components, there are genuinely multivariate outliers. The influence of outliers and outlier treatments on the components, on the equivalized disposable income, and on the Laeken indicators is investigated. In particular, the outliers may have a considerable effect on the Laeken indicators.
The presentation shows the development of outlier detection and imputation methods that treat the structural zeros appropriately, work with missing values, cope with the complex nature of the data, take the sampling design into account, and remain computationally feasible.
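The structural-zero property described in the abstract can be made concrete with a small sketch. The Python code below is only an illustration and not the method presented: it groups observations by their pattern of exact zeros, so that each group corresponds to one subspace on which outlier detection could then be run separately. The function names, the three column meanings, and the toy values are hypothetical, and the sketch assumes complete data (no missing values) and ignores the sampling design.

```python
from collections import defaultdict


def zero_pattern(row):
    """Return a tuple of booleans: True where the component is exactly zero."""
    return tuple(value == 0 for value in row)


def group_by_zero_pattern(rows):
    """Map each zero pattern to the indices of the rows that share it."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[zero_pattern(row)].append(index)
    return dict(groups)


# Hypothetical toy data, one row per person. Columns (assumed for illustration):
# employee cash income, old-age benefits, unemployment benefits.
incomes = [
    (42000.0, 0.0, 0.0),
    (0.0, 18000.0, 0.0),
    (38500.0, 0.0, 0.0),
    (0.0, 0.0, 9500.0),
    (0.0, 21000.0, 0.0),
]

groups = group_by_zero_pattern(incomes)

# Each pattern identifies one subspace; print its member row indices.
for pattern, members in sorted(groups.items()):
    print(pattern, members)
```

In this toy example every person has exactly one positive component, so the five observations fall into three subspaces. A real application would additionally have to decide how to classify rows with missing components and how to carry the survey weights along with each group.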