Principal component analysis for compositional data with outliers
10.1002/env.966.abs
Compositional data (almost all data in geochemistry) are closed data: they usually sum to a constant (e.g. weight percent, wt.%) and carry only relative information. As a result, the covariance structure of compositional data is strongly biased, and the results of many multivariate techniques become doubtful without a proper transformation of the data. The centred logratio transformation (clr) is often used to open closed data. However, clr‐transformed data do not have full rank and therefore cannot be used for robust multivariate techniques such as robust principal component analysis (PCA). Here we propose to use the isometric logratio transformation (ilr) instead. The ilr transformation has the disadvantage that the resulting new variables are no longer directly interpretable in terms of the original variables. We therefore propose a technique by which the scores and loadings of a robust PCA on ilr‐transformed data can be back‐transformed and interpreted. The procedure is demonstrated using a real data set from regional geochemistry and compared with results from non‐transformed and non‐robust versions of PCA. The procedure using ilr‐transformed data and robust PCA delivers superior results to all other approaches. The examples demonstrate that, owing to the compositional nature of geochemical data, PCA should not be carried out without an appropriate transformation. Furthermore, a robust approach is preferable if the data set contains outliers. Copyright © 2009 John Wiley & Sons, Ltd.
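The workflow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The data are synthetic, the function names (`closure`, `clr`, `ilr_basis`) and the particular Helmert-type orthonormal basis are assumptions chosen for illustration, and the classical sample covariance stands in for the robust covariance estimator that a robust PCA would use. The sketch shows that the clr covariance matrix is singular, that the ilr covariance matrix has full rank, and that ilr loadings can be mapped back to clr space by multiplying with the basis matrix.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (closed data)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centred logratio: log parts minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def ilr_basis(d):
    """One possible orthonormal basis (d x (d-1)) of the clr plane.
    Illustrative choice; any orthonormal basis with zero column sums works."""
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j
        v[j, j - 1] = -1.0
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return v

# Synthetic 4-part compositions (assumption: not the paper's data set).
rng = np.random.default_rng(0)
x = closure(rng.lognormal(size=(50, 4)))
d = x.shape[1]

V = ilr_basis(d)
z = clr(x) @ V                      # ilr coordinates, shape (n, d-1)

# clr covariance is d x d but only of rank d-1; ilr covariance is full rank.
cov_clr = np.cov(clr(x), rowvar=False)
cov_ilr = np.cov(z, rowvar=False)   # placeholder for a robust estimate
rank_clr = np.linalg.matrix_rank(cov_clr)
rank_ilr = np.linalg.matrix_rank(cov_ilr)

# PCA in ilr space via eigendecomposition, components sorted by variance.
eigval, G = np.linalg.eigh(cov_ilr)
order = np.argsort(eigval)[::-1]
eigval, G = eigval[order], G[:, order]

scores = (z - z.mean(axis=0)) @ G   # scores are the same in ilr and clr space
loadings_clr = V @ G                # loadings back-transformed to clr space

print("rank of clr covariance:", rank_clr, "of", d)
print("rank of ilr covariance:", rank_ilr, "of", d - 1)
```

For a robust analysis, `cov_ilr` and the centring vector would be replaced by robust estimates of scatter and location computed on the ilr coordinates; the back-transformation step `V @ G` stays the same.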
Filzmoser, Peter (author) / Hron, Karel (author) / Reimann, Clemens (author)
Environmetrics ; 20 ; 621-632
2009-09-01
12 pages
Article (Journal)
Electronic Resource
English