A matrix completion approach for imputing missing pavement condition data and its impact on pavement performance prediction
High-quality pavement condition data are crucial for effective pavement management. However, missing or erroneous data are common, and existing studies often document their data cleaning processes inadequately. This study introduces a pavement condition data imputation method based on the softimpute matrix completion algorithm and compares it with linear interpolation across varying missing data ratios. It also investigates how anomalous data handling strategies affect pavement performance prediction at different levels of data availability. Results show that the accuracy of both imputation methods declines as the missing data ratio increases, though the error of linear interpolation rises more sharply than that of softimpute. Softimpute outperforms linear interpolation in imputation accuracy at every missing data ratio tested but may introduce subtle distributional biases at missing rates above 50%. For datasets smaller than 100, softimpute is recommended, and direct deletion is less advantageous than using the original data. For larger datasets (>150,000 for neural networks and >10,000 for tree-based models), direct deletion yields optimal prediction performance, making imputation unnecessary. For medium-sized datasets, imputation is preferred, though the performance gap between softimpute and direct deletion narrows as data volume grows. The findings are intended to guide practitioners in selecting effective anomalous data handling strategies for improved pavement management.
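The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a matrix whose rows are pavement sections and whose columns are survey years, uses a basic soft-thresholded SVD iteration in the spirit of softimpute, and the function names, the shrinkage parameter `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_impute(X, lam=1.0, max_iter=200, tol=1e-5):
    """Fill NaNs in X with an iterative soft-thresholded SVD (softimpute-style sketch)."""
    mask = ~np.isnan(X)                      # True where a value was observed
    Z = np.where(mask, X, 0.0)               # start with zeros in the gaps
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)         # shrink singular values toward zero
        low_rank = (U * s) @ Vt              # low-rank reconstruction
        Z_new = np.where(mask, X, low_rank)  # keep observed values, refresh the gaps
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z

def linear_interp_rows(X):
    """Fill NaNs row by row with linear interpolation along the columns.

    Gaps at the start or end of a row take the nearest observed value.
    """
    out = X.copy()
    idx = np.arange(X.shape[1])
    for row in out:                          # each row is a view into `out`
        obs = ~np.isnan(row)
        if obs.any():
            row[~obs] = np.interp(idx[~obs], idx[obs], row[obs])
    return out

# Synthetic low-rank "condition" matrix (assumption: 50 sections x 12 surveys).
rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 12))
hidden = rng.random(truth.shape) < 0.3       # hide roughly 30% of the entries
observed = np.where(hidden, np.nan, truth)

for name, filled in [("softimpute sketch", soft_impute(observed, lam=0.5)),
                     ("linear interpolation", linear_interp_rows(observed))]:
    rmse = np.sqrt(np.mean((filled[hidden] - truth[hidden]) ** 2))
    print(f"{name}: RMSE on hidden entries = {rmse:.3f}")
```

Repeating the last loop for several missing ratios gives the kind of error-versus-missingness comparison the abstract reports; the numbers from this synthetic example say nothing about real pavement data.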
Yao, Linyi (author) / Leng, Zhen (author) / Ni, Fujian (author)
2024-12-31
Article (Journal)
Electronic Resource
English
A Spatial‐Bayesian Technique for Imputing Pavement Network Repair Data
Online Contents | 2012
Pavement Condition Prediction Using Clusterwise Regression
British Library Conference Proceedings | 2006
Pavement Condition Prediction Using Clusterwise Regression
British Library Online Contents | 2006