A platform for research: civil engineering, architecture and urbanism
Performance Evaluation of Pipe Break Machine Learning Models Using Datasets from Multiple Utilities
Water pipeline infrastructure is critical for the delivery of lifeline services; however, these aging systems are experiencing increasing breakage rates. To help utilities identify their most vulnerable assets, sustained research efforts have gone into developing machine learning models that accurately predict future failures. The performance of these methods depends heavily on the quantity of reliable data, yet most utilities have only limited records of historical pipe breaks. To overcome this limitation in data availability, this article presents a case study exploring how machine learning methods perform at predicting future failures when system information from multiple utilities is combined. Six utilities are considered, for which predictive models are trained and evaluated in several scenarios: (1) using data from a single reference system only, (2) using all systems combined, and (3) using a bootstrapped sample of multiple systems that matches the pipe material distribution of the reference system. Empirical results suggest that variance-controlling algorithms, such as random forests, are less sensitive to data availability, and that introducing information from third-party sources leads to only marginal changes in performance. Overall, the number of break records from the reference system itself has the largest influence on accuracy, suggesting that utilities must keep reliable historical break data to maximize the power of predictive modeling for their asset management programs.
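The abstract does not say exactly how the third scenario's bootstrapped sample is drawn. The sketch below shows one plausible reading, stated as an assumption: pool the pipe records from all utilities, then resample them with replacement within each material class so that the material shares match those of the reference system. The record layout (dicts with a `"material"` key), the function names, the sample size `n`, and the largest-remainder rounding are all hypothetical choices made for illustration, not the authors' procedure.

```python
import random
from collections import Counter, defaultdict


def material_proportions(records):
    """Share of each pipe material in a list of pipe records (hypothetical layout)."""
    counts = Counter(r["material"] for r in records)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


def allocate(proportions, n):
    """Split n draws across materials using largest-remainder rounding,
    so the per-material counts always sum to exactly n."""
    raw = {m: p * n for m, p in proportions.items()}
    alloc = {m: int(v) for m, v in raw.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover draws to the materials with the largest fractional parts.
    for m in sorted(raw, key=lambda m: raw[m] - alloc[m], reverse=True)[:shortfall]:
        alloc[m] += 1
    return alloc


def matched_bootstrap(reference, pool, n, seed=0):
    """Resample pooled multi-utility records with replacement so that the
    material mix of the sample matches the reference system (assumed scheme)."""
    rng = random.Random(seed)
    by_material = defaultdict(list)
    for r in pool:
        by_material[r["material"]].append(r)
    sample = []
    for material, k in allocate(material_proportions(reference), n).items():
        candidates = by_material.get(material)
        if not candidates:
            # Cannot match a material the pooled systems do not contain.
            raise ValueError(f"no pooled records for material {material!r}")
        sample.extend(rng.choices(candidates, k=k))
    return sample
```

For example, if the reference system is 75% cast iron (`"CI"`) and 25% PVC, `matched_bootstrap(reference, pool, 8)` returns six CI records and two PVC records drawn from the pooled utilities, and ignores materials (such as ductile iron) that the reference system lacks. Resampling within material strata keeps the training mix close to the reference utility while still borrowing records from third-party systems.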
J. Infrastruct. Syst.
Chen, Thomas Ying-Jeh (author) / Vladeanu, Greta (author) / Yazdekhasti, Sepideh (author) / Daly, Craig Michael (author)
2022-06-01
Article (Journal)
Electronic Resource
English
The Pipe Break of the Winter of 2003: Middlesex County Utilities Authority Pipeline Failure
British Library Conference Proceedings | 2004
Forecasting of Power Demand for Distribution Utilities Using Machine Learning Models
British Library Conference Proceedings | 2022
WATER UTILITIES' USE OF PLASTIC PIPE
Wiley | 1971