Streamflow prediction in ungauged basins located within data-scarce areas using XGBoost: role of feature engineering and explainability
Streamflow prediction in ungauged basins, particularly within data-scarce areas, is a challenging and sensitive task. Traditionally, conceptual and physical models have been used for this task. Although many studies have applied machine learning models, and deep learning techniques in particular, recent advances in machine learning make it imperative that the hydrologic community take further advantage of data-driven techniques to address the challenge of streamflow prediction in ungauged basins. The difficulty of incorporating expert physical and hydrological knowledge in the modelling process and the lack of sufficient explainability of machine learning models are perhaps among the obstacles to their wider use for streamflow prediction. This paper uses XGBoost for streamflow prediction in ungauged basins located within data-scarce regions, incorporating physical and hydrological knowledge in the modelling process through feature engineering. The explainability of the models is studied using SHAP. Three XGBoost models are evaluated based on different levels of feature engineering, and a fourth model is evaluated by adding a physical constraint to the third model. The four models are applied to six target catchments located in four different countries/continents with diverse hydro-climatic conditions. The performance of the models is compared against previous studies, including the SPED framework, which is based on a conceptual hydrological model and available data/knowledge about the reference and target catchments. The second XGBoost model proves to be the most plausible model: it outperforms or performs comparably to the previous studies in five of the target catchments (Nash-Sutcliffe Efficiency range of 0.61–0.81, where 1 indicates a perfect match between observations and predictions). However, in North Fork Cache Creek in the United States, where the target catchment is quite different from the reference catchment in terms of the magnitude of low flows, this model fails to provide satisfactory streamflow predictions.
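The abstract reports model skill as Nash-Sutcliffe Efficiency (NSE). As a minimal sketch, assuming the standard definition NSE = 1 − Σ(obs − pred)² / Σ(obs − mean(obs))², the metric could be computed as follows. The function name and the sample values are illustrative only and are not taken from the paper.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(observed, predicted):
    """Return the NSE of predicted flows against observed flows.

    Assumes the standard definition: 1 minus the ratio of the squared
    prediction error to the variance of the observations about their mean.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")

    mean_observed = fmean(observed)
    # Sum of squared errors between observations and predictions.
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Sum of squared deviations of the observations from their own mean.
    observed_spread = sum((o - mean_observed) ** 2 for o in observed)
    if observed_spread == 0:
        raise ValueError("NSE is undefined when all observations are equal")

    return 1.0 - squared_error / observed_spread


# Hypothetical daily flows, for illustration only.
observed_flows = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe_efficiency(observed_flows, observed_flows))
print(nash_sutcliffe_efficiency(observed_flows, [2.5, 2.5, 2.5, 2.5]))
```

Under this definition a value of 1 corresponds to a perfect match, as the abstract notes, while a value of 0 means the predictions are no better than always predicting the mean of the observations.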
Alipour, M. H. (author)
International Journal of River Basin Management ; 23 ; 71-92
2025-01-02
22 pages
Article (Journal)
Electronic Resource
English