Machine learning or discrete choice models for car ownership demand estimation and prediction?: Fid-Bau Portal

Machine learning or discrete choice models for car ownership demand estimation and prediction?

Paredes, Miguel / Hemberg, Erik / O'Reilly, Una-May / Zegras, Chris

Discrete choice models are widely used to explain transportation behaviors, including a household's decision to own a car. They show how some distinct choice of human behavior or preference influences a decision. They are also used to project future demand estimates to support policy exploration. This latter use for prediction is indirectly aligned with and conditional to the model's estimation which aims to fit the observed data. In contrast, machine learning models are derived to maximize prediction accuracy through mechanisms such as out-of-sample validation, non-linear structure, and automated covariate selection, albeit at the expense of interpretability and sound behavioral theory. We investigate how machine learning models can outperform discrete choice models for prediction of car ownership using transportation household survey data from Singapore. We compare our household car ownership model (multinomial logit model) against various machine learning models (e.g. Random Forest, Support Vector Machines) by using 2008 data to derive, i.e. estimate models that we then use to predict 2012 ownership. The machine learning models are inferior to the discrete choice model when using discrete choice features. However, after engineering features more appropriate for machine learning they are superior. These results highlight both the cost of applying machine learning models in econometric contexts and an opportunity for improved prediction and better urban policy making through machine learning models with appropriate features.