Large scale geospatial data conflation: A feature matching framework based on optimization and divide-and-conquer

Lei, Ting L.

Abstract

Geospatial data conflation is the process of combining two datasets to create a better one. It has received increased research attention due to the emergence of new data sources and the need to combine information from these sources in spatial analyses. Many conflation methods exist, ranging from simple ones based on spatial joins to sophisticated methods based on statistics and optimization models. This paper focuses on the optimization-based conflation approach, which treats feature matching as an optimization problem: finding a plan to match features in two datasets that minimizes the total discrepancy. Optimization-based conflation methods may overcome some limitations of conventional methods, such as sub-optimality and greediness. However, they have often been deemed impractical in day-to-day analysis because of their high computational cost, especially when combining large geospatial datasets. In this paper, we demonstrate the feasibility of performing optimization-based conflation for large geographic data in Geographic Information Systems. This is accomplished by using efficient network-flow-based conflation models and a divide-and-conquer strategy that allows the conflation models to scale to large data. Experiments show that the network-flow-based model achieves average recall and precision rates of 97.7% and 90.8%, respectively, in small test areas, and outperforms the traditional assignment-problem model by about 9% on each measure. For larger data, the original network-flow model (without divide-and-conquer) took nearly two days to conflate the road network in a portion of the Los Angeles area near the LAX international airport. By contrast, with the divide-and-conquer strategy, the same model can conflate the road networks of the entire Los Angeles County, CA, in under 3 h.
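To make the contrast between greedy matching and optimization-based matching concrete, here is a minimal sketch on a toy example. It is not the paper's network-flow model: it assumes point features, Euclidean distance as the discrepancy measure, and a brute-force search over one-to-one matchings that is only usable for a handful of features. The function names `greedy_match`, `optimal_match`, and `total_cost` are hypothetical.

```python
from itertools import permutations
from math import dist


def greedy_match(a, b):
    """Match each feature in a, in order, to its nearest unused feature in b."""
    unused = set(range(len(b)))
    pairs = []
    for i, p in enumerate(a):
        j = min(unused, key=lambda k: dist(p, b[k]))
        unused.remove(j)
        pairs.append((i, j))
    return pairs


def optimal_match(a, b):
    """Search all one-to-one matchings for the minimum total distance.

    Brute force, assuming len(a) <= len(b); a practical implementation
    would use an assignment or network-flow solver instead.
    """
    best = min(
        permutations(range(len(b)), len(a)),
        key=lambda perm: sum(dist(a[i], b[j]) for i, j in enumerate(perm)),
    )
    return list(enumerate(best))


def total_cost(pairs, a, b):
    """Total discrepancy (sum of distances) of a matching plan."""
    return sum(dist(a[i], b[j]) for i, j in pairs)


# Toy data (assumed for illustration): two features per dataset.
dataset_a = [(0.0, 0.0), (1.0, 0.0)]
dataset_b = [(0.6, 0.0), (-0.7, 0.0)]

greedy = greedy_match(dataset_a, dataset_b)
optimal = optimal_match(dataset_a, dataset_b)
```

In this toy case the greedy plan lets the first feature take its nearest neighbor and forces the second into a distant match, while the optimal plan accepts a slightly worse first match to lower the total discrepancy.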

Highlights

• Utilized optimization models for geospatial conflation, which overcome sub-optimality in conventional methods.
• Compared the new network-flow-based conflation model with traditional models based on the assignment problem.
• Proposed and tested divide-and-conquer strategies for geospatial data conflation.
• Discussed and tested the impact of buffering data tiles on divide-and-conquer.
• Developed an “equalized” tiling method based on a quad-tree, which produces a more balanced workload across data tiles.
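The tiling and buffering ideas in the highlights can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: it assumes point features, splits a tile into four quadrants whenever it holds more than `max_points` features, and treats a buffer as a fixed margin added around a tile. The names `quadtree_tiles`, `points_in_buffered_tile`, `max_points`, and `max_depth` are hypothetical.

```python
def quadtree_tiles(points, bbox, max_points, max_depth=12):
    """Recursively split bbox into quadrants until each tile holds at most
    max_points points. Tiles are half-open: [xmin, xmax) x [ymin, ymax).
    Empty tiles are dropped."""
    xmin, ymin, xmax, ymax = bbox
    inside = [p for p in points if xmin <= p[0] < xmax and ymin <= p[1] < ymax]
    if len(inside) <= max_points or max_depth == 0:
        return [bbox] if inside else []
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quadrants = [
        (xmin, ymin, xmid, ymid),
        (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax),
        (xmid, ymid, xmax, ymax),
    ]
    tiles = []
    for quadrant in quadrants:
        tiles.extend(quadtree_tiles(inside, quadrant, max_points, max_depth - 1))
    return tiles


def points_in_buffered_tile(points, tile, buffer):
    """Select points inside a tile expanded by a buffer margin, so that
    candidate matches just across the tile edge are not cut off."""
    xmin, ymin, xmax, ymax = tile
    return [
        p for p in points
        if xmin - buffer <= p[0] < xmax + buffer
        and ymin - buffer <= p[1] < ymax + buffer
    ]


# Toy data (assumed): a dense cluster plus one isolated feature.
features = [(1, 1), (2, 2), (1, 2), (2, 1), (9, 9)]
tiles = quadtree_tiles(features, (0, 0, 16, 16), max_points=2)
counts = [len(points_in_buffered_tile(features, t, 0)) for t in tiles]
buffered = points_in_buffered_tile(features, tiles[0], 0.5)
```

A uniform grid over the same extent would place the whole cluster in one tile; splitting by feature count keeps the per-tile workload bounded, and each tile can then be conflated independently using its buffered feature set.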