Comparing Natural Language Processing Methods to Cluster Construction Schedules
The names of construction activities are the only unstructured data attribute in construction schedules, and they often guide construction execution. Activity names are devised for communication between stakeholders, and are therefore often written with inconsistent terminology across repetitive activities and with contextual information omitted. This presents a challenge for machine learning systems that learn patterns from construction schedules. This paper compared the performance of state-of-the-art text clustering methods in identifying repetitive activities. This was achieved by creating a ground truth data set based on the standard construction work classification, and then comparing the precision, recall, and F1 score of latent semantic analysis (LSA), latent Dirichlet allocation (LDA), word2vec, and fastText algorithms when grouping activity names in 27 construction schedules. Results indicated that the F1 score of LSA outperformed LDA (0.84% versus 0.88%), whereas the results of language model–based clustering depended on the quality of the word embedding and the paired clustering method. This study provides insight into how to preprocess activity names of construction schedules for further artificial intelligence (AI)-based quantitative analysis. The methodologies described in this study will help researchers who work on natural language–related research in construction (e.g., safety and contract management) to better capture the features of words, rather than only counting word frequencies.
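As a rough illustration of how a clustering of activity names can be scored against a ground truth, the sketch below computes precision, recall, and their harmonic mean (commonly called the F1 score) by counting pairs of activities. The pair-counting definition is an assumption, since the abstract does not say how the metrics were computed, and the function name, activity names, and labels are hypothetical examples, not data from the study.

```python
from itertools import combinations


def pairwise_scores(truth, predicted):
    """Pair-counting precision, recall, and F1 for a clustering.

    `truth` and `predicted` map each activity name to a label.
    A pair counts as a true positive when both mappings place
    its two activities in the same group.
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(truth), 2):
        same_truth = truth[a] == truth[b]
        same_pred = predicted[a] == predicted[b]
        if same_pred and same_truth:
            tp += 1  # grouped together, and correctly so
        elif same_pred:
            fp += 1  # grouped together, but belong apart
        elif same_truth:
            fn += 1  # split apart, but belong together
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical activity names and labels, invented for illustration only.
truth = {
    "Pour L1 slab": "concrete",
    "Cast level 2 slab": "concrete",
    "Install L1 ductwork": "hvac",
    "Fit ducts level 2": "hvac",
}
predicted = {
    "Pour L1 slab": 0,
    "Cast level 2 slab": 0,
    "Install L1 ductwork": 0,
    "Fit ducts level 2": 1,
}

precision, recall, f1 = pairwise_scores(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Pair counting is only one of several ways to score a clustering. It is used here because it needs no mapping between cluster identifiers and ground-truth class names.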
Hong, Ying (author) / Xie, Haiyan (author) / Bhumbra, Gary (author) / Brilakis, Ioannis (author)
2021-08-14
Article (Journal)
Electronic Resource