A platform for research: civil engineering, architecture and urbanism
Structuring Semi-structured Data from Building Inspection Reports Using a Large Language Model
Assessing the status of buildings is the basis for risk evaluation and prediction of maintenance need in the building stock as well as in individual buildings. It can be a both time-consuming and costly task that would benefit from computerized procedures based on machine learning methods. One main obstacle is to find structured data to use for the machine learning. Building inspection report are a goldmine for assessing the status of buildings, both on a building stock level and for individual buildings. As they contain both an overview of the buildings base data; age, type, construction and material choices, installations, etc. and notes on damages, deficiencies and risks and recommended measures often described in free-text sections. The problem is that the structure of the documents is so fluid and varies in structure and format between different inspectors. The free-text sections would, just a couple of years ago, have been very resource-intensive to analyze in any systematic way, but with the emergence of publicly available large language models such as ChatGPT, this is now completely realistic. In this project, we use the large language model in ChatGPT to extract structured data from text in pdf-format, through HTTP requests to ChatGPT in json format. The result is structured data in pre-determined categories that can be used for status prediction both on a building stock level and on individual buildings and as input data to more advanced machine learning procedures. The paper exemplifies the use of the structured data with a focus on prediction of maintenance need for single family houses in western Sweden.
Structuring Semi-structured Data from Building Inspection Reports Using a Large Language Model
Assessing the status of buildings is the basis for risk evaluation and prediction of maintenance need in the building stock as well as in individual buildings. It can be a both time-consuming and costly task that would benefit from computerized procedures based on machine learning methods. One main obstacle is to find structured data to use for the machine learning. Building inspection report are a goldmine for assessing the status of buildings, both on a building stock level and for individual buildings. As they contain both an overview of the buildings base data; age, type, construction and material choices, installations, etc. and notes on damages, deficiencies and risks and recommended measures often described in free-text sections. The problem is that the structure of the documents is so fluid and varies in structure and format between different inspectors. The free-text sections would, just a couple of years ago, have been very resource-intensive to analyze in any systematic way, but with the emergence of publicly available large language models such as ChatGPT, this is now completely realistic. In this project, we use the large language model in ChatGPT to extract structured data from text in pdf-format, through HTTP requests to ChatGPT in json format. The result is structured data in pre-determined categories that can be used for status prediction both on a building stock level and on individual buildings and as input data to more advanced machine learning procedures. The paper exemplifies the use of the structured data with a focus on prediction of maintenance need for single family houses in western Sweden.
Structuring Semi-structured Data from Building Inspection Reports Using a Large Language Model
Lecture Notes in Civil Engineering
Berardi, Umberto (editor) / Svennberg, Kaisa (author) / Ekman, Jan (author)
International Association of Building Physics ; 2024 ; Toronto, ON, Canada
2024-12-06
6 pages
Article/Chapter (Book)
Electronic Resource
English
Hytime : hypermedia/time-based structuring language
TIBKAT | 1994
|Structuring information on residential building: a model of preference
Emerald Group Publishing | 2000
|Reaction-mediated structuring of three-dimensional honeycomb-structured graphene scaffold
British Library Online Contents | 2015
|British Library Conference Proceedings | 2019
|