Automatic Classification of Project Documents on the Basis of Text Content
Abstract
Organizing construction project documents based on semantic similarities offers several advantages over traditional metadata criteria, including facilitating document retrieval and enhancing knowledge reuse. In this study, the use of text classifiers for automatically classifying documents according to their corresponding group of semantically related documents is evaluated. Supporting documents of claims were used as representations of document discourses. The evaluation was performed under varying general conditions (such as dimensionality level and weighting method) to assess the effect of such conditions on performance, and varying classifier-specific parameters. The highest performance in terms of classification accuracy was achieved by a Rocchio classifier and a kNN classifier with the application of dimensionality reduction and using the tf-idf weighting method. A combined classifier approach was also evaluated in which the classification outcome is based on a majority vote strategy between the outcomes of three different classifiers. The evaluation demonstrated that classification accuracy of standard text classifiers can be refined by applying an appropriate level of dimensionality reduction to the training and testing sets and by combining the results of several classifiers. Accordingly, such application enables effective utilization of standard text classifiers for automatic organization of project documents based on text content.
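The abstract names tf-idf weighting, a Rocchio classifier, a kNN classifier, and a majority vote over three classifiers. The following is a minimal sketch of how those pieces could fit together. It is not the authors' implementation. Assumptions: the toy documents and labels are invented for illustration; the Rocchio variant shown is the centroid-only form; the third voter is a 1-nearest-neighbour classifier standing in for the unspecified third classifier; whitespace tokenization is used; and dimensionality reduction is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs, idf=None):
    """Return (vectors, idf). Each vector maps term -> tf-idf weight."""
    tokenized = [d.lower().split() for d in docs]  # assumption: whitespace tokens
    if idf is None:
        n = len(tokenized)
        df = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed idf; one common variant, not necessarily the paper's.
        idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(t for t in toks if t in idf)  # drop out-of-vocabulary terms
        vectors.append({t: f * idf[t] for t, f in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_predict(train_vecs, labels, query):
    """Centroid-only Rocchio: pick the class whose summed vector is closest."""
    centroids = {}
    for vec, label in zip(train_vecs, labels):
        centroid = centroids.setdefault(label, Counter())
        for t, w in vec.items():
            centroid[t] += w  # sum suffices: cosine ignores vector length
    return max(centroids, key=lambda lab: cosine(query, centroids[lab]))


def knn_predict(train_vecs, labels, query, k=3):
    """kNN: most frequent label among the k most similar training documents."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def majority_vote(predictions):
    """Most common prediction; ties go to the earliest-listed classifier."""
    return Counter(predictions).most_common(1)[0][0]


def classify(train_docs, labels, query_doc):
    train_vecs, idf = tfidf_vectors(train_docs)
    query = tfidf_vectors([query_doc], idf)[0][0]
    predictions = [
        rocchio_predict(train_vecs, labels, query),
        knn_predict(train_vecs, labels, query, k=3),
        knn_predict(train_vecs, labels, query, k=1),  # stand-in third voter
    ]
    return majority_vote(predictions)


# Hypothetical toy corpus: two document groups, two documents each.
TRAIN_DOCS = [
    "delay claim schedule extension weather",
    "schedule delay notice extension of time",
    "concrete strength test report cylinder",
    "concrete pour inspection test results",
]
TRAIN_LABELS = ["delay", "delay", "concrete", "concrete"]

if __name__ == "__main__":
    print(classify(TRAIN_DOCS, TRAIN_LABELS, "notice of schedule delay due to weather"))
```

In this sketch the vote is unweighted, so a single dissenting classifier cannot change the outcome when the other two agree. A dimensionality-reduction step, which the abstract reports as beneficial, would be applied to the training and query vectors before classification.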
Kandil, Amr (author) / Al Qady, Mohammed
2015
Article (Journal)
English
BKL: 56.03 Methods in civil engineering
Local classification TIB: 770/3130/6500
Automatic Classification of Project Documents on the Basis of Text Content
British Library Online Contents | 2015
Automated Classification of Construction Project Documents
Online Contents | 2002
Automated Classification of Construction Project Documents
British Library Online Contents | 2002
Ontology-Based Multilabel Text Classification of Construction Regulatory Documents
Online Contents | 2016