A Platform for Science: Civil Engineering, Architecture and Urbanism
Are BERT embeddings able to infer travel patterns from Twitter efficiently using a unigram approach?
Public opinion is nowadays a valuable data source for many sectors. In this study, we analysed the transportation sector using messages extracted from Twitter. In contrast to the traditional survey methods used in the transportation sector, which are costly and inefficient, social media are a popular source of crowdsensing data. This work used BERT embeddings, an unsupervised pre-trained model released in 2018, to classify travel-related terms in tweets collected from three cities: New York, London, and Melbourne. To test whether a simple model can perform well, we used a unigram approach. A list of 24 travel-related words was used to classify the messages; the most popular include train, walk, car, station, street, and avenue. Between 3% and 5% of all messages are classified as traffic-related, rising to around 5-6% during typical working hours. The model performed well, with precision above 0.80 and accuracy above 0.90. The results are consistent across all three cities.
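The abstract does not spell out how the BERT embeddings and the 24-word list are combined. One plausible reading, shown here only as a sketch under stated assumptions and not as the authors' method, is to embed each unigram of a tweet and flag the tweet as travel-related when any token lies close, by cosine similarity, to one of the travel words. The toy three-dimensional vectors in `TOY` are hypothetical stand-ins for real BERT embeddings, the `0.8` threshold is hypothetical, and only the six words named in the abstract are used because the full list of 24 is not given. The helper `precision_accuracy` shows how the two reported metrics are computed from binary labels.

```python
import math

# Assumption: only the six travel words named in the abstract; the full list has 24.
TRAVEL_WORDS = ["train", "walk", "car", "station", "street", "avenue"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def is_travel_related(tweet, embed, threshold=0.8):
    """Hypothetical unigram classifier.

    `embed` maps a single word to a vector (a stand-in for a BERT embedding)
    or returns None for unknown words. A tweet is flagged if any of its
    whitespace-separated tokens is within `threshold` cosine similarity of
    any travel word's vector.
    """
    anchors = [v for v in (embed(w) for w in TRAVEL_WORDS) if v is not None]
    for token in tweet.lower().split():
        vec = embed(token.strip(".,!?#@"))  # drop trailing punctuation, hashtags, mentions
        if vec is None:
            continue
        if any(cosine(vec, a) >= threshold for a in anchors):
            return True
    return False


def precision_accuracy(y_true, y_pred):
    """Precision = TP / (TP + FP); accuracy = correct predictions / all predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, correct / len(y_true)


# Hypothetical toy embeddings: travel-like words point along the first axis.
TOY = {
    "train": [1.0, 0.0, 0.0],
    "walk": [0.9, 0.1, 0.0],
    "car": [0.95, 0.05, 0.0],
    "station": [1.0, 0.1, 0.0],
    "street": [0.9, 0.0, 0.1],
    "avenue": [0.9, 0.0, 0.2],
    "tram": [0.95, 0.0, 0.05],
    "pizza": [0.0, 1.0, 0.0],
    "love": [0.0, 0.2, 1.0],
}

tweets = ["Tram delayed again!", "I love pizza", "Walking down 5th avenue"]
preds = [is_travel_related(t, TOY.get) for t in tweets]
print(preds)  # [True, False, True]
```

Because "tram" is not one of the six listed words, the first tweet is caught only through embedding similarity, which is the advantage an embedding-based unigram approach has over exact keyword matching. In a real pipeline the vectors would come from a pre-trained BERT model rather than a hand-written dictionary.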
Murcos, Francisco (Author) / Fontes, Tania (Author) / Rossetti, Rosaldo J. F. (Author)
7 September 2021
825,797 bytes
Article (Conference)
Electronic resource
English
Analisis Perbandingan Model Bert Dan Xlnet Untuk Klasifikasi Tweet Bully Pada Twitter (Comparative Analysis of BERT and XLNet Models for Classifying Bullying Tweets on Twitter)
DOAJ | 2024
A novel approach to infer streamflow signals for ungauged basins
British Library Online Contents | 2010
Recurrence of Daily Travel Patterns: Stochastic Process Approach to Multiday Travel Behavior
British Library Online Contents | 2007