Characterization of citizens using word2vec and latent topic analysis in a large set of tweets

Authors
Vladimir Vargas-Calderón; Jorge E. Camargo
Source
Cities, Vol. 92, Issue 1, Pages 187-196
Language
English
Keywords
Natural language processing; Word embedding; t-SNE; Social network analysis
Affiliations
Physics Department, Universidad Nacional de Colombia, Bogotá, Colombia; Systems Engineering Department, Fundación Universitaria Konrad Lorenz, Bogotá, Colombia
Abstract
With the increasing use of the Internet and mobile devices, social networks are becoming the most widely used medium for communicating citizens' ideas and thoughts. This information is very useful for identifying communities with common ideas based on what they publish on the network. This paper presents a method to automatically detect city communities based on machine learning techniques applied to a set of tweets from Bogotá's citizens. An analysis was performed on a collection of 2,634,176 tweets gathered from Twitter over a period of six months. Results show that the proposed method is an interesting tool for characterizing a city population based on machine learning methods and text analytics.